Final concept

A project log for Discrete component object recognition

Inspired by a recent paper, this project aims to develop a system that can recognize numbers from the MNIST database without a microcontroller.

ciplionej 04/06/2020 at 01:07

The goal of this task was to validate the concept by implementing the reduced decision tree obtained in the Minimum hardware task with discrete components, and to measure its accuracy.

The model

The model to be used was the one below, discussed and analyzed in a previous task.


The hardware

Due to stock and time limitations, only one LM393P voltage comparator was available. Since the LM393P is a dual comparator, at most two splits could be implemented, so the prototype would only be able to detect "Number 1" or "Numbers 4 or 7" by analyzing the signal from pixels 6 and 3. In the end, only one of the LDRs used throughout this project was able to measure with acceptable drift and jitter, so only digit "1" from the MNIST database was detected.

Since we had a single digit to identify, a single LED was used to inform whether the digit being shown had been identified as a "1". In order to build the confusion matrix, the LED voltage was recorded by the Arduino UNO.

The video below shows the setup in action.

The confusion matrix is attached below.

          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]        18   11    4    2    3    1    2    0    1     0
[Not 1,]     9   11   16   22   15   19   16   15   22     0

The accuracy towards number 1 was 35.2%. This was below the performance obtained with the microcontroller, which is disappointing. I really wanted to get a better value using only discrete components.
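For reference, the usual one-vs-rest metrics can be computed directly from the matrix above. This is only a sketch under assumptions: it treats each column as one true digit class and takes the column holding digit 1 as a parameter, since the column order is not stated here, and it does not try to reproduce the 35.2% figure. The function name `one_vs_rest_metrics` is made up for this example.

```python
# Counts copied from the confusion matrix in this log.
# Assumption: each column is one true digit class; the column order is not stated.
detected_as_1 = [18, 11, 4, 2, 3, 1, 2, 0, 1, 0]
detected_as_not_1 = [9, 11, 16, 22, 15, 19, 16, 15, 22, 0]


def one_vs_rest_metrics(pos_row, neg_row, target_col):
    """Recall, precision and accuracy if column `target_col` holds the digit 1."""
    tp = pos_row[target_col]          # shown a 1, LED on
    fn = neg_row[target_col]          # shown a 1, LED off
    fp = sum(pos_row) - tp            # shown another digit, LED on
    tn = sum(neg_row) - fn            # shown another digit, LED off
    total = tp + fn + fp + tn
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Print the metrics for every possible choice of the "digit 1" column.
    for col in range(len(detected_as_1)):
        m = one_vs_rest_metrics(detected_as_1, detected_as_not_1, col)
        print(f"column {col + 1}: " + ", ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Printing all columns makes it easy to see how much the result depends on which column is taken as the digit 1 and on which metric is reported.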

Below is a close-up of the setup, including the lonely LM393 in the center of the control breadboard and the single LDR on the "camera" breadboard. Since the screen brightness did not match the activation level for the voltage comparator, two 1 kΩ resistors were used in parallel with the LDR to align the sensitivity of the sensor to the screen intensity.
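To get a feel for what the parallel resistors do, here is a rough sketch. It assumes the LDR and both 1 kΩ resistors are all in parallel with each other, and the LDR resistances are hypothetical values picked for illustration, not measurements from this build.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel (ohms)."""
    return 1.0 / sum(1.0 / r for r in resistances)


R_FIXED = 1_000.0  # each of the two fixed resistors, in ohms

if __name__ == "__main__":
    # Hypothetical LDR resistances from bright to dark (not measured values).
    for r_ldr in (2_000.0, 20_000.0, 200_000.0):
        r_eq = parallel(r_ldr, R_FIXED, R_FIXED)
        print(f"LDR {r_ldr:>9.0f} ohm -> equivalent {r_eq:6.1f} ohm")
```

Under that assumption the equivalent resistance can never exceed 500 Ω, so the fixed resistors compress the LDR's wide swing into a much narrower range.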

The final fine-tuning was carried out by displaying the target intensity on the screen and adjusting the screen brightness until the LED triggered around the level defined by the decision tree.

Speed test

In order to see how fast the system could detect the numbers, the numbers were drawn at increasing speed until the system could not catch up.

The test started at 1 Hz and was then increased to 10 Hz, 20 Hz and 50 Hz.

The surprising result was that the system could keep up, but the screen could not.

In the video below, we can see the system recognizing number 1 at 20 Hz.


Nevertheless, the test could not be run any faster than this, because the screen could not draw the digits at the 50 Hz target speed.

Future work

Stock MNIST performance

The project was carried out using an averaged version of the MNIST dataset in which each image was reduced from the original 28x28 matrix to a 4x4 matrix. The question that remains is: what would the accuracy of the system be when using the original database?
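As a reminder of what that reduction looks like, here is a minimal sketch that assumes the averaging was a plain mean over non-overlapping 7x7 blocks (28 / 4 = 7). The exact method used earlier in the project may differ, and the test image is synthetic, not an MNIST digit.

```python
def block_average(image, out_size=4):
    """Reduce a square image (list of rows) to out_size x out_size by block means."""
    n = len(image)
    if n % out_size != 0 or any(len(row) != n for row in image):
        raise ValueError("image must be square with a side divisible by out_size")
    block = n // out_size
    reduced = []
    for br in range(out_size):
        row = []
        for bc in range(out_size):
            total = sum(
                image[r][c]
                for r in range(br * block, (br + 1) * block)
                for c in range(bc * block, (bc + 1) * block)
            )
            row.append(total / (block * block))
        reduced.append(row)
    return reduced


# Synthetic 28x28 image: a bright vertical stroke in columns 12..15.
image = [[255 if 12 <= c < 16 else 0 for c in range(28)] for _ in range(28)]
small = block_average(image)

if __name__ == "__main__":
    for row in small:
        print(" ".join(f"{v:6.1f}" for v in row))
```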

Implementing a more robust decision tree

This project proved that it's possible to implement a simple object recognition system using decision trees and discrete components. Nevertheless, a single split hardly represents a decision tree. A more complex decision tree would have been nice to test, given more time and the availability of parts.

Using the right tools

LDRs are definitely not to be used for this kind of project. Maybe you can make them work, but the ones I had proved to be pretty unreliable.

A more robust system could probably be built with better sensors given the time. As Starhawk suggested, maybe photodiodes could be used instead with better results.

More voltage comparators for a larger tree would also mean that we'd need plenty of individual reference voltages, one per split. To deliver those to the right comparators, a lot of voltage dividers could be used, but that could get complex pretty quickly. I wonder whether there are better tools.
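To illustrate the bookkeeping involved, here is a small sketch that sizes one resistor divider per threshold. The 5 V supply, the 10 kΩ top resistor and the threshold voltages are all hypothetical values chosen for the example; none of them come from the actual decision tree.

```python
def bottom_resistor(v_supply, v_ref, r_top):
    """Bottom resistor that makes a two-resistor divider output v_ref."""
    if not 0 < v_ref < v_supply:
        raise ValueError("v_ref must lie strictly between 0 and v_supply")
    return r_top * v_ref / (v_supply - v_ref)


def divider_output(v_supply, r_top, r_bottom):
    """Unloaded output voltage of a two-resistor divider."""
    return v_supply * r_bottom / (r_top + r_bottom)


if __name__ == "__main__":
    V_SUPPLY = 5.0      # hypothetical supply rail, in volts
    R_TOP = 10_000.0    # hypothetical fixed top resistor, in ohms
    # Hypothetical split thresholds, one per comparator.
    for v_ref in (1.2, 2.5, 3.3):
        r_bot = bottom_resistor(V_SUPPLY, v_ref, R_TOP)
        check = divider_output(V_SUPPLY, R_TOP, r_bot)
        print(f"V_ref {v_ref:.2f} V -> R_bottom {r_bot:8.0f} ohm (check {check:.2f} V)")
```

Each computed value would still have to be rounded to a real resistor value or trimmed with a potentiometer, which is part of why this approach gets complex as the tree grows.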

Measuring the speed

Well, speed was one of the main drivers of this system, and checking how fast it could go was one of the objectives. Nevertheless, the screen refresh rate was the limiting factor in the speed measurement.

From my point of view, the speed would be limited by the LDRs, which have a response time on the order of 10 to 100 ms. Photodiodes, on the other hand, have response times starting at 20 ns, which is another reason to switch.
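As a back-of-the-envelope check, those response times can be turned into a rough upper bound on the detection rate. The sketch assumes, very crudely, that the sensor needs one full response time per digit, so the rate is simply the reciprocal; real limits also depend on the comparator and the rest of the circuit.

```python
def max_rate_hz(response_time_s):
    """Crude upper bound: one full sensor response time per displayed digit."""
    return 1.0 / response_time_s


if __name__ == "__main__":
    sensors = {
        "LDR (slow end, 100 ms)": 100e-3,
        "LDR (fast end, 10 ms)": 10e-3,
        "photodiode (20 ns)": 20e-9,
    }
    for name, t in sensors.items():
        print(f"{name}: about {max_rate_hz(t):,.0f} Hz")
```

With these numbers the LDR lands somewhere between about 10 and 100 Hz, while a photodiode would sit in the tens of megahertz, far beyond anything a screen can display.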

Either way, having a screen show the images faster than 100 Hz would be quite difficult. Any ideas on how to test this, if it makes any sense at all to test it, would be welcome.

Conclusion

We eventually managed to have a discrete component object recognition system.

It wasn't fancy.

It couldn't recognize a lot of numbers.

It couldn't even recognize the only number it was built to detect very well.

All in all, it didn't fare great; the accuracy was rather poor.

Considering the goals of the project:

We can agree on a partial success.

All in all, it was quite a learning experience and extremely satisfying to see a simple LDR aided by a few resistors and a voltage comparator do the work we normally assume microcontrollers can do.
