    kasik, 5 days ago

    Today a few notes on useful metrics. In one of the previous logs I mentioned looking at accuracy and loss, but of course the topic is more complex. Sometimes accuracy may even be misleading. 

    Let's imagine that we have a naive classifier: a classification algorithm with no real logic, or rather with minimum effort and data manipulation to produce a prediction. We want to model anomaly detection, with 0 for no anomaly and 1 for anomaly detected. We could use, for example, a method called Uniformly Random Guess, which predicts 0 or 1 with equal probability. Or even better, an algorithm that always outputs 0. With a low anomaly rate, this always-0 algorithm reaches an accuracy close to 100% (99% if only 1% of the samples are anomalies). That gives the impression that we did an excellent job, even though it never detects a single anomaly.
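
    Here is a minimal sketch that makes this concrete. It uses made-up data (1000 samples, 1% anomalies; the numbers are purely illustrative) to compare the two naive classifiers, and it only needs NumPy:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical data: 1000 samples, exactly 10 anomalies (label 1)
rng = np.random.default_rng(seed=0)
y_true = np.zeros(1000, dtype=int)
y_true[rng.choice(1000, size=10, replace=False)] = 1

# Naive classifier 1: always predict "no anomaly"
always_zero = np.zeros_like(y_true)
# Naive classifier 2: uniformly random guess between 0 and 1
random_guess = rng.integers(0, 2, size=y_true.shape)

print("always 0:", accuracy(y_true, always_zero))   # always 0: 0.99
print("random  :", accuracy(y_true, random_guess))  # roughly 0.5
```

    The always-0 classifier scores 0.99 while never detecting a single anomaly, which is exactly why accuracy alone can be misleading.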

    A nice tool to consider is a confusion matrix: a table that crosses the predicted labels with the actual labels and shows how many samples fall into each combination.

    It is very easy to create and display the confusion matrix in Python using sklearn.metrics:

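        import matplotlib.pyplot as plt
        import seaborn as sns
        from sklearn.metrics import confusion_matrix
        # Rows = actual labels, columns = predicted labels; labels=[1, 0] puts class 1
        # (person, assuming folder "1" = person) first to match the tick labels below
        cm = confusion_matrix(y_val, y_pred, labels=[1, 0])
        sns.heatmap(cm, annot=True, fmt='d',
                    xticklabels=['Person', 'Not person'],
                    yticklabels=['Person', 'Not person'])
        plt.ylabel('Actual', fontsize=13)
        plt.xlabel('Prediction', fontsize=13)
        plt.title('Confusion Matrix', fontsize=17)
        plt.show()

    (Figure: confusion matrix)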

    Useful terms related to the confusion matrix:
        True Positive (TP): Values that are actually positive and predicted positive.
        False Positive (FP): Values that are actually negative but predicted positive.
        False Negative (FN): Values that are actually positive but predicted negative.
        True Negative (TN): Values that are actually negative and predicted negative.
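
    The four counts above can be computed directly from arrays of labels. Below is a small NumPy sketch; the function name confusion_counts and the example labels are only for illustration. As a cross-check, sklearn's confusion_matrix(y_true, y_pred).ravel() returns the same counts in the order (tn, fp, fn, tp) for 0/1 labels.

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem, treating `positive` as the target class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    actual_pos = y_true == positive
    pred_pos = y_pred == positive
    tp = int(np.sum(actual_pos & pred_pos))    # actually positive, predicted positive
    fp = int(np.sum(~actual_pos & pred_pos))   # actually negative, predicted positive
    fn = int(np.sum(actual_pos & ~pred_pos))   # actually positive, predicted negative
    tn = int(np.sum(~actual_pos & ~pred_pos))  # actually negative, predicted negative
    return tp, fp, fn, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```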

    With the above, we can calculate the following metrics.
    Precision: out of all the samples predicted positive, what fraction is actually positive.

    Precision = True Positive / Predicted Positive = TP / (TP + FP)

    Recall: out of all the actually positive samples, what fraction is predicted correctly.
    Recall = True Positive / Actual Positive = TP / (TP + FN)

    It is very common to also calculate the F1 score, which combines precision and recall into a single number: it is their harmonic mean. An F1 score of 1 is the best possible value and 0 is the worst.

        F1 = 2 * (precision * recall) / (precision + recall)

    Note that these metrics are calculated for the target (positive) class, not for the system as a whole.
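
    Here is a minimal sketch of the precision, recall and F1 formulas computed from the counts. Returning 0.0 when a ratio is undefined (for example, when nothing is predicted positive) is my own choice here. sklearn's precision_score, recall_score and f1_score expose the same choice through their zero_division parameter.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the target class; 0.0 is returned where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

# Hypothetical counts: TP=3, FP=1, FN=1
print(precision_recall_f1(3, 1, 1))    # (0.75, 0.75, 0.75)
# An "always 0" anomaly detector never predicts the positive class: TP=0, FP=0, FN=10
print(precision_recall_f1(0, 0, 10))   # (0.0, 0.0, 0.0)
```

    Unlike accuracy, all three metrics expose the always-0 classifier immediately.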

    And as always, we need to keep our particular problem in mind. In some cases False Negatives have more severe consequences than False Positives (for example in cancer detection), which means we may need to weight the metrics accordingly or even choose different ones.
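
    One common way to weight the metric is the F-beta score, a generalization of F1. With beta > 1 recall gets more weight, and with beta < 1 precision does. sklearn provides it as fbeta_score(y_true, y_pred, beta=...). Here is a minimal sketch with hypothetical precision and recall values for a detector that misses half of the positives:

```python
def f_beta(precision, recall, beta):
    """F-beta score: a weighted harmonic mean of precision and recall.

    beta > 1 favours recall (missing positives is costly), beta < 1 favours precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical detector: precise (0.9) but misses half of the positives (recall 0.5)
precision, recall = 0.9, 0.5
for beta in (0.5, 1, 2):
    print(beta, round(f_beta(precision, recall, beta), 3))
# 0.5 0.776
# 1 0.643
# 2 0.549
```

    With beta=2 the low recall is punished more, which fits problems where a missed detection is the expensive mistake.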

  • Overfitting

    kasik, 6 days ago

    It seems I haven't been here for a while, but fear not - the project continues! Because of some other projects, I was forced to upgrade my Linux distribution, so most of the packages got updated as well. I must admit, it took me an hour or two to fix the project due to changes in the libraries. Now I am using Keras version 3.2.1.

    I wanted to come back to the topic of overfitting, which can cause a lot of trouble when working on a model. Very often, while observing the training progress, we may see a beautiful training accuracy curve that reaches 1 very quickly and stays there. At the same time, the training loss gets very close to 0. Such graphs may give the false impression that we have created an excellent neural network.

    Unfortunately, that is too good to be true. If we look at the validation graphs instead, things don't look so great anymore: the validation accuracy and loss lag far behind and show worse performance.
    What actually happens is that the neural network becomes too specialized. It learns the training set so well that it fails to generalize to unseen data.
    And that is overfitting in a nutshell. There are various ways to fight it.

    1. Use more data!
    Sounds easy enough, yet it is not always feasible. It shouldn't come as a surprise that there are ways to overcome that as well, such as data augmentation (log post coming soon) or transfer learning, where a pretrained model is used (log post coming soon).
    As a note, randomly shuffling the order of the data has a positive effect as well.
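
    Here is a minimal NumPy sketch of both ideas. It assumes the images are 2D grayscale arrays stacked into one array, as in this project. The augmentations chosen here, a random horizontal flip and a small random shift, are only examples that keep a person recognizable. Keras also offers ready-made preprocessing layers for the same purpose, such as keras.layers.RandomFlip and keras.layers.RandomTranslation.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def shuffle_together(images, labels):
    """Shuffle images and labels with the same random permutation so pairs stay aligned."""
    order = rng.permutation(len(images))
    return images[order], labels[order]

def augment(image, max_shift=4):
    """Return a randomly flipped and shifted copy of a 2D grayscale image (same shape)."""
    out = image[:, ::-1] if rng.random() < 0.5 else image       # random horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)     # random translation
    padded = np.pad(out, max_shift, mode='edge')                 # repeat border pixels
    h, w = image.shape
    return padded[max_shift + dy:max_shift + dy + h, max_shift + dx:max_shift + dx + w]

# Hypothetical batch of 96x96 int8 images with binary labels
images = rng.integers(-128, 128, size=(10, 96, 96)).astype(np.int8)
labels = np.arange(10) % 2
images, labels = shuffle_together(images, labels)
augmented = np.stack([augment(img) for img in images])
print(augmented.shape)  # (10, 96, 96)
```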

    2. Model ensembles
    Sounds strange, doesn't it? A model ensemble means training several neural networks, often with different architectures, and combining their predictions (for example by averaging them). However, it is quite complex and computationally expensive.
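
    The simplest way to combine an ensemble is to average the class probabilities of the individual models and pick the most likely class. Here is a sketch with hypothetical softmax outputs. In practice, each array would come from calling predict on one trained model.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average class probabilities from several models and return the winning class per sample."""
    mean_probs = np.mean(np.stack(prob_list), axis=0)   # shape (n_samples, n_classes)
    return mean_probs, np.argmax(mean_probs, axis=1)

# Hypothetical outputs of three models for two samples (classes: 0 = no person, 1 = person)
model_a = np.array([[0.6, 0.4], [0.2, 0.8]])
model_b = np.array([[0.4, 0.6], [0.3, 0.7]])
model_c = np.array([[0.7, 0.3], [0.6, 0.4]])
probs, classes = ensemble_predict([model_a, model_b, model_c])
print(classes.tolist())  # [0, 1]
```

    Each model disagrees with the majority on at least one sample, but the averaged vote smooths those individual mistakes out.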

    3. Regularization
    I have used this term a few times already. It is a set of techniques that penalize the complexity of the model, with the goal of improving generalization. This can mean, for example, adding an extra penalty term to the loss calculation, adding some stochasticity during training, or early stopping, which means stopping the learning process when the validation loss doesn't decrease anymore.
    A technique that I find interesting is dropout, where the activations of certain neurons are set to 0. Interestingly, this can be applied to neurons in fully connected layers or in convolutional layers, where often a full feature map is dropped. This means that new, temporary architectures are created out of the parent network. Nodes are dropped with a dropout probability p. For hidden layers, the greater the drop probability, the sparser the model; 0.5 is a commonly used value, often reported as close to optimal. Note that the choice of dropped nodes differs for every forward pass: on each pass, the nodes to drop are picked again at random. With dropout we force the neural network to learn more robust features and not to rely on specific clues. As an extra note, dropout is applied only during training, not during predictions.
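
    The early stopping rule mentioned above only takes a few lines. Below is a plain-Python sketch of the usual patience-based version, using hypothetical loss values. In Keras, the same idea is available as keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True), passed to model.fit through the callbacks argument.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for patience-based early stopping.

    Training stops once the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: improving at first, then rising (overfitting)
val_losses = [0.69, 0.52, 0.41, 0.38, 0.40, 0.43, 0.47, 0.52]
print(early_stopping_epoch(val_losses, patience=3))  # (7, 4)
```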

    On the graphs below you can see the effect of adding dropout: the validation loss follows the curve of the training loss, the training accuracy doesn't immediately reach 1, and the validation accuracy follows the training one.

    In my project I use dropout on the fully connected layers. Next, I will further increase the dataset, use data augmentation and try out transfer learning.
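
    The dropout mechanism described in this log can be sketched in NumPy. This sketch follows the common "inverted dropout" formulation, which is also what the Keras Dropout layer does. Surviving activations are scaled by 1/(1 - p) during training, so nothing needs to change at prediction time. The function signature is only for illustration.

```python
import numpy as np

def dropout(activations, p, training, rng):
    """Inverted dropout on an array of activations.

    Training: zero each unit with probability p and scale the survivors by 1/(1-p),
    so the expected activation stays the same. Inference: return the input unchanged.
    """
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p   # a fresh random mask on every call
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(seed=0)
x = np.ones((4, 8))
print(dropout(x, p=0.5, training=True, rng=rng))    # a mix of 0.0 and 2.0
print(dropout(x, p=0.5, training=False, rng=rng))   # unchanged: all 1.0
```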

  • Training

    kasik, 04/01/2024 at 11:32

    It is finally time to train the model that I described in the previous log. The script is available in my repository and is called person_detection.py.
    First I need to load the data and make sure the images are 96x96 pixel grayscale. The input on the microcontroller needs to be int8 (uint8 is not accepted), hence I also convert the pixel values to the int8 range (-128 to 127). I created a module dataset.py for loading the data, as I will also be reading images in another script. I always like to check my dataset, so I pick a picture to display:

        plt.imshow(images[1], cmap='gray', vmin=-128, vmax=127)
        plt.show()
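
    Here is a minimal sketch of the uint8 to int8 conversion. It assumes the raw pixels are 8-bit values from 0 to 255; dataset.py isn't shown here, so treat this as one straightforward way to do it. Subtracting 128 maps 0..255 onto -128..127 and keeps the brightness order, which also matches the vmin/vmax range used in imshow above.

```python
import numpy as np

def to_int8(image_uint8):
    """Shift 8-bit pixels from [0, 255] to the int8 range [-128, 127] by subtracting 128."""
    # Go through int16 so the subtraction cannot wrap around
    return (image_uint8.astype(np.int16) - 128).astype(np.int8)

def to_uint8(image_int8):
    """Inverse shift back to [0, 255], e.g. for saving images in the usual format."""
    return (image_int8.astype(np.int16) + 128).astype(np.uint8)

pixels = np.array([[0, 128, 255]], dtype=np.uint8)
print(to_int8(pixels).tolist())   # [[-128, 0, 127]]
```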

    Then I split the data, using 60% as the training set, 20% as the validation set and 20% as the test set. Note that the train_test_split method shuffles the data by default (shuffle=True).

        import numpy as np
        from sklearn.model_selection import train_test_split

        # Split images: 20% test, then 25% of the remaining 80% (= 20% overall) for validation
        X_train, x_test, Y_train, y_test = train_test_split(np.array(images), np.array(labels), test_size=0.2)
        x_train, x_val, y_train, y_val = train_test_split(X_train, Y_train, test_size=0.25)

    The validation set is used to check the accuracy of our model during training. The next step is to create the model (as per the last log) and finally, time to train:

        # Fit model on training data (when batch_size is not given, Keras uses 32)
        history = model.fit(x_train, y_train, epochs=EPOCHS, validation_data=(x_val, y_val))

    Here we need to choose the batch size and the number of epochs. The batch size specifies how many training samples are fed into the network before the loss is computed and the weights and biases are updated. Large batch sizes tend to produce less accurate models: it seems that models trained with large batches become more specialized to the dataset, so they are more likely to overfit. A batch size that is too small, on the other hand, results in a very long computation time. With small batches the parameters are updated more frequently, so each epoch takes more steps.
    Regarding epochs, this parameter specifies the number of full passes over the training set. The intuition would be "the more the better". However, more epochs not only increase the computation time, but it also turns out that some networks start to overfit.
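
    A quick way to see the batch size trade-off is to count the weight updates per epoch, which equals the number of batches, ceil(N / batch_size). This sketch assumes the 120-image dataset from the dataset log, split 60/20/20, which gives 72 training images:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller than batch_size."""
    return math.ceil(n_samples / batch_size)

# Assumed training set of 72 images (60% of 120)
for batch_size in (8, 32, 72):
    print(batch_size, updates_per_epoch(72, batch_size))
# 8 9
# 32 3
# 72 1
```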

    And voila! We can observe the training progress.

    When the training is done, I want to observe the basic metrics:

        import matplotlib.pyplot as plt

        # Extract accuracy and loss values (in list form) from the history
        # ('accuracy' is only recorded if the model was compiled with metrics=['accuracy'])
        acc = history.history['accuracy']
        val_acc = history.history['val_accuracy']
        loss = history.history['loss']
        val_loss = history.history['val_loss']
        # Create a list of epoch numbers
        epochs = range(1, len(acc) + 1)
        # Plot training and validation loss values over time (first figure)
        plt.figure()
        plt.plot(epochs, loss, color='blue', marker='.', label='Training loss')
        plt.plot(epochs, val_loss, color='orange', marker='.', label='Validation loss')
        plt.title('Training and validation loss')
        plt.xlabel('Epoch')
        plt.legend()
        # Plot training and validation accuracies over time (second figure)
        plt.figure()
        plt.plot(epochs, acc, color='blue', marker='.', label='Training acc')
        plt.plot(epochs, val_acc, color='orange', marker='.', label='Validation acc')
        plt.title('Training and validation accuracy')
        plt.xlabel('Epoch')
        plt.legend()
        plt.show()

    Ideally, the validation accuracy closely follows the accuracy on the training set, and similarly the validation loss follows the training loss. The validation set will usually perform somewhat worse, but we don't want the curves to drift too far apart.

    After that, let's evaluate the model on our test set.

        # Evaluate neural network performance
        score = model.evaluate(x_test,  y_test, verbose=2)

    Just like that I have my first model trained - however, now it is time to play with basic hyperparameters and try to achieve better results.

    Happy playing!

  • Building a model

    kasik, 03/29/2024 at 18:59

    Now that I have my dataset prepared, it's time to create a model. Many papers, tutorials, videos and blog posts have been written about neural networks for image recognition. For me this topic is fascinating and very broad. I am just going to summarize the key points here: what steps I took and what I learned.

    Preparing and training a model is quite a complex task: choosing the model architecture, setting the hyperparameters (parameters that are not learned by the model, but configured by us) and minimizing overfitting. Despite many guidelines and rules of thumb, I find that tuning all of the parameters properly is still kind of an art.

    I am going to use Python and TensorFlow to build and train the model. The script is available in my repository and is called person_detection.py.

    I will first focus on the network architecture itself.
    The most popular neural network architecture for image recognition is the convolutional neural network (CNN). It is based on convolving the image pixels with a set of filters. This technique helps preserve the spatial structure of the image and helps the network extract features.
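
    To show what "convolving the image pixels with a filter" means, here is a minimal NumPy sketch of a single-channel convolution without padding. The hand-made vertical edge filter is only an example; in a CNN, the filter values are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over a 2D `image` (no padding) and return the activation map.

    Like deep learning frameworks, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # weighted sum of the local patch
    return out

# A vertical edge: dark left half, bright right half
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
vertical_edge = np.array([[-1, 1], [-1, 1]], dtype=float)
print(conv2d_valid(image, vertical_edge))   # every row is 0, 2, 0: it fires only at the edge
```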

    While designing the CNN architecture, it is worth keeping in mind that we are working on a model for a microcontroller. We need to keep the number of trainable parameters low to make sure the model still fits in our limited memory.
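
    The parameter budget can be estimated by hand. This sketch uses the standard formulas for the parameters of Conv2D and Dense layers and for the output size of a convolution or pooling layer. The layer sizes (8 filters of 3x3 on a 96x96x1 input, 2x2 max pooling and a dense layer with 16 units) are hypothetical example values.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: one weight per kernel element per input
    channel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """Trainable parameters of a Dense layer: one weight per input per unit, plus one bias per unit."""
    return inputs * units + units

# Hypothetical first block: 96x96x1 input, 8 filters of 3x3, no padding, then 2x2 max pooling
side = conv_output_size(96, 3)                    # 94
pooled = conv_output_size(side, 2, stride=2)      # 47 (pooling has no parameters)
flat = pooled * pooled * 8                        # 17672 values after flattening
print(conv2d_params(3, 3, 1, 8))                  # 80
print(dense_params(flat, 16))                     # 282768 for a 16-unit dense layer
```

    Even with only 8 small filters, flattening the 47x47x8 activation maps means a modest dense layer needs thousands of times more parameters than the convolution itself. This is one more reason pooling matters on a microcontroller.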

    At the beginning, I use a series of convolution, activation and pooling layers in order to extract as many features as possible. There are various parameters worth tweaking: the kernel size (size of the filter matrix), the number of filters, padding and stride (step). The output of a convolution layer is called an activation map. We need to keep in mind that the more filters we use, the more features are extracted. At the same time, the number of activation maps grows (one per filter), which increases the number of trainable parameters. And as already mentioned, we need to be careful about that.
    The output of each convolution layer is then passed through an activation function, ReLU, to introduce some non-linearity (there is an interesting lecture from Stanford describing activation functions).
    Next, there is a pooling layer to downsample and reduce the size of the activation maps. The number of pooling layers to use is often decided by trial and error: we want to make the input smaller, but at the same time we need to make sure we don't lose any important information.
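
    Both operations are simple enough to write out by hand. Here is a NumPy sketch of ReLU followed by 2x2 max pooling with stride 2, applied to a made-up 4x4 activation map. The pooling halves the width and height of each map.

```python
import numpy as np

def relu(x):
    """ReLU activation: keep positive values, replace negative values with 0."""
    return np.maximum(x, 0)

def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D map; an odd trailing row/column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)   # group pixels into 2x2 blocks
    return blocks.max(axis=(1, 3))                                 # keep the maximum of each block

fmap = np.array([[ 1, -2,  3,  0],
                 [-1,  5, -3,  2],
                 [ 0,  1, -4, -1],
                 [ 2, -6,  1,  7]])
print(max_pool2x2(relu(fmap)).tolist())   # [[5, 3], [2, 7]]
```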

    Next, I need to convert the data from a multi-dimensional array into a 1D array (flatten) as the input to the dense (fully connected) layer. In a convolutional layer, each neuron looks only at a small local region of its input, and the same filter is applied at every location to extract the important information. In a dense layer, each neuron looks at the whole input. The number of neurons is the parameter to play with here.

    After that I use a dropout layer, which is considered a type of regularization (a way to penalize model complexity) and is one of the ways to reduce overfitting. Overfitting happens when the network becomes too specialized for the training set and is not able to generalize properly to unseen data. Here you can specify the rate: the fraction of the input units to drop. The last layer is a dense layer with softmax, whose output gives me the probability score for each of the labels, in my case whether a person is detected or not.
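
    For reference, the softmax in the last layer turns raw scores into probabilities that sum to 1. Here is a minimal NumPy sketch with hypothetical scores for the two classes:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1 along the last axis."""
    # Subtracting the maximum does not change the result but avoids overflow in exp
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical scores for one image: [no person, person]
probs = softmax(np.array([0.5, 2.0]))
print(probs, probs.argmax())   # roughly 18% vs 82%, so class 1 (person) wins
```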

    I build the model using tf.keras.models.Sequential and pass it a list of layers as input, for example:

        import tensorflow as tf

        model = tf.keras.models.Sequential([
            # Convolutional layer: 8 filters with a 3x3 kernel on a grayscale input
            tf.keras.layers.Conv2D(8, (3, 3), activation="relu",
                                   input_shape=(IMG_WIDTH, IMG_HEIGHT, 1)),
            # Max-pooling layer, using 2x2 pool size
            tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        ])

    When I have all this ready, I need to compile the model using model.compile(), where I can specify a number of parameters. I think the most important are:

    • optimizer - how to minimize the loss - algorithm that calculates how the parameters of the network shall be changed. I chose 'Adam' - one of the most popular and...
  • Gathering dataset

    kasik, 03/19/2024 at 09:22

    I believe gathering a proper dataset is the key to any machine learning project. It may seem trivial, but this task should not be underestimated. The model's accuracy depends on the data that is used. The algorithms need thousands of examples to learn the patterns and to be able to make predictions. Additionally, each class needs to be properly represented in the dataset.

    Regardless of whether you decide to go through the full machine learning chain yourself or to use platforms like Edge Impulse or Neuton, you still need to prepare your dataset.
    Nowadays there are plenty of free datasets available for ML algorithms. However, for a project like this one (on an embedded system), it is best to build the dataset using the same sensor that will be used for inference after deployment. It is possible, though, to make use of transfer learning (i.e. a network already pretrained on another dataset), but I will leave this topic for later.

    As I am working on a binary classification algorithm, I need to gather 2 subsets of images. One contains a person (I started with various pictures of me and my partner), and the other has no person (all sorts of backgrounds in our house, plus some images of our dog). I used the test_camera.ino sketch (from my repository), which sends an image over the serial port after a button is pressed. Additionally, I created a Python script image_viewer.py that receives the data from the serial port, displays it and asks whether to save it. I found that option very handy, as it turns out it is not so easy to aim with that camera! It also seems my hand is not as stable as I thought.
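
    The receiving side of such a script could look roughly like the sketch below. It rests on an assumption: the microcontroller sends one raw 96x96 grayscale frame as 9216 bytes with no header, which is not necessarily the protocol used by test_camera.ino and image_viewer.py. With pyserial, serial.Serial(port, baudrate, timeout=...) returns an object whose read(size) method can be passed in as the stream. Here, an in-memory stream stands in for the port.

```python
import io
import numpy as np

WIDTH, HEIGHT = 96, 96   # assumed frame size: grayscale, 1 byte per pixel

def read_frame(stream, width=WIDTH, height=HEIGHT):
    """Read exactly width*height bytes from a byte stream and return a (height, width) uint8 image.

    `stream` can be a pyserial Serial object or any object with a read(size) method.
    Raises EOFError if the stream runs out (or times out) before a full frame arrives.
    """
    needed = width * height
    buf = bytearray()
    while len(buf) < needed:
        chunk = stream.read(needed - len(buf))   # a serial read may return fewer bytes
        if not chunk:
            raise EOFError(f"got {len(buf)} of {needed} bytes")
        buf.extend(chunk)
    return np.frombuffer(bytes(buf), dtype=np.uint8).reshape(height, width)

# Offline check with a fake stream instead of a real serial port
fake_port = io.BytesIO(bytes(range(256)) * 36)   # 9216 bytes = 96 * 96
frame = read_frame(fake_port)
print(frame.shape, frame[0, :4].tolist())   # (96, 96) [0, 1, 2, 3]
```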

    I'll be using 96x96 grayscale images for inference, so I decided to transmit such images straight from the microcontroller. Another option is to send the raw pixels retrieved from the camera and pre-process them on the PC just before training. However, I wanted to be sure of my processing algorithms on the Arduino and keep the images consistent for both training and inference.

    I collected 60 images with a person and 60 images without a person, and placed them in folders named 1 and 0 respectively. I know that this is a small dataset (1000 images per class is considered a decent dataset), but it should be sufficient for the first trials.

  • Retrieving images from OV7675 - not as easy as I initially thought!

    kasik, 03/17/2024 at 19:06

    I decided to describe my struggles with retrieving and displaying images from the OV7675 camera here, as I stumbled across some unexpected behaviour. But that's part of the fun, isn't it?

    I wanted to understand exactly what is retrieved from the camera, play a bit with basic preprocessing (with future inference in mind) and be able to stream the image live on the PC.

    With that in mind, I prepared an Arduino sketch test_camera.ino and a Python script camera_cont_display.py, both available in my repository.
    I looked through the examples provided with the libraries (mentioned in a previous log) as well as a workshop provided by Edge Impulse.

    OV7675 can return images in RGB565 or grayscale format. I played with both, and the Python script can handle both as well. However, I decided to use grayscale to reduce the size. And... that's when weird things started to happen. The camera library says that the camera returns 2 bytes per pixel in grayscale (I found nothing about this in the datasheet, though). However, the image I received was either doubled or had some kind of "halo" effect. To my surprise, the library already filters the data down to 1 byte per pixel, while the method Camera.bytesPerPixel() still returns 2. Well, as my university professor told me: never trust any specification!
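
    For the RGB565 mode, the conversion to grayscale on the PC side could look like the sketch below. It assumes 2 bytes per pixel with 5 bits of red, 6 bits of green and 5 bits of blue. The byte order is left as a parameter, since it depends on how the sketch sends the data. The luma weights are the common ITU-R BT.601 ones, not necessarily what the camera library uses.

```python
import numpy as np

def rgb565_to_gray(raw, width, height, big_endian=True):
    """Convert an RGB565 frame (2 bytes per pixel) to an 8-bit grayscale image.

    The byte order is an assumption: set big_endian=False if the low byte is sent first.
    """
    dtype = '>u2' if big_endian else '<u2'
    pixels = np.frombuffer(raw, dtype=dtype).reshape(height, width).astype(np.uint32)
    r = (pixels >> 11) & 0x1F                  # 5 bits of red
    g = (pixels >> 5) & 0x3F                   # 6 bits of green
    b = pixels & 0x1F                          # 5 bits of blue
    # Scale each channel to 0..255, then apply the BT.601 luma weights
    r8, g8, b8 = r * 255 / 31, g * 255 / 63, b * 255 / 31
    gray = 0.299 * r8 + 0.587 * g8 + 0.114 * b8
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

# Two hypothetical pixels: pure white (0xFFFF) and pure black (0x0000), big-endian
print(rgb565_to_gray(bytes([0xFF, 0xFF, 0x00, 0x00]), width=2, height=1).tolist())  # [[255, 0]]
```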

    I went on to implement scaling and cropping methods, inspired by the Edge Impulse example. I was a bit surprised to see that several image arrays are allocated dynamically there, but I thought I would give it a try. Only 2 arrays need to be allocated at the same time for each operation, which reduces the memory used compared to static allocation. However, it took only a few seconds before the Arduino started to have issues with memory allocation, I presume due to memory fragmentation. Static allocation then! After creating 3 arrays for QVGA resolution (for the image retrieved from the camera, the scaled one and the cropped one), I reached as much as 98% dynamic memory usage. This put my Arduino into an unexpected state, where the program didn't work and the PC didn't recognize the device anymore.

    TIP: double-clicking the reset button brought it back to life, and I could flash the Arduino once more. I searched through the documentation and various forums and failed to find any mention of this. Could anyone point me to an explanation?

    Finally, after some trial and error, I decided to use QCIF (176x144), scale it to 117x96 and then crop it to 96x96. This takes 35% of dynamic memory (perhaps I should consider in-place operations) and takes 8 ms to process.
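
    The scale-then-crop pipeline can be prototyped on the PC before porting it to the Arduino. This NumPy sketch uses nearest-neighbour scaling, which is my choice for simplicity; the method inspired by the Edge Impulse example may interpolate differently. It also prints the grayscale buffer sizes at 1 byte per pixel. For comparison, a single QVGA (320x240) grayscale frame alone takes 76800 bytes.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D image (each output pixel copies one source pixel)."""
    in_h, in_w = image.shape
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return image[rows[:, None], cols]

def center_crop(image, size):
    """Cut a size x size square from the middle of a 2D image."""
    h, w = image.shape
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

# QCIF frame (144 rows x 176 columns) -> 96 x 117 (aspect ratio kept) -> 96 x 96
qcif = np.random.default_rng(0).integers(0, 256, size=(144, 176), dtype=np.uint8)
scaled = resize_nearest(qcif, 96, 117)
cropped = center_crop(scaled, 96)
print(scaled.shape, cropped.shape)                  # (96, 117) (96, 96)
print(qcif.nbytes, scaled.nbytes, cropped.nbytes)   # 25344 11232 9216
```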

  • Hardware and libs

    kasik, 03/11/2024 at 10:00

    For my first trials I decided to use the TinyML kit, which contains an Arduino Nano 33 BLE Sense and the OV7675 camera.

    I use the following libraries in the Arduino IDE:

    • Arduino Mbed OS Nano Boards v3.0.1
    • Arduino_OV767X v0.0.2
    • Arduino_TensorFlowLite (latest version from its repository)

    There is another library available as well: Harvard_TinyMLx (prepared for the TinyML course available on edX), which contains the libraries for the camera and TensorFlow. However, I wanted to use the official ones. This makes it easier to compare the results with Edge Impulse and helps me avoid unexpected issues when searching through the available examples.

    In the Arduino_OV767X library there is an example available to retrieve the image - which is very useful to make sure the hardware works.
    The Arduino_TensorFlowLite library also comes with several interesting examples, even person detection. However, I am interested in doing the full chain myself, hopefully with the same (or perhaps even better..?) results.

  • Simple plan and next steps

    kasik, 03/11/2024 at 09:53

    I will soon reveal what the purpose of the end product is supposed to be; for now, however, the journey is the goal.

    I became fascinated by machine learning, and then by the idea of using it on microcontrollers, only about 6 months ago, so it is a whole new world to explore! And what better way to learn than working on a project?

    In the following logs I want to cover the topics:

    • hardware description for first trials
    • gathering data from the camera
    • preparing a dataset
    • basic model and training
    • model preparation for microcontrollers
    • inference on arduino
    • trying out Edge Impulse

    I am sure more will come on the way!

    Disclaimer: Please note that I am not an expert (yet..?) on tinyML, so take everything here with a grain of salt!