
Generative AI on a Microcontroller

The Electronic Die of the Future

Tim

After many retro projects, it is time to look into the future! It's the year 2023 and the biggest technology trend (hype) is generative AI. Typically this is associated with high-performance GPUs. But maybe we can make use of it on a lowly microcontroller? And maybe we can use it to implement just another incarnation of an electronic die?

Table of contents

  1. Goal and Plan
  2. Training Dataset Generation and Evaluation Model
  3. Options for Generative AI Models
  4. CVAE: Conditional Variational Autoencoder
  5. CVAE: Minimizing the VAE Memory Footprint
  6. CVAE: Improving Image Quality (More Layers!)
  7. Tiny Inference Engines for MCU deployment
  8. Building my own Inference Engine


  • Building my own Inference Engine

    Tim • 5 days ago • 2 comments

    Nothing has happened in this project for a while. The reason is somewhat obvious from the previous entry: there are many solutions for edge inference, but none that really fits my purpose.

    To be honest, it is not an easy problem to solve, because a lot of flexibility (aka complexity) is required to address all the possible types of NN models people come up with. In addition, there is a tendency to try to hide all that complexity - and this adds even more overhead.

    When it comes to really low-end edge devices, it seems simpler to build your own inference solution and hardwire it to your NN architecture.

    ... and that is what I did. You can find the project here:

    https://github.com/cpldcpu/BitNetMCU

    Detailed write-up:

    https://github.com/cpldcpu/BitNetMCU/blob/main/docs/documentation.md

    I learned a lot about squeezing every last bit out of the weights during training and making the inference as lean as possible. I used the well-known MNIST dataset and a small CH32V003 microcontroller as a test vehicle and achieved >99% test accuracy. This is not a world record, but it beats most other MCU-based applications I have seen, especially on an MCU with only 16 KB of flash and 2 KB of SRAM (less than an Arduino UNO).

    So far, I got away with implementing only fc layers, normalization and ReLU. But to address "GenAI" I will eventually have to implement other operators as well. We'll see...
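    To illustrate the basic idea, here is a minimal sketch in Python (for readability only - the actual inference code in the repository is plain C, and the exact quantization and normalization scheme is described in the write-up linked above; a simple shift is assumed here). With a hardwired stack of fully connected layers and low-bit integer weights, inference needs nothing but integer multiply-accumulates, a shift and a ReLU:

        import numpy as np

        def fc_relu_q(x, w_q, shift):
            # one fully connected layer: integer MACs, shift-based normalization, ReLU
            acc = w_q.astype(np.int32) @ x.astype(np.int32)
            return np.maximum(acc >> shift, 0)

        def infer(x, layers):
            # layers: list of (quantized weight matrix, shift) pairs, hardwired to the model
            for w_q, shift in layers[:-1]:
                x = fc_relu_q(x, w_q, shift)
            w_q, shift = layers[-1]                      # output layer, no ReLU
            logits = (w_q.astype(np.int32) @ x.astype(np.int32)) >> shift
            return int(np.argmax(logits))                # e.g. the recognized MNIST digit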

  • Tiny Inference Engines for MCU deployment

    Tim • 11/19/2023 at 15:36 • 0 comments

    The big question now is how to implement our trained model on a microcontroller. Ideally, this should be a solution that works with PyTorch (since I trained the models in it) and that minimizes the SRAM and flash footprint even on very small devices (no point in having a 6k-parameter model if your inference code is 30k).

    I spent quite some time searching and reviewing various options. A short summary of my findings:

    Read more »

  • Improving Quality (More Layers!)

    Tim • 11/12/2023 at 11:52 • 0 comments

    So far, I avoided introducing convolutional layers at the full image resolution of 32x32, because they would drive up the SRAM footprint significantly. However, since no convolution takes place at 32x32, there are limitations to the image quality.

    Depth-first/tiled inference of the CNN may help to reduce the memory footprint, so we should not immediately discard adding more layers.
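    To put rough numbers on the concern above (a sketch with an assumed channel count of 16 and 8-bit activations - the actual layer widths may differ): a single feature map at full resolution already exceeds the SRAM of a very small MCU, while the same layer after two stride-2 downsampling steps is easily manageable.

        def act_bytes(h, w, channels, bits=8):
            # SRAM needed to buffer one feature map during layer-by-layer evaluation
            return h * w * channels * bits // 8

        print(act_bytes(32, 32, 16))   # 16384 bytes at 32x32
        print(act_bytes(8, 8, 16))     #  1024 bytes at 8x8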

    Read more »

  • Minimizing the VAE memory footprint

    Tim • 11/09/2023 at 23:18 • 0 comments

    To implement the VAE on a microcontroller with a small SRAM and flash footprint, it is necessary to minimize the size of the network weights themselves and also to consider the SRAM required during evaluation.

    1. The size of the model in flash is defined by the total number of parameters. I assume that it will be possible to quantize the model to 8 bit, so the model must be reduced to a few thousand parameters to fit into the flash of a small MCU.
    2. The SRAM consumption is defined by the intermediate results that need to be stored temporarily during inference. Assuming that we evaluate the net layer by layer, we should limit the maximum memory footprint of any single layer. Some further optimization may be possible by evaluating the CNN part of the decoder tile by tile (depth-first).
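    A quick way to check both numbers directly on the PyTorch model is sketched below (an illustrative helper, not part of the project's code; it assumes strictly layer-by-layer evaluation and 8-bit quantization of both weights and activations):

        import torch

        def footprint(model, example_input):
            # flash: total parameter count (one byte per parameter at 8-bit quantization)
            n_params = sum(p.numel() for p in model.parameters())

            # SRAM: largest single layer output produced during one forward pass
            peak = 0
            def record(module, inputs, output):
                nonlocal peak
                if torch.is_tensor(output):
                    peak = max(peak, output.numel())
            hooks = [m.register_forward_hook(record)
                     for m in model.modules() if not list(m.children())]
            with torch.no_grad():
                model(example_input)
            for h in hooks:
                h.remove()
            print(f"~{n_params} bytes of flash, ~{peak} bytes of peak layer activation")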
    Read more »

  • Conditional Variational Autoencoder (CVAE)

    Tim • 11/09/2023 at 20:27 • 0 comments

    After dabbling a bit with both diffusion models and VAEs, I decided to focus on CVAEs first. It seems the main problem is not the training of the network, but finding a smooth way to implement it on an MCU. So I'd rather deal with a simple architecture first to tackle the MCU implementation.

    VAEs were originally introduced in 2013 in the paper "Auto-Encoding Variational Bayes" by Kingma and Welling. There is a very good explanation of VAEs here.

    A VAE consists of an encoder and a decoder part. The encoder is a multilayer artificial neural network (usually a CNN) that reduces the input data to a latent representation with fewer parameters. The decoder does the opposite and expands the latent representation back to a high-resolution picture. The network is trained to exactly reproduce the input image at the output. In addition, there is a clever trick (the "reparameterization trick") that ensures that the latent representation is encoded in a way where similar images are grouped together. After the network is trained, we can use only the decoder part and feed it random numbers to generate new images.

    Since we also want to control the number of pips on the die, we need to label the data that is fed in - that is where the "conditional" part of the CVAE comes from.

    The Model: Encoder

            self.encoder = nn.Sequential(
                # input: one image channel plus num_classes channels carrying the condition (32x32)
                nn.Conv2d(1 + num_classes, dim1, kernel_size=3, stride=2, padding=1),  # -> dim1 x 16x16
                nn.ReLU(),
                nn.Conv2d(dim1, dim2, kernel_size=3, stride=1, padding=1),             # -> dim2 x 16x16
                nn.ReLU(),
                nn.Conv2d(dim2, dim3, kernel_size=3, stride=2, padding=1),             # -> dim3 x 8x8
                nn.ReLU(),
                nn.Conv2d(dim3, dim3, kernel_size=3, stride=2, padding=1),             # -> dim3 x 4x4
                nn.ReLU(),
            )

            # project the flattened 4x4 feature map (plus the class label) to the
            # mean and log-variance of the latent "choke point"
            self.fc_mu = nn.Linear(dim3*4*4 + num_classes, VAE_choke_dim)
            self.fc_var = nn.Linear(dim3*4*4 + num_classes, VAE_choke_dim)
    
    Read more »
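    For context, the sampling step that the "reparameterization trick" mentioned above boils down to is only a few lines (a generic sketch of the standard formulation, not necessarily the exact code used in this project):

        import torch

        def reparameterize(mu, logvar):
            std = torch.exp(0.5 * logvar)    # log-variance -> standard deviation
            eps = torch.randn_like(std)      # noise drawn from N(0, 1)
            return mu + eps * std            # differentiable sample from N(mu, std^2)

    At generation time the encoder and the trick are no longer needed: a random latent vector of size VAE_choke_dim, concatenated with the label for the desired pip count, is simply pushed through the decoder.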

  • Options for Generative AI Models

    Tim • 11/05/2023 at 21:28 • 0 comments

    Since our goal is to generate images, we need to select a suitable artificial neural network architecture that is able to generate images based on a specific input.

    Typically, three architectures are discussed in this context as of today (2023):

    1. Generative Adversarial Networks (GAN)
    2. Conditional Variational Autoencoders (CVAE)
    3. Diffusion Models

    Diffusion models are the newest of the bunch and are at the core of the AI image generators that are creating a lot of hype at the moment. Latent diffusion, the architecture at the core of Stable Diffusion, is actually a combination of a diffusion model and a VAE.

    Obviously, a diffusion model may be the most interesting to implement, so I will start with that. There may be a risk that it turns out to be too heavyweight for a microcontroller though, even with the problem as simplified as we have already made it.

    Variational Autoencoders may be a good alternative: a simpler architecture with a higher probability of fitting onto a small device. Therefore this is second priority, at least as a backup.

    Generative Adversarial Networks were the most lauded approach before diffusion models stole the show. Since they basically train a decoder that can be used in a very similar way to a VAE's, they may also be an interesting option for a lightweight model. Compared to VAEs, they may be better suited to creating novel images, but that is something to find out. Unfortunately, training GANs appears to be less easy than for the other two options, so I will park this for now, maybe to be revisited later.

    Generally, it has to be assumed that generating images requires more processing power and larger neural networks than a model that only does image recognition (a discriminator). There are plenty of examples of running MNIST inference on an Arduino. Does it work for generative NNs as well? That remains to be seen...

    Next Steps

    1) Investigate diffusion models

    2) Look into variational autoencoders

  • Training Dataset Generation and Evaluation Model

    Tim • 11/05/2023 at 10:08 • 0 comments

    Training dataset

    Since the capabilities of the target platform are somewhat limited, I elected to create a simplified synthetic dataset for training. I chose 1x32x32 greyscale as the target resolution for the images, as this fits into a 1 kB footprint. The resolution can be increased later, and we can obviously also switch to a fancier-looking die image at a later time.

    I need labeled images showing die rolls with 1-6 pips. There should also be some variation, because otherwise using generative AI would be quite pointless.

    I used GPT-4 to generate a Python program that creates images of dice and later refined it iteratively with Copilot in VS Code. While all the attention is on GPT, Copilot Chat has become impressively useful in the meantime. It's much easier to interact with specific parts of your code, whereas this is a hassle in GPT-4.

    Images are created at 128x128 and then downscaled to 32x32 to introduce some antialiasing. The die images are rotated by an arbitrary angle to introduce variation. It should be noted that rotating the dice requires them to be scaled down so that they are not clipped; this also introduces some variation in scale into the dataset.
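    A minimal sketch of what such a generator can look like is shown below (illustrative only - the pip layout, margins and scaling factors are my assumptions, not the actual GPT-4/Copilot-generated script):

        import random
        import numpy as np
        from PIL import Image, ImageDraw

        PIPS = {1: [(1, 1)], 2: [(0, 0), (2, 2)], 3: [(0, 0), (1, 1), (2, 2)],
                4: [(0, 0), (0, 2), (2, 0), (2, 2)],
                5: [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)],
                6: [(0, 0), (0, 2), (1, 0), (1, 2), (2, 0), (2, 2)]}

        def render_die(n, hires=128, lores=32):
            img = Image.new("L", (hires, hires), 0)
            draw = ImageDraw.Draw(img)
            # draw the die body smaller than the frame so that rotation cannot clip it
            m = hires // 5
            draw.rounded_rectangle([m, m, hires - m, hires - m], radius=hires // 12, fill=255)
            # place the pips on a 3x3 grid
            step = (hires - 2 * m) / 3
            r = step / 5
            for gx, gy in PIPS[n]:
                cx, cy = m + (gx + 0.5) * step, m + (gy + 0.5) * step
                draw.ellipse([cx - r, cy - r, cx + r, cy + r], fill=0)
            # rotate by a random angle, then downscale to get some antialiasing
            img = img.rotate(random.uniform(0, 360), resample=Image.BILINEAR)
            return np.asarray(img.resize((lores, lores), Image.LANCZOS), dtype=np.uint8)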

    Example outputs are shown below.

    Read more »

  • Goal and Plan

    Tim • 11/04/2023 at 23:28 • 0 comments

    Goal

    The goal of this project is to build an electronic die using generative AI on a microcontroller. And of course it is a nice opportunity for me to play a bit with ML.

    Pushing a button shall initiate the roll of a die, and a random result is shown on a display. Instead of using seven LEDs and a logic circuit, as in a traditional electronics project, we shall use a small display (e.g. an SSD1306) and a microcontroller (TBD - not sure how low we can go).

    • The display shall show a picture of the die as rolled. 
    • The number of pips should be clearly indicated. 
    • The graphics shall be generated in real time by a generative AI algorithm.
    • Everything should be lightweight enough to run on an MCU.
    Read more »



Discussions

allexoK wrote 5 days ago

Hello Tim, cool project!

Recently I made a library for both learning and inference on MCUs. It's not an NN though, just a decision tree. Check it out if you are interested: https://github.com/allexoK/TinyDecisionTreeClassifier. It's available for both Arduino and PlatformIO (version 2 is significantly better than version 1).

Maybe you can make a physical die with an accelerometer inside and teach it to predict the roll result, based on the acceleration data, before the throw has settled. That would be kind of cool.

Also it would save fellow board game enjoyers a few milliseconds of their lives (since the result would be available a little bit earlier than the die settles) and can therefore be treated as a life-saving humanitarian project.


Tim wrote 5 days ago

Hi,

thanks! Cool project on the decision tree classifier. I went a bit the other route: spend a maximum amount of FLOPs outside the MCU to reduce the FLOPs on the MCU. No chance of doing on-device learning that way :)


allexoK wrote 5 days ago

Thanks, that makes sense, good luck!

