
EyeBREAK (Morse Blink BLE Keyboard)

EyeBREAK: EyeBlink Realtime ESP32 Assistive Keyboard

This project aims to create a portable, hands-free assistive keyboard for those who are unable to move most parts of their body. People who have suffered strokes or even locked-in syndrome often retain control over their eyelid movements, so the eyelids make a very useful input device. Morse code is an effective and easy-to-learn system for converting binary states into characters; blinking words in Morse was also made famous by Jeremiah Denton during his televised interview while captive in Vietnam.

We chose an ESP32 over other devices (e.g. a Raspberry Pi or a desktop/laptop computer). The ESP32 is very portable, and its Bluetooth capabilities let it act as a versatile input device for many hosts, including phones, rather than being tied to a single device. It also provides a good challenge in creating an extremely lightweight machine learning system: a full object detection and classification pipeline runs in real time (20 fps).

Demo

Status

TODO:

  • Refinement (robustness, edge cases, UX improvements)
  • Try eye tracking for mouse?

Pipeline

The basic pipeline is shown in the diagram below. A small quantized CNN (36x36 input, 4 layers, 10k parameters) processes the downscaled camera input directly, producing a binary classification (eye closed or open). From the sequence of eyelid states and their timestamps, the Morse code is decoded into characters, which are sent over BLE to the host device.
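
To make the decoding step concrete, here is a minimal sketch of the timing logic (the threshold values here are illustrative, not the ones used in the firmware):

```cpp
#include <cstdint>
#include <string>

// Illustrative thresholds in milliseconds; the real values need tuning.
constexpr uint32_t DOT_MAX_MS  = 250;  // closures shorter than this are dots
constexpr uint32_t CHAR_GAP_MS = 700;  // an open gap this long ends a character

struct MorseDecoder {
    std::string symbols;            // "." and "-" accumulated for the current character
    bool eye_closed = false;
    uint32_t last_change_ms = 0;

    // Called once per frame with the classifier output and a timestamp.
    // Returns true when `symbols` holds a completed character pattern;
    // the caller clears `symbols` after consuming it.
    bool update(bool closed, uint32_t now_ms) {
        if (closed != eye_closed) {                  // eyelid state transition
            uint32_t duration = now_ms - last_change_ms;
            if (eye_closed)                          // eye just opened: closure ended
                symbols += (duration < DOT_MAX_MS) ? '.' : '-';
            eye_closed = closed;
            last_change_ms = now_ms;
            return false;
        }
        // A long enough open pause finalizes the pending character.
        return !closed && !symbols.empty() && (now_ms - last_change_ms > CHAR_GAP_MS);
    }
};
```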

Previous Iterations

The first attempt used Dlib's facial landmark model based on “One Millisecond Face Alignment with an Ensemble of Regression Trees” after running face detection on the frame; the resulting eye coordinates were then used to calculate the eye aspect ratio, and the eye was judged to be open or closed depending on a threshold value. However, this approach was found to be quite inaccurate, and it required the camera to be farther from the face.
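
For reference, the eye aspect ratio in that scheme is the standard six-landmark formulation (sketch below; the ~0.2 threshold mentioned in the comment is a commonly used default, not a tuned value):

```cpp
#include <cmath>

struct Point { float x, y; };

static float dist(Point a, Point b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Standard eye aspect ratio over six eye landmarks: p[0]/p[3] are the corners,
// p[1]/p[2] the upper lid, p[4]/p[5] the lower lid. The ratio drops sharply
// when the eye closes; below roughly 0.2 is typically judged "closed".
float eye_aspect_ratio(const Point p[6]) {
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2.0f * dist(p[0], p[3]));
}
```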

The next iteration used a Haar cascade detector which provided the bounding box for the image segment given to an XGBoost classifier. This worked rather well, especially since XGBoost isn't meant to be used on raw images, and it is very fast (~4 ms); however, it was rather fragile when the eye was not completely centered, hence the need to preprocess with the bounding box.
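
In OpenCV terms, that preprocessing stage looked roughly like this (a sketch with illustrative parameters, not the exact code):

```cpp
#include <vector>
#include <opencv2/objdetect.hpp>

// Detect the eye with a Haar cascade and return the cropped region so that
// the classifier always sees a centered eye. Parameter values are illustrative.
cv::Mat crop_eye(cv::CascadeClassifier& eye_cascade, const cv::Mat& gray) {
    std::vector<cv::Rect> eyes;
    eye_cascade.detectMultiScale(gray, eyes, /*scaleFactor=*/1.1,
                                 /*minNeighbors=*/3, /*flags=*/0,
                                 /*minSize=*/cv::Size(24, 24));
    if (eyes.empty()) return cv::Mat();   // no detection this frame
    return gray(eyes[0]).clone();
}
```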

Firmware

The firmware is written in ESP-IDF to extract the most performance from the device. OpenCV is used for resizing and text rendering, while the classifier runs through TFLite for Microcontrollers as a quantized int8 model, taking 20 ms per inference.
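
The inference call itself follows the usual TFLite for Microcontrollers pattern, roughly as below (a sketch: the constructor arguments and the registered ops vary with the TFLM version and the model's actual layers):

```cpp
#include <cstring>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];   // quantized flatbuffer in flash

constexpr int kArenaSize = 64 * 1024;        // illustrative arena size
static uint8_t tensor_arena[kArenaSize];
static tflite::MicroInterpreter* interpreter = nullptr;

// One-time setup: register only the ops the model needs (assumed set here).
bool model_init() {
    static tflite::MicroMutableOpResolver<4> resolver;
    resolver.AddConv2D();
    resolver.AddMaxPool2D();
    resolver.AddFullyConnected();
    resolver.AddSoftmax();
    static tflite::MicroInterpreter instance(tflite::GetModel(g_model_data),
                                             resolver, tensor_arena, kArenaSize);
    interpreter = &instance;
    return interpreter->AllocateTensors() == kTfLiteOk;
}

// Per-frame inference on the 36x36 int8 image; returns the "closed" score.
int8_t classify_eye(const int8_t* pixels_36x36) {
    std::memcpy(interpreter->input(0)->data.int8, pixels_36x36, 36 * 36);
    interpreter->Invoke();
    return interpreter->output(0)->data.int8[0];
}
```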

The Morse code lookup is currently done through a binary encoding array.
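
A common form of that scheme (illustrative; the repo's exact table may differ) packs each pattern into a byte, with a leading 1 bit as a sentinel and then 0 = dot, 1 = dash:

```cpp
#include <cstdint>
#include <string>

// Each entry packs a Morse pattern into a byte: a leading 1 bit marks the
// start, then each following bit is one symbol (0 = dot, 1 = dash).
// E.g. 'A' = ".-" -> 0b101, 'B' = "-..." -> 0b11000.
static const uint8_t kMorse[26] = {
    0b101,    // A .-
    0b11000,  // B -...
    0b11010,  // C -.-.
    0b1100,   // D -..
    0b10,     // E .
    /* ... remaining letters ... */
};

// Convert an accumulated dot/dash string into the packed form, then
// scan the table for a match.
char decode(const std::string& symbols) {
    uint8_t code = 1;                       // sentinel bit
    for (char s : symbols) code = (code << 1) | (s == '-');
    for (int i = 0; i < 26; ++i)
        if (kMorse[i] == code) return 'A' + i;
    return '?';                             // unknown pattern
}
```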

The BLE keyboard is implemented through the standard BLE HID-over-GATT protocol using the NimBLE stack.
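
At the protocol level, each keystroke is a notification of the standard 8-byte keyboard input report on the HID characteristic: one report with the keycode set, then one with it cleared. A sketch (the `hid_notify` helper is hypothetical and stands in for the NimBLE notification call):

```cpp
#include <cstdint>

// Standard HID keyboard input report: modifier bits, a reserved byte,
// and up to six concurrently pressed key usage codes.
struct __attribute__((packed)) KeyReport {
    uint8_t modifiers;
    uint8_t reserved;
    uint8_t keys[6];
};

// Hypothetical stand-in for notifying the HID input report characteristic
// through the NimBLE stack.
void hid_notify(const KeyReport& report) { /* GATT notification goes here */ }

void send_key(uint8_t usage_code) {
    KeyReport report = {};
    report.keys[0] = usage_code;  // key down
    hid_notify(report);
    report = KeyReport{};         // all zeros = key up
    hid_notify(report);
}
```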

For debugging, the ESP32 creates a video stream over WiFi with annotations, as shown in the video at the top.
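
The stream itself is the usual multipart MJPEG pattern over esp_http_server, roughly as sketched below (`get_annotated_jpeg` is a hypothetical stand-in for grabbing a frame and drawing the annotations):

```cpp
#include <cstdio>
#include "esp_http_server.h"

// Hypothetical: returns the latest annotated frame as a JPEG buffer.
bool get_annotated_jpeg(const uint8_t** buf, size_t* len);

// Serve frames as a multipart MJPEG stream so a browser can show live video.
static esp_err_t stream_handler(httpd_req_t* req) {
    httpd_resp_set_type(req, "multipart/x-mixed-replace; boundary=frame");
    const uint8_t* jpg;
    size_t len;
    char part[64];
    while (get_annotated_jpeg(&jpg, &len)) {
        int n = snprintf(part, sizeof(part),
                         "\r\n--frame\r\nContent-Type: image/jpeg\r\n\r\n");
        if (httpd_resp_send_chunk(req, part, n) != ESP_OK) break;
        if (httpd_resp_send_chunk(req, (const char*)jpg, len) != ESP_OK) break;
    }
    return ESP_OK;
}
```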

OpenCV on ESP32

Compiling OpenCV for the ESP32 is a bit of a pain. Following https://github.com/joachimBurket/esp32-opencv at the latest HEAD will get you 90% of the way there, but there is a remaining issue due to type sizes (int32_t is long); this pull request takes care of it. See the CMake file and the patch file in the repo (under components).

Dlib was used for preliminary exploration; it is quite simple to compile in comparison.

Previous Iterations

Dlib was compiled for the preliminary attempt, requiring just a couple of flags to build. A stripped-down eye localization model took ~60 ms to run on a 320x240 image.

The Haar cascade was loaded from an XML file stored in flash using an ESP-IDF CMake function. It took up to 400 ms to run, so it was only run occasionally.
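
Concretely, embedding the file exposes linker symbols for its (null-terminated) contents, and OpenCV can read the cascade from that in-memory string (sketch; the file and symbol names here are illustrative):

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/core/persistence.hpp>

// Symbols generated by ESP-IDF when the XML is embedded in flash, e.g. via
// idf_component_register(... EMBED_TXTFILES "haarcascade_eye.xml").
extern const char cascade_xml_start[] asm("_binary_haarcascade_eye_xml_start");

bool load_cascade(cv::CascadeClassifier& cascade) {
    // MEMORY flag: parse the XML directly from the in-flash string.
    cv::FileStorage fs(cascade_xml_start,
                       cv::FileStorage::READ | cv::FileStorage::MEMORY);
    return cascade.read(fs.getFirstTopLevelNode());
}
```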

The XGBoost classifier was deployed as C code generated by m2cgen, and took ~4 ms per inference.
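
The call site for the generated code is then trivial (sketch; m2cgen's exact signature depends on the model type, and this assumes the single-output binary-classifier form):

```cpp
// m2cgen-generated scoring function over the flattened feature vector
// (assumed binary-classifier form returning a single probability/score).
extern "C" double score(double* input);

bool is_eye_closed(double* flattened_pixels) {
    return score(flattened_pixels) > 0.5;  // threshold the classifier output
}
```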

ML

The MRL dataset is used for training. The images are resized to 36x36, and some augmentation (brightness, contrast, shifting, scaling, rotation) is applied. The CNN is trained in PyTorch, exported to ONNX, then TensorFlow, then TensorFlow Lite, and finally quantized to 8-bit integers.

Physical Device

Currently, the device consists of a repurposed face shield frame (basically just lensless glasses), a 3D-printed holder, and the ESP-EYE itself. The camera is positioned where it is because that viewpoint matches the images in the dataset; if another freely available blink dataset with a side view existed, a less intrusive design could be built.

arm.stl

A very quick and dirty arm to hold the board. It is supposed to clip to the eyeglass frame, but it doesn't work.

Standard Tesselated Geometry - 15.80 kB - 05/30/2023 at 07:04


  • 1 × ESP-EYE
  • 1 × "Glasses" frame: a used frame from a face shield
  • 1 × 3D-printed arm: a quick and dirty thing whipped up in 15 minutes

  • Source code + updates

    MBW, 05/30/2023 at 07:02

    I have published the code at https://github.com/m-bw/EyeBREAK; it includes the training notebook as well as the firmware.

    I was testing whether a 36x36 image input would work, and it in fact achieves the same or better accuracy (97%) with a third of the parameters (10k) and half the inference time (20 ms). This brings the FPS back up to around 28.

    I also moved to esp-nimble-cpp for the BLE stack, and it does indeed reduce flash usage by about 0.2 MB.

  • Working Keyboard Functionality

    MBW, 05/22/2023 at 06:20

    I finally got BLE keyboard functionality working. Although there is an example provided with ESP-IDF, getting the HTTP server (for the camera stream), WiFi, and Bluetooth stacks running at the same time is a bit finicky.

    The main issue that arose was the IRAM size limitation. I tried the config options found in this IDF doc section, but evidently there still wasn't enough room. There is a new config option on the ESP-IDF master branch that allows some DRAM sections to be used as IRAM, so I switched to master and enabled that option. Then DRAM wasn't big enough either, so I had to force Bluetooth and WiFi to allocate from PSRAM instead, among other things, which finally made it work.

    The BLE HID example uses Bluedroid, which is larger than the alternative NimBLE stack. Perhaps switching to this could reduce these issues? I've actually used NimBLE with Mynewt many years ago for another BLE HID project, so it might not be too difficult.

    As for the functionality itself, the keyboard supports typing characters as well as backspace, which is triggered by holding the eye closed for a longer time. Also, to provide visual feedback, the individual Morse dots/dashes are typed out while a character is in progress, then erased and replaced by the final character once it is finalized. I'll try to show this in a video recording.

  • Working Fast CNN with TFLite

    MBW, 05/18/2023 at 09:05

    I trained a very small CNN (30k parameters, 4 layers) on the dataset and achieved higher accuracy (96%). It also works better in the wild, even without the Haar cascade preprocessing.

    Through TFLite for Microcontrollers, I was able to get it to run on-device. With 8-bit quantization, it runs at ~18 fps, which is pretty good.

    The whole train-deploy pipeline is a bit of a mess though, since I used PyTorch:

    PyTorch (Lightning) -> ONNX -> TF -> TFLite -> TFLite (quantized).

    Will add another demo soon.


