Motivation

In order to support the development of gesture-controlled musical instruments such as lalelu_drums, I want to run a human pose estimation AI model at a high frame rate (>= 75 frames per second (fps)). I found the Google Coral AI accelerator an interesting device for this, since it is compatible with single-board computers like the Raspberry Pi, and ready-to-use pose detection AI models are available for it.

The code (Python only) supporting this project is hosted on GitHub.

Choosing the network

I did some tests with the PoseNet model, which is available in various precompiled versions for the Coral, but I found that its performance is limited for my use case, especially when it comes to moving the legs. When I raised and lowered my legs in an exaggerated walking motion, the PoseNet results for the leg keypoints often switched to wrong coordinates in the image. Therefore, I was happy that Google released the MoveNet pose estimation model and also provided precompiled versions for the Coral. MoveNet is trained on image data specific to sports and dancing, so I think it is close to ideal for my purpose.

MoveNet comes in two sizes: the smaller, faster and less precise 'lightning', and the larger, slower but more precise 'thunder'.
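Both sizes return the same output format: a [1, 1, 17, 3] tensor with one (y, x, score) row per keypoint, coordinates normalized to [0, 1]. A minimal sketch of decoding that tensor (the helper name and the score threshold are my own choices, not part of the project):

```python
import numpy as np

# The 17 keypoints of MoveNet, in the order they appear in the output tensor.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_movenet_output(raw, min_score=0.3):
    """Map the raw [1, 1, 17, 3] output to {name: (y, x)} for confident keypoints."""
    keypoints = np.asarray(raw).reshape(17, 3)
    return {
        name: (float(y), float(x))
        for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints)
        if score >= min_score
    }
```

Filtering by score matters in practice: for the leg-tracking problem described above, a low score is the model's own hint that a keypoint coordinate should not be trusted.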

Speed benchmarking

The webpage of the Google Coral AI accelerator advertises very low inference times. However, when I tried to reproduce these values with my own benchmark, I got much longer inference times.

                                               MoveNet lightning   MoveNet thunder
from coral.ai                                  7.1 ms              13.8 ms
measured (Raspberry Pi 4 + Coral)              20.9 ms             39.4 ms
measured (Raspberry Pi 4 overclocked + Coral)  14.5 ms             27 ms

The explanation for this behaviour is that only a fraction of the calculations necessary for the MoveNet inference is actually performed on the Coral, while the significant rest is done on the CPU, since the Coral does not support all the necessary operations. Therefore, the inference speed depends heavily on the CPU.
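Measurements like the ones in the table can be taken with a simple timing loop along these lines; `run_inference` stands in for a single call into the interpreter, and the warm-up/median scheme is my own choice:

```python
import statistics
import time

def benchmark(run_inference, warmup=10, runs=100):
    """Time a callable and return the median wall-clock duration in ms.

    The first few calls are discarded because they include one-time
    setup costs (model transfer to the accelerator, cache warm-up).
    The median is less sensitive to scheduling outliers than the mean.
    """
    for _ in range(warmup):
        run_inference()
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations_ms)
```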

So, I overclocked the Raspberry Pi with the following settings in config.txt:

over_voltage=6
arm_freq=2000
force_turbo=1

As shown in the table, overclocking reduced the inference times by ~30%, but they still do not allow frame rates >= 75 fps. My solution to overcome this limitation is shown below, but first I would like to discuss the selection of the camera.

Choosing the camera

I started off using the Raspberry Pi camera V1, which was working fine. However, I later switched to the PS3 camera (PlayStation Eye) for the following reason.

The PS3 camera is connected via a cable that can be several meters long (I could successfully use a USB extension cable). This is important for my application: if you are playing gesture-controlled musical instruments in front of an audience, the camera will be between you and the audience, and should therefore be as unobtrusive as possible. In the case of the Raspberry Pi camera, it would not only be the camera but also the Raspberry Pi and the Google Coral USB accelerator, and there would not be a single USB cable but at least one power cable plus an audio output cable.

So, I recommend the PS3 camera, also for the following additional advantages:

  • Available second-hand for a few euros
  • Integrated into the Linux kernel for a long time, so it should work with any Linux distribution
  • Supports high frame rates (up to 187fps)
  • Large pixels, meaning that if you want low-resolution images anyway, you need no preprocessing (binning) and you collect a lot of light

(Obviously, the PS3 camera was designed for applications very similar to mine.)

The biggest disadvantage I see is that the available frame rates are quite coarsely spaced. In the regime I am interested in, the options are 50, 60, 75 and 100 fps.
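Given these discrete options, choosing a capture rate for a desired frame rate comes down to picking the lowest supported value that is still high enough; a trivial helper (the function name and the fallback behaviour are my own, not part of the project):

```python
# Frame rates the PS3 camera offers in the regime discussed above.
PS3_EYE_RATES = [50, 60, 75, 100]

def pick_frame_rate(target_fps, supported=PS3_EYE_RATES):
    """Return the lowest supported rate >= target_fps.

    Falls back to the maximum supported rate if the target is
    out of reach, so the caller always gets a valid mode.
    """
    candidates = [rate for rate in supported if rate >= target_fps]
    return min(candidates) if candidates else max(supported)
```

The coarse spacing shows up here directly: a target of, say, 62 fps forces you up to the 75 fps mode, with the correspondingly shorter exposure time per frame.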

Unfortunately, the PS3 camera is not manufactured anymore. So if you know a good alternative with similar features, please let me know.

Time interleaved strategy

My solution to achieve 75fps with a movenet inference time of ~15ms is to use two Google Coral...
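A minimal sketch of such a time-interleaved scheme, under the assumption that frames are dispatched to the two accelerators in alternation (the worker and queue structure below is my own illustration, not the project's actual code): with two overlapping ~15 ms inferences, results can keep pace with the 13.3 ms frame spacing of 75 fps.

```python
import queue
import threading

def interleaved_inference(frames, infer_fns):
    """Distribute frames round-robin over several inference workers.

    Each worker would own one accelerator; while one inference is still
    running on the first device, the next frame is already being
    processed on the second, so the effective frame rate roughly
    doubles. Results are reassembled in frame order.
    """
    in_queues = [queue.Queue() for _ in infer_fns]
    results = {}
    lock = threading.Lock()

    def worker(q, infer):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more frames
                return
            idx, frame = item
            out = infer(frame)
            with lock:
                results[idx] = out

    threads = [
        threading.Thread(target=worker, args=(q, fn))
        for q, fn in zip(in_queues, infer_fns)
    ]
    for t in threads:
        t.start()

    # Round-robin dispatch: frame i goes to accelerator i % N.
    for idx, frame in enumerate(frames):
        in_queues[idx % len(in_queues)].put((idx, frame))
    for q in in_queues:
        q.put(None)
    for t in threads:
        t.join()

    return [results[i] for i in range(len(frames))]
```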
