Efficientdet-lite0 on a raspberry pi 5

A project log for Raspberry pi tracking cam

Tracking animals on lower speed boards to replace jetson & coral

lion mclionhead 02/20/2023 at 05:27

Installed the lite 64 bit Raspbian.  The journey begins by enabling a serial port on the 5.  The new dance requires adding these lines to /boot/config.txt:

enable_uart=1
dtparam=uart0
dtparam=uart0_console

The default login (pi, password raspberry) doesn't work either. You have to edit /etc/passwd & delete the x from the :x: password field, leaving ::, to get an empty password. Do that for pi & root.

Another new trick is disabling the swap space by removing /var/swap.

Then dphys-swapfile gets disabled the usual way:

mv /usr/sbin/dphys-swapfile /usr/sbin/dphys-swapfile.bak

The Raspbian image is the same for the 5 & the 4 so the same programs should work.  The trick is reinstalling all the dependencies.  They were manely built from source & young lion deleted the source to save space.

Pose estimation on the rasp 4 began in

https://hackaday.io/project/162944/log/202923-faster-pose-tracker

Efficientdet on the rasp 4 began in

https://hackaday.io/project/162944/log/203515-simplified-pose-tracker-dreams

but it has no installation notes.  Compiling an optimized opencv for the architecture was a big problem. 

There were some obsolete notes on:

https://github.com/huzz/OpenCV-aarch64

Download the default branch .zip files for these:

https://github.com/opencv/opencv

https://github.com/opencv/opencv_contrib

There was an obsolete cmake command from huzz. The only change was OPENCV_EXTRA_MODULES_PATH needed the current version number. The apt-get dependencies were reduced to:

apt-get install cmake git
apt-get install python3-dev python3-pip python3-numpy
apt-get install libhdf5-dev

Compilation went a lot faster on the 5 than the 4, despite only 4 gig RAM.

Truckflow requires tensorflow for C.  It had to be cloned from git. 

https://github.com/tensorflow/tensorflow/archive/refs/heads/master.zip

https://www.tensorflow.org/install/source

The C version & python versions require totally different installation processes.  They have notes about installing a python version of tensorflow using the source code, but lions only ever used the source code to compile the C version from scratch & always installed the python version using pip in a virtual environment.

Tensorflow requires the bazel build system.

https://github.com/bazelbuild/bazel/releases

The downloaded binary has to be made executable with chmod & then moved to /usr/bin/bazel

Inside tensorflow-master you have to run python3 configure.py

Answer no for the clang option.  Then the build command was:

bazel build -c opt //tensorflow/lite:libtensorflowlite.so

This generated all the C dependencies but none of the python dependencies.  There was no rule for installing anything.  The dependencies stayed under ~/.cache/bazel/_bazel_root/ & the only required ones were libtensorflowlite.so & the headers from flatbuffers/include

To keep things sane, lions made a Makefile rule to install the tensorflow dependencies.

make deps

The lion kingdom's efficientdet-lite tracking program was in:

https://github.com/heroineworshiper/truckcam

The last one was compiled with make truckflow

Then it was run with truckflow.sh

It's behind the times.  It used opencv exclusively instead of compressing JPEG from YUV intermediates.  The phone app no longer worked: it had been moved to UDP while the server was still TCP.  Kind of sad how little lions remembered of its implementation after 2 years.

SetNumThreads now had to be called before any other tflite::Interpreter calls. 

-----------------------------------------------------------------------------------------------------------------------------------------

On the rasp 5, it now runs efficientdet-lite0 at 37fps with SetNumThreads(4) & 18fps with SetNumThreads(1), so 4 threads only buy about 2x.  Single threaded mode is double the rasp 4, which makes lions wonder if it was always single threaded before.

It only uses 40% of 3 cores & 100% of 1 core.  It might be more efficient to process 1 tile on each core.

The image is split into 3 tiles regardless of the threading.  If it processes 3 tiles simultaneously with SetNumThreads(1), it goes to 90% on 3 cores & 14.5fps.   If it processes 1 tile at a time with SetNumThreads(4), it goes at 12fps.  Helas, it needs a heatsink & fan. 
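The 1 tile per core arrangement can be sketched in plain C++ as below. This is only an illustration under assumptions, not the truckcam code: Tile, make_tiles, detect_all, & the evenly spaced horizontal layout are all hypothetical, & the detect callback stands in for a single threaded interpreter invocation on 1 tile.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Horizontal span of one tile inside the frame, in pixels (hypothetical type).
struct Tile {
    int x;      // left edge in the frame
    int width;  // tile width
};

// Spread `count` tiles of `tile_width` evenly across a frame of `frame_width`.
// Neighbors overlap whenever count * tile_width > frame_width.
std::vector<Tile> make_tiles(int frame_width, int tile_width, int count) {
    assert(count >= 1 && tile_width >= 1 && tile_width <= frame_width);
    std::vector<Tile> tiles;
    for (int i = 0; i < count; ++i) {
        int x = (count == 1) ? 0 : (frame_width - tile_width) * i / (count - 1);
        tiles.push_back({x, tile_width});
    }
    return tiles;
}

// Run `detect` on every tile at once, one worker thread per tile.
// `detect` is a stand-in for a single threaded inference call & returns
// the number of hits it found in that tile.
std::vector<int> detect_all(const std::vector<Tile>& tiles,
                            const std::function<int(const Tile&)>& detect) {
    std::vector<int> hits(tiles.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < tiles.size(); ++i) {
        // Each worker writes only its own slot, so no locking is needed.
        workers.emplace_back([&, i] { hits[i] = detect(tiles[i]); });
    }
    for (auto& w : workers) w.join();
    return hits;
}
```

With an illustrative 640 pixel wide frame & 320 pixel tiles, make_tiles(640, 320, 3) puts the tiles at x = 0, 160, & 320, so each neighboring pair shares 160 pixels.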

Efficientdet-lite1 384x384 goes at 8.8fps. That's as high in resolution as the rasp 5 can  go.  1 core per tile with SetNumThreads(2) makes the frame rate go down to 8.5 & increases the CPU usage to 300%.  It's memory bandwidth constrained.

The current with the full fan, servo, camera, & 3 core efficientdet is 2.6A.  Without the servo, it's 2A.  It's prone to stalling when there's a network connection because it's writing to the socket in the scanning thread.
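One possible way around the stall, not something the current program does, is to hand each result to a separate sender thread through a 1 slot mailbox, so the scanning thread never waits on the socket. A minimal sketch, assuming only the newest result matters & stale ones can be dropped. LatestSlot & its methods are hypothetical names, & the socket write itself is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>

// One-slot mailbox (hypothetical helper). The producer never blocks on a
// slow consumer: it just overwrites the previous, still unsent value.
template <typename T>
class LatestSlot {
public:
    // Called from the scanning thread. Returns without waiting for a reader.
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);
            value_ = std::move(value);
            full_ = true;
        }
        ready_.notify_one();
    }

    // Called from the sender thread. Blocks until a value arrives or the
    // slot is closed. Returns false once the slot is closed & empty.
    bool take(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return full_ || closed_; });
        if (!full_) return false;
        out = std::move(value_);
        full_ = false;
        return true;
    }

    // Wakes the sender thread so it can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    T value_{};
    bool full_ = false;
    bool closed_ = false;
};
```

The scanning thread would call put() after every frame, while a sender thread loops on take() & does the blocking socket write. If the network stalls, the sender falls behind & intermediate results are dropped instead of holding up detection.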

It could probably handle face detection on the 4th core if the camera was better.

It brings back memories of 16 bit tensorrt.  The problems there might have been precision related.  It has overlapping boxes because the tiles overlap.  They don't always overlap.
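Duplicate boxes from overlapping tiles are commonly merged with non maximum suppression after translating every box into full frame coordinates. A minimal sketch, assuming axis aligned boxes with a confidence score. Box, iou, suppress_duplicates, & the threshold are illustrative & not taken from truckcam.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Detection in full-frame pixel coordinates (hypothetical type).
struct Box {
    float x1, y1, x2, y2;  // top left & bottom right corners
    float score;           // detector confidence
};

// Intersection over union of 2 boxes, 0 when they don't touch.
float iou(const Box& a, const Box& b) {
    float w = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    float h = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (w <= 0.0f || h <= 0.0f) return 0.0f;
    float inter = w * h;
    float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (area_a + area_b - inter);
}

// Greedy non maximum suppression: keep the highest scoring box & drop
// any lower scoring box that overlaps a kept one by more than `threshold`.
std::vector<Box> suppress_duplicates(std::vector<Box> boxes, float threshold) {
    assert(threshold > 0.0f && threshold <= 1.0f);
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool duplicate = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > threshold) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) kept.push_back(candidate);
    }
    return kept;
}
```

An animal straddling 2 tiles usually produces 2 boxes that mostly coincide, so a threshold around 0.5 would fold them into the higher scoring one while leaving separate animals alone. The right value depends on how much the tiles overlap.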
