Training an efficientdet_lite0 model

A project log for Raspberry pi tracking cam

Tracking animals on lower speed boards to replace jetson & coral

lion mclionhead, 02/13/2022 at 08:22

The journey began with downloading a new dataset from the goog.

https://voxel51.com/docs/fiftyone/tutorials/open_images.html

For some reason, the dataset is intended to be downloaded & viewed by running commands from the python console.  Helas, it was a bit convoluted & bloated compared to COCO's category IDs.  It would be easier to just convert COCO to the right XML format.

A new truckcam/coco_to_tflow.py script converted the annotations.
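As a rough idea of what such a conversion involves, here is a minimal sketch. It is not the contents of truckcam/coco_to_tflow.py. It assumes the target is Pascal VOC style XML, & the function name coco_to_voc & the choice of fields are hypothetical. The one piece of real arithmetic is that COCO stores a box as [x, y, width, height] while VOC wants corner coordinates.

```python
import json
import xml.etree.ElementTree as ET


def coco_to_voc(coco, wanted_category):
    """Return {file_name: xml_string} for every image containing wanted_category.

    coco is the dict loaded from a COCO instances JSON file.
    """
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == wanted_category}
    images = {img["id"]: img for img in coco["images"]}

    # Group the matching boxes by image ID.
    boxes = {}
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            boxes.setdefault(ann["image_id"], []).append(ann["bbox"])

    out = {}
    for image_id, bboxes in boxes.items():
        img = images[image_id]
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = img["file_name"]
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(img["width"])
        ET.SubElement(size, "height").text = str(img["height"])
        ET.SubElement(size, "depth").text = "3"  # assumes RGB images
        for x, y, w, h in bboxes:
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = wanted_category
            # Placeholder values for fields VOC readers commonly expect.
            ET.SubElement(obj, "pose").text = "Unspecified"
            ET.SubElement(obj, "truncated").text = "0"
            ET.SubElement(obj, "difficult").text = "0"
            box = ET.SubElement(obj, "bndbox")
            # COCO: [x, y, width, height] -> VOC: xmin, ymin, xmax, ymax
            ET.SubElement(box, "xmin").text = str(int(round(x)))
            ET.SubElement(box, "ymin").text = str(int(round(y)))
            ET.SubElement(box, "xmax").text = str(int(round(x + w)))
            ET.SubElement(box, "ymax").text = str(int(round(y + h)))
        out[img["file_name"]] = ET.tostring(root, encoding="unicode")
    return out
```

To use it, load the annotation file with json.load(), call coco_to_voc() with the category name of interest, & write each returned string to its own .xml file next to the image.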

Then it was a matter of converting

https://github.com/freedomwebtech/tensorflow-lite-custom-object/blob/main/Model_Maker_Object_Detection.ipynb

into a big model making script: truckcam/model_maker.py
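For orientation, a notebook like that boils down to a handful of TensorFlow Lite Model Maker calls. The following is a minimal sketch under stated assumptions, not the actual truckcam/model_maker.py: it assumes the annotations are Pascal VOC XML, & the function name, directory arguments & label list are all hypothetical.

```python
def train_detector(train_images, train_xml, val_images, val_xml,
                   labels, epochs, batch_size, export_dir="."):
    """Train efficientdet_lite0 on Pascal VOC data & export model.tflite."""
    # Imported here so this file can be loaded without tensorflow installed.
    from tflite_model_maker import model_spec, object_detector

    spec = model_spec.get("efficientdet_lite0")

    # labels is a list of class names matching the <name> tags in the XML.
    train_data = object_detector.DataLoader.from_pascal_voc(
        train_images, train_xml, labels)
    val_data = object_detector.DataLoader.from_pascal_voc(
        val_images, val_xml, labels)

    # Passing validation data is what makes val_loss show up in the printfs.
    model = object_detector.create(
        train_data,
        model_spec=spec,
        validation_data=val_data,
        epochs=epochs,
        batch_size=batch_size,
        train_whole_model=True,
    )

    model.export(export_dir=export_dir, tflite_filename="model.tflite")
    return model
```

A call might look like train_detector("train/images", "train/xml", "val/images", "val/xml", ["lion"], epochs=300, batch_size=8), where every path, the label & the batch size are made up for illustration.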

The 1st problem was getting tensorflow to use the GPU.  Verify GPU detection with:

source yolov5/YoloV5_VirEnv/bin/activate

LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib/ python3

import tensorflow as tf

print(tf.__version__)

print(tf.config.list_physical_devices())

This normally fails with libcudart.so.11.0 & libcudnn.so.8 not being found.

What works is installing cudnn from

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

Get the version from the archive which matches the version of CUDA.

The next problem was that, unlike pytorch, tensorflow doesn't store the best model & stop training once it has hit the best model.  You have to review the training printfs & find where val_loss stops decreasing.  Then retrain with a different number of epochs.
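Scanning the printfs by eye gets old, so the search for the lowest val_loss can be scripted. This is a minimal sketch, assuming the training output was saved to a text file & that each epoch prints exactly one Keras-style "val_loss: <number>" entry. The function name best_epoch is hypothetical.

```python
import re

# Matches "val_loss: 0.1234" but not "val_det_loss: ..." or "val_box_loss: ...".
VAL_LOSS = re.compile(r"\bval_loss: (\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def best_epoch(log_text):
    """Return (epoch, val_loss) for the lowest val_loss in a training log.

    Epochs are counted from 1, in the order val_loss entries appear.
    """
    losses = [float(v) for v in VAL_LOSS.findall(log_text)]
    if not losses:
        raise ValueError("no val_loss entries found in log")
    best = min(range(len(losses)), key=losses.__getitem__)
    return best + 1, losses[best]
```

The returned epoch is a candidate for the epoch count of the next training run. val_loss is noisy, so it's worth eyeballing the neighboring epochs before trusting a single minimum.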

Finally, if the batch size is too big it'll crash after training is complete.  Pytorch would crash before training began.

The model maker doesn't automatically generate any test images with labels, but the model does work when dropped into the example from https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/raspberry_pi

python3 detect.py --model=model.tflite

50 epochs with 1000 images gave a fail.

300 epochs with 1000 images arguably gave better results.  It's arguably only slightly worse than openpose at detecting fake lions & arguably comparable to face detection.  The score threshold can be tweaked to make it more selective.  It's definitely better than the stock efficientdet_lite0 model.
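Making it more selective just means throwing away hits below some minimum score. A minimal sketch, assuming each detection has already been unpacked into a (label, score, box) tuple. That layout & the function name are hypothetical & not what the example detect.py uses internally.

```python
def select_detections(detections, min_score):
    """Keep detections scoring at least min_score, highest score first.

    detections: list of (label, score, (xmin, ymin, xmax, ymax)) tuples.
    """
    kept = [d for d in detections if d[1] >= min_score]
    return sorted(kept, key=lambda d: d[1], reverse=True)
```

Raising min_score trades missed lions for fewer false hits. For a tracking cam, the first entry of the result is the natural thing to follow.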

Some other ideas are trying the larger efficientdet models with overclocking or on the odroid, trying more images, or using video of just lions.

It runs 8x faster on the raspberry pi than in software mode on a Core(TM) i7-6700HQ.  No one is bothering to optimize tensorflow for Intel anymore.  The lion kingdom doesn't think Intel should be underestimated, since they're the only ones who have made any chips since 2020.
