
Number of images used in the training

A project log for Roktrack - Pylon Guided Mower

A mower not only for your yard, but also for your community.

Yuta Suito, 10/09/2023 at 06:14

In order to detect uncommon, original objects such as pylons (traffic cones), it is necessary to train custom models. Roktrack uses YOLOv8 (nano model) for its custom models. The number of images per class used for training is as follows:

Model for mowing navigation

Pylon: 7,000
Person: 13,000
Roktrack: 1,500
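For reference, a YOLOv8 dataset definition for these three classes might look like the following. The paths and file name are placeholders, not the project's actual layout:

```yaml
# Hypothetical data.yaml for the mowing-navigation model
path: datasets/roktrack   # dataset root (placeholder)
train: images/train
val: images/val
names:
  0: pylon
  1: person
  2: roktrack
```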

After training, I export to ONNX format at 320*320 and 640*640 image sizes. The reason for exporting at two image sizes is the difference in inference time: the former takes about 1 second per image on the Raspberry Pi 3A+, while the latter takes a little over 3 seconds. During actual mowing, I use the light model while a pylon is recognizable, and switch to the heavy model if it is lost so that more distant objects can be detected. In my experiments, a 1280*1280 model could detect a pylon 50 m away, but it takes about 10 seconds to infer one image, so it is impractical for navigation while moving.
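The light/heavy fallback described above can be sketched as a small selection function. The model names, latency figures, and the `patience` threshold are illustrative assumptions, not Roktrack's actual code:

```python
# Sketch of the light/heavy model fallback: stay on the fast
# 320*320 model while the pylon is tracked, switch to the slower
# 640*640 model once it has been lost for a few frames.
# All names and numbers here are assumptions for illustration.

LIGHT = {"name": "yolov8n-320.onnx", "latency_s": 1.0}  # ~1 s on Pi 3A+
HEAVY = {"name": "yolov8n-640.onnx", "latency_s": 3.0}  # ~3 s on Pi 3A+

def pick_model(pylon_visible: bool, frames_lost: int, patience: int = 2) -> dict:
    """Return the model to run on the next frame.

    While the pylon is visible (or only briefly lost), keep the
    light model for responsiveness; after `patience` consecutive
    lost frames, pay the heavy model's latency to see farther.
    """
    if pylon_visible or frames_lost < patience:
        return LIGHT
    return HEAVY
```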

Model for number recognition

Each digit (0-9): 200

The number recognition model is exported at an image size of 96*96 to speed up processing. As explained in a previous log, this model runs inference on cropped images, so the low resolution is not a problem.
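One way to feed such a small model is to cut a square region around the detected marker and scale it to the model's input size. The helper below is a hedged sketch of that coordinate math, not Roktrack's actual implementation:

```python
def crop_and_scale(box, target=96):
    """Map a detection box (x1, y1, x2, y2) in the full frame to a
    square crop region plus the scale factor needed to resize that
    crop to target*target for the digit model.

    Using a square crop centered on the box keeps the digits'
    aspect ratio intact after resizing.
    """
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)          # square side = longer box edge
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # box center
    half = side / 2
    crop = (cx - half, cy - half, cx + half, cy + half)
    return crop, target / side
```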

Models for Animal Detection

Bear: 3,000
Deer: 3,000
Monkey: 1,000
Raccoon: 1,000
Fox: 1,000
Dog: 1,000
Cat: 1,000
Civet: 1,000
Boar: 1,000
Hare: 1,000
Badger: 1,000

This model is also exported at 320*320 and 640*640 sizes, but it still produces some false positives. The nano model cannot capture the characteristics of each class; a larger model, such as small, might do better.

I feel that at least 1,500 images per class, preferably 3,000, are needed to achieve satisfactory accuracy.
