
Number of images used in the training

A project log for Roktrack - Pylon Guided Mower

A mower not only for your yard, but also for your community.

Yuta Suito, 10/09/2023 at 06:14

In order to detect uncommon, original objects such as pylons (traffic cones), it is necessary to train custom models. Roktrack uses YOLOv8 (nano model) for its custom models. The number of images per class used for training is as follows:

Model for mowing navigation

Pylon: 7,000
Person: 13,000
Roktrack: 1,500
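For reference, a YOLOv8 dataset definition for these three classes might look like the following. The paths and file name are placeholders, not the project's actual layout:

```yaml
# Hypothetical data.yaml for the mowing-navigation model
path: datasets/roktrack   # dataset root (placeholder)
train: images/train
val: images/val
names:
  0: pylon
  1: person
  2: roktrack
```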

After training, I export to ONNX format at 320*320 and 640*640 image sizes. The reason for exporting at two image sizes is the difference in inference time: the former takes about 1 second per image on the Raspberry Pi 3A+, while the latter takes a little over 3 seconds. During actual mowing, I use the light model while a pylon is recognizable, and switch to the heavy model if it is lost so that more distant objects can be detected. In my experiments, a 1280*1280 model could detect a pylon 50 m away, but it takes about 10 seconds to infer one image, so it is impractical for navigation while moving.
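The light/heavy fallback described above can be sketched as a small selection function. The model names, latency figures, and the `patience` threshold are illustrative assumptions, not Roktrack's actual code:

```python
# Sketch of the light/heavy model fallback: stay on the fast
# 320*320 model while the pylon is tracked, switch to the slower
# 640*640 model once it has been lost for a few frames.
# All names and numbers here are assumptions for illustration.

LIGHT = {"name": "yolov8n-320.onnx", "latency_s": 1.0}  # ~1 s on Pi 3A+
HEAVY = {"name": "yolov8n-640.onnx", "latency_s": 3.0}  # ~3 s on Pi 3A+

def pick_model(pylon_visible: bool, frames_lost: int, patience: int = 2) -> dict:
    """Return the model to run on the next frame.

    While the pylon is visible (or only briefly lost), keep the
    light model for responsiveness; after `patience` consecutive
    lost frames, pay the heavy model's latency to see farther.
    """
    if pylon_visible or frames_lost < patience:
        return LIGHT
    return HEAVY
```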

Model for number recognition

Each digit (0-9): 200

The number recognition model is exported at an image size of 96*96 to speed up processing. As explained in a previous log, this model runs inference on cropped images, so the low resolution is not a problem.
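One way to feed such a small model is to cut a square region around the detected marker and scale it to the model's input size. The helper below is a hedged sketch of that coordinate math, not Roktrack's actual implementation:

```python
def crop_and_scale(box, target=96):
    """Map a detection box (x1, y1, x2, y2) in the full frame to a
    square crop region plus the scale factor needed to resize that
    crop to target*target for the digit model.

    Using a square crop centered on the box keeps the digits'
    aspect ratio intact after resizing.
    """
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)          # square side = longer box edge
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # box center
    half = side / 2
    crop = (cx - half, cy - half, cx + half, cy + half)
    return crop, target / side
```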

Models for Animal Detection

Bear: 3,000
Deer: 3,000
Monkey: 1,000
Raccoon: 1,000
Fox: 1,000
Dog: 1,000
Cat: 1,000
Civet: 1,000
Boar: 1,000
Hare: 1,000
Badger: 1,000

This model is also exported at 320*320 and 640*640 sizes, but it still produces some false positives. The nano model cannot capture the characteristics of each class; a larger model, such as small, might do better.

I feel that at least 1,500 images per class, preferably 3,000, are needed to achieve satisfactory accuracy.
