Agenda:
1. Business needs
2. Data preparation
3. Model structure
4. Used libs and tools
5. Results
6. Error analysis
7. Fails/Hypotheses
8. Conclusion
9. References
1. Business needs
Fortunately for me, I work at Kernel, where I develop Computer Vision (CV) and other models to solve business problems and challenges. One of them is counting the seeds on a sunflower.
Kernel – the world’s leading and the largest in Ukraine producer and
exporter of sunflower oil, and major supplier of agricultural products
from the Black Sea region to world markets. Kernel exports its products
into more than 80 countries of the world.
It is a common task for agronomists to count seeds on sunflower and corn, as these counts are used to predict the "biological harvest". Separately, agronomists measure the weight of 1000 kernels, which is then used to estimate the overall yield per area of field.
Moreover, yield prediction is the primary function of agro analytics. The faster and more precise the forecast, the more money the company can earn.
These inspections should be carried out for every field. And this is how it works now:
1. Agronomists go to the field
2. Take about 10 sunflowers from different parts of the field
3. Split every sunflower into 4 quarters
4. Count the seeds on one quarter and multiply by 4 to find the total number of seeds on every sunflower
5. Take 1000 random seeds and weigh them
6. Get the density of plants on the field per hectare from other inspections (or use the planned density). By the way, we will also use AI to solve this task, and this could be a topic for the next post
7. Calculate (density per ha * avg seed quantity on sunflower * weight of 1000) / 1000 = yield weight per ha
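The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: the function names and all input numbers are hypothetical, and the result comes out in the same unit as the weight of 1000 seeds (grams here, as an assumption).

```python
from statistics import mean


def seeds_per_head(quarter_counts):
    """Each count covers one quarter of a sunflower, so multiply by 4
    and average over the sampled sunflowers."""
    return mean(4 * count for count in quarter_counts)


def yield_per_ha(density_per_ha, avg_seeds_per_head, weight_of_1000):
    """Formula from the last step. The result is in the unit of
    weight_of_1000 (for example grams) per hectare."""
    return density_per_ha * avg_seeds_per_head * weight_of_1000 / 1000


# Hypothetical inputs, for illustration only.
quarter_counts = [300, 320, 310, 330, 315, 305, 325, 318, 312, 322]
avg_seeds = seeds_per_head(quarter_counts)
grams_per_ha = yield_per_ha(density_per_ha=50_000,
                            avg_seeds_per_head=avg_seeds,
                            weight_of_1000=60)  # grams (assumed unit)
print(f"{avg_seeds:.1f} seeds per sunflower, "
      f"{grams_per_ha / 1000:.0f} kg per ha")
```

Keeping the two steps as separate functions makes it easy to replace the manual quarter counts with counts produced by a model later on.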
2. Data preparation
To solve any Machine Learning problem, we need to get data and train the model. For a task this specific, you can't find a freely available labeled dataset. In our case, the dataset should contain hundreds of sunflower photos with a label for every seed, assigned to the right class (black or white, for easy understanding), so we gathered our own dataset.
Our agronomists took about 1000 photos of different sunflower hybrids from different regions of Ukraine. Subsequently, another team labeled the photos in a free data labeling tool. Each image should have a bounding box (purple box), as well as points for black seeds (green points in the picture below) and white seeds (red points in the example below).
The distribution of the number of seeds per sunflower looks normal, with a mean value of 1271.
And here is the correlation between the number of black and white seeds.
3. Model structure
For business needs, all calculations should be done on a mobile device. That's why I used U-Net with a pretrained mobilenet_v2 encoder. What does all this mean?
U-Net is a neural network (NN) architecture that takes an input image and returns a segmentation map (in our case, a heatmap) by first downsampling and then upsampling the data.
Our heatmap represents the probability that a pixel belongs to some kernel.
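To make the link between such a heatmap and a seed count concrete, here is a minimal sketch that counts local maxima above a threshold. It is a simplified stand-in, not the blob detection from scikit_image used in this project. The function name `count_peaks`, the 3x3 neighbourhood and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def count_peaks(heatmap, threshold=0.5):
    """Count pixels that exceed `threshold` and are strictly larger
    than all 8 neighbours. Flat plateaus are not counted."""
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so that border pixels can also be peaks.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_peak = heatmap > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= heatmap > neighbour
    return int(is_peak.sum())


# Tiny synthetic heatmap with hypothetical values.
demo = np.zeros((9, 9))
demo[2, 2] = 0.9   # confident peak
demo[6, 5] = 0.8   # confident peak
demo[6, 6] = 0.3   # weaker neighbour, below the threshold
print(count_peaks(demo))
```

A real heatmap is noisier than this toy example, which is one reason why a tuned blob detector is a better fit in practice than a plain local-maximum test.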
Frankly speaking, for this task I trained 2 NNs and 1 algorithm.
The first NN cuts the sunflower out of the image, like this:
The second NN builds 2 heatmaps, one per class (black and white).
For training, the input for this net is the cropped image (from the previous NN), and the target is a heatmap with Gaussians built on the labeled points.
For black kernels it looks like this:
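The target construction described above can be sketched as follows. This is a minimal illustration under assumptions, not the exact code used for training: the function name, the `sigma` value, the image size and the point coordinates are hypothetical, and combining overlapping Gaussians with a maximum (rather than a sum) is my choice for the sketch.

```python
import numpy as np


def gaussian_heatmap(points, height, width, sigma=2.0):
    """Return a (height, width) float32 map with one Gaussian bump
    per labeled (x, y) point."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        # Maximum keeps the peak value at 1 even where seeds are close.
        heatmap = np.maximum(heatmap, bump.astype(np.float32))
    return heatmap


# Hypothetical labels: (x, y) pixel coordinates for each class.
black_points = [(10, 12), (30, 8)]
white_points = [(20, 25)]

# One channel per class, matching the two heatmaps the second NN predicts.
target = np.stack([
    gaussian_heatmap(black_points, 40, 40),
    gaussian_heatmap(white_points, 40, 40),
])
print(target.shape)
```

A larger `sigma` gives wider bumps that are easier to learn but start to merge for densely packed seeds, so it is worth treating as a tunable parameter.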
4. Used libs and tools
For training, I used Python libs such as:
PyTorch - for training models
albumentations - for image augmentation
segmentation_models_pytorch - to use pretrained models
scikit_image - to work with blobs on heatmap
hyperopt - to tune hyperparams for blob detection
wandb - to track training experiments
5. Results
How do we validate the whole solution?
To calculate the mean absolute percentage error (MAPE), I made a prediction for every image and compared the number of seeds found by the model with the number of seeds found by assessors. As a result, the errors follow a normal distribution with mean 0, which is extremely good.
Blue bars are for black kernels, where MAPE is 3.6%.
Orange bars are for white kernels, where MAPE is 8.8%.
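For reference, the metric can be computed as in the sketch below. The function names and the counts are hypothetical and only illustrate the calculation. The signed percentage errors are what you would plot as a histogram, while MAPE averages their absolute values.

```python
def percentage_errors(predicted, actual):
    """Signed error per image, in percent of the assessor count."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [100.0 * (p - a) / a for p, a in zip(predicted, actual)]


def mape(predicted, actual):
    """Mean absolute percentage error over all images."""
    errors = percentage_errors(predicted, actual)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical counts for four images.
assessor_counts = [1000, 1200, 1500, 800]
model_counts = [1020, 1176, 1500, 816]

print(percentage_errors(model_counts, assessor_counts))
print(mape(model_counts, assessor_counts))
```

Looking at the signed errors as well as MAPE helps to spot a systematic bias: a model that always overcounts can have a low MAPE while its error distribution is not centered at 0.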
6. Error analysis
The biggest errors come from unusual sunflowers: those with an abnormally large number of kernels, too many grey kernels, or rounded edges, as can be seen in the example below:
7. Fails/Hypotheses
Once, I found a mathematical description of seed positions in sunflowers and some other plants with a fractal structure. The positions could be described by a model that has an equation in polar coordinates, where n is the index number of the kernel and c is a constant that, I guess, is different for every sunflower. As a result, we should get the following positions:
But in practice it doesn't work. I modified the equation and added new constants a and b, where the blue dot is the (0, 0) point. So it is close in some way, but not applicable to the real case.
8. Conclusion
Currently, this task has been sped up and standardized in our company in order to exclude the human factor from counting. The MAPE is acceptable for business needs, and this solution is now working in our mobile application.
For me, it is still unbelievable that we can count more than 1000 kernels and separate the points into 2 different classes, and that all of this can be done on a mobile device.
9. References
1. (2015)
2. (2019)