Localization of ripe tomato bunch using deep neural networks and class activation mapping

Seung-Woo Kang1, Soo-Hyun Cho1, Dae-Hyun Lee1,*, Kyung-Chul Kim2

1Department of Biosystems Machinery Engineering, Chungnam National University, Daejeon 34134, Korea
2Department of Agricultural Engineering, National Institute of Agricultural Sciences, Jeonju 54875, Korea

*Corresponding author:


In this study, we propose a ripe tomato bunch localization method based on convolutional neural networks for use in robotic harvesting systems. Tomato images were obtained from a smart greenhouse at the Rural Development Administration (RDA). Sample images for training were extracted according to tomato maturity and resized to 128 × 128 pixels for input to the classification model. The model was a four-layer convolutional neural network, and the classes, defined by stage of maturity, were predicted with a Softmax classifier. The ripe tomato bunch region was localized using a class activation map. The class activation map showed the approximate location of the tomato bunch but tended to highlight either only a local part or an overly large part of the ripe tomato bunch region, which could lead to poor localization performance. Therefore, we propose a recursive method to improve the performance of the model. The classification accuracy, precision, recall, and F1-score were 0.98, 0.87, 0.98, and 0.92, respectively. The localization performance, measured by the Intersection over Union (IoU), was 0.52, and input recursion improved the IoU by 13%. Based on these results, the proposed localization of the ripe tomato bunch area can be incorporated into robotic harvesting systems to establish optimal harvesting paths.
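The localization and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used class activation mapping formulation, in which the map is a weighted sum of the last convolutional feature maps using the weights of the target class, and it assumes that a bounding box is obtained by thresholding the normalized map. The function names, the threshold of 0.5, and the box convention (x_min, y_min, x_max, y_max) with exclusive upper bounds are hypothetical choices made for illustration. Upsampling the map to the 128 × 128 input size is omitted.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature maps, min-max normalized to [0, 1].

    features: list of K feature maps, each a list of H rows of W floats.
    class_weights: K weights for the target class (assumed to come from
    the layer feeding the Softmax classifier).
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[sum(w * fmap[y][x] for w, fmap in zip(class_weights, features))
            for x in range(width)]
           for y in range(height)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        # Flat map: no region stands out.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / span for v in row] for row in cam]


def cam_to_box(cam, threshold=0.5):
    """Bounding box around all cells at or above the threshold, or None."""
    hits = [(x, y) for y, row in enumerate(cam)
            for x, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In this sketch, a map that highlights only a local part of the bunch yields a box smaller than the ground truth, and a map that highlights an overly large part yields a box larger than it; both cases lower the IoU, which is the failure mode the recursive method is intended to address.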


Keywords: convolutional neural networks, localization, robot harvesting, tomato bunch
