
Yes, AI improves farm yield!

The field of precision agriculture pursues lower inputs and higher yields. Machine learning is becoming an important decision-support tool for farmers: it helps them select which plants to grow, under what conditions, and what to do to maximize yield.


Tomato producers invest money in optimizing their crop yield (and revenue) per unit area. The traditional approach is to test different plant hybrids in combination with different spacing, land area, planting times, treatments, watering, fertilizer, etc.

Each test costs money and takes time!

The goal was to predict the yield of different tomato hybrids, to understand their production potential as a function of time at different sites. Machine learning was used to build a multivariate model that predicts a plant's yield as a function of site, hybrid, and time, and so produces the most likely growth curve. The models can also be trained to predict size categories.


Two classes of tomatoes (pear tomatoes and round tomatoes) were tested at each site. For pear tomatoes, two hybrids were planted; for round tomatoes, four hybrids were planted at one site and three at the other. The results were recorded as the total weight of tomatoes harvested at a given time from a given number of plants. Those weights are also broken down into weight categories (XXL, XL, L, etc.).

The practical meaning of these parameters was discussed with the farm operator, both for the design of new experiments and for forecasting yield for investment evaluation.

Image: Plant yield dataset

The modeling technique involved arranging the data into a research format. Braintoy mlOS was then used to develop predictive models, derive the parameters, score new data, and deploy the models as real-time APIs.

Image: CSV file of the reorganized data set, as viewed on a desktop
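The article does not spell out what the "research format" looks like. One plausible reading is a long table with one row per observation, which is the shape most modeling tools expect. The sketch below shows that reshaping using only the Python standard library. The record layout, the field names (`tomato_class`, `size_category`, `kg_per_plant`), and every number are assumptions made up for illustration, not the study's actual schema or data.

```python
import csv
import io

# Hypothetical harvest records in "wide" form: one record per harvest, with one
# total weight (kg) per weight category. All names and numbers are illustrative.
harvests = [
    {"site": "A", "tomato_class": "pear", "hybrid": "P1", "day": 75, "plants": 20,
     "weights_kg": {"XXL": 1.0, "XL": 3.0, "L": 4.0}},
    {"site": "A", "tomato_class": "pear", "hybrid": "P1", "day": 90, "plants": 20,
     "weights_kg": {"XXL": 2.0, "XL": 5.0, "L": 3.0}},
]

FIELDS = ["site", "tomato_class", "hybrid", "day", "size_category", "kg_per_plant"]


def to_long_rows(records):
    """Yield one row per (harvest, weight category), normalized per plant."""
    for rec in records:
        for category, kg in rec["weights_kg"].items():
            yield {
                "site": rec["site"],
                "tomato_class": rec["tomato_class"],
                "hybrid": rec["hybrid"],
                "day": rec["day"],
                "size_category": category,
                # Dividing by the plant count makes plots of different sizes comparable.
                "kg_per_plant": round(kg / rec["plants"], 4),
            }


rows = list(to_long_rows(harvests))

# Write the long table as CSV text; a real workflow would write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
long_csv = buffer.getvalue()
print(long_csv)
```

Normalizing to weight per plant is a design choice of this sketch: the source records total weight for a number of plants, so some normalization is needed before harvests from different plot sizes can share one model.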


Despite the small amount of data, the curves produced by the models matched the experimental data to within about 5%.
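The article does not say how the difference between model and experiment was measured. A common choice for comparing a predicted growth curve with measured points is the mean absolute percentage difference, sketched below. The function name `mean_abs_pct_diff` and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_abs_pct_diff(observed, predicted):
    """Average of |predicted - observed| / |observed|, expressed in percent."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Invented cumulative yields (kg per plant) at four harvest dates.
observed = [0.5, 1.5, 3.0, 4.0]
predicted = [0.52, 1.44, 3.12, 3.9]

difference = mean_abs_pct_diff(observed, predicted)
print(f"mean absolute difference: {difference:.2f}%")
```

Here the four points differ by 4%, 4%, 4%, and 2.5%, so the average is a little over 3.6%. A percentage metric like this weights early, low-yield points heavily, which may or may not be what a grower cares about.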

Of the few dozen algorithms considered, Random Forest gave good results.

A curve for the missing experiment was produced satisfactorily, accurately predicting the yield.
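To make the idea concrete, here is a minimal, dependency-free sketch of a random-forest-style regressor that learns yield from site, hybrid, and time, then predicts the curve for a site/hybrid combination it never saw. It is a simplified stand-in written for illustration, not the mlOS implementation; in practice a library such as scikit-learn would normally be used. The data, the site and hybrid labels, and all function names are invented.

```python
import random
from statistics import mean

# Invented data: (site, hybrid, days after planting, cumulative kg per plant).
# These are placeholders, not measurements from the study.
DATA = [
    ("A", "H1", 60, 0.50), ("A", "H1", 75, 1.6), ("A", "H1", 90, 3.0), ("A", "H1", 105, 3.9),
    ("A", "H2", 60, 0.40), ("A", "H2", 75, 1.4), ("A", "H2", 90, 2.7), ("A", "H2", 105, 3.5),
    ("B", "H1", 60, 0.45), ("B", "H1", 75, 1.5), ("B", "H1", 90, 2.8), ("B", "H1", 105, 3.7),
    ("B", "H2", 60, 0.35), ("B", "H2", 75, 1.3), ("B", "H2", 90, 2.5), ("B", "H2", 105, 3.3),
]
HELD_OUT = ("B", "H2")  # treat this site/hybrid as the experiment that was never run


def encode(site, hybrid, day):
    """Encode the two categorical inputs as 0/1 flags next to the day number."""
    return [float(site == "B"), float(hybrid == "H2"), float(day)]


def sse(values):
    """Sum of squared errors around the mean: the quantity each split minimizes."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def build_tree(X, y, rng, depth=0, max_depth=4):
    """Grow a regression tree, considering a random 2 of the 3 features per split."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)  # leaf: predict the average yield of the rows that reach it
    best = None
    for f in rng.sample(range(len(X[0])), 2):
        for t in sorted({row[f] for row in X})[:-1]:  # skip max so both sides are non-empty
            left = [i for i in range(len(y)) if X[i][f] <= t]
            right = [i for i in range(len(y)) if X[i][f] > t]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features were constant in this node
        return mean(y)
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, depth + 1, max_depth),
    )


def predict_tree(node, x):
    """Walk from the root to a leaf; leaves are floats, internal nodes are tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, x):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in forest)


train = [row for row in DATA if (row[0], row[1]) != HELD_OUT]
X_train = [encode(site, hybrid, day) for site, hybrid, day, _ in train]
y_train = [kg for *_, kg in train]
forest = fit_forest(X_train, y_train)

# Predicted growth curve for the held-out site/hybrid combination.
curve = [(day, predict(forest, encode(*HELD_OUT, day))) for day in (60, 75, 90, 105)]
actual = {day: kg for site, hybrid, day, kg in DATA if (site, hybrid) == HELD_OUT}
for day, kg in curve:
    print(f"day {day}: predicted {kg:.2f} kg/plant, held-out value {actual[day]:.2f}")
```

With so few rows, the held-out curve is only a rough interpolation from the neighbouring site and hybrid combinations, and tree ensembles cannot extrapolate beyond the range of yields they were trained on.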

The scored data from mlOS was exported to Power BI, a visualization tool.


Image: Plant Yield Results

The practical meaning of the parameters was discussed again for investment evaluation. The models were then refined to improve their predictions.

Image: Improved Plant Yield Results


Plants and animals are good test subjects because they show predictable natural behaviour. Machine learning, when applied to natural domains, gives good results!


Diego Raffa

Diego Raffa is a Data Scientist with a background in energy, academia, and research. Trained as a chemist, he carried out postdoctoral research in chemistry and physics. He took his expertise to the energy industry, where he became a leader in the simulation and optimization of petroleum reservoirs. His work required dealing with big data, which then took him to the world of data science. Diego believes that knowledge grows when shared. He is passionate about applying AI and ML to solve problems in diverse industries.