Yes, AI improves farm yield!

The field of precision agriculture pursues lower inputs and higher yields. Machine learning is becoming an important decision-support tool for farmers, helping them choose which plants to grow, under what conditions, and what to do to maximize yield.


Tomato producers invest in optimizing their crop yield per unit area. The traditional approach is to test different plant hybrids in combination with different spacing, plot sizes, planting times, treatments, watering schedules, and so on.

Each test costs money and takes time!

The goal was to predict the yield of different tomato hybrids in order to understand their production potential as a function of time at different sites. Machine learning was used to build a multivariate model that predicts the yield of a plant as a function of site, hybrid, and time, and from it the “most likely” growth curve. The models can also be trained for specific size categories.


Two classes of tomatoes (pear and round) were tested at each site. For pear tomatoes, two hybrids were planted; for round tomatoes, four hybrids were planted at one site and three at the other. The results were recorded as the total weight of tomatoes harvested at a given time from a given number of plants. Those weights were also broken down by size category (XXL, XL, L, etc.).

The practical meaning of these parameters was discussed with the farm operator to guide the design of new experiments and the forecasting of yield for investment evaluation.

Image: Plant yield dataset


The modeling approach involved rearranging the data into a ‘research format’. Braintoy mlOS was then used to develop the predictive models, derive their parameters, score new data, and deploy the models.

Image: CSV file of reorganized data set as viewed on a desktop
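
The article does not spell out what the ‘research format’ looks like. One plausible reading, offered here only as an assumption, is a long layout with one row per site, hybrid, harvest day, and size category. The sketch below does that reshaping with Python's standard `csv` module. The column names (`site`, `hybrid`, `day`, `weight_kg`) and all the numbers are hypothetical and do not come from the study.

```python
import csv
import io

# Hypothetical wide-format harvest log: one row per harvest,
# one column per size category (weights in kg). Values are illustrative only.
WIDE_CSV = """site,hybrid,day,XXL,XL,L
A,H1,60,1.2,3.4,2.1
A,H1,67,1.5,4.0,2.6
B,H2,60,0.8,2.9,1.7
"""

SIZE_COLUMNS = ["XXL", "XL", "L"]


def wide_to_long(wide_text, size_columns):
    """Return one record per (site, hybrid, day, size category)."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        for size in size_columns:
            rows.append({
                "site": record["site"],
                "hybrid": record["hybrid"],
                "day": int(record["day"]),
                "size": size,
                "weight_kg": float(record[size]),
            })
    return rows


long_rows = wide_to_long(WIDE_CSV, SIZE_COLUMNS)

# Write the long table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["site", "hybrid", "day", "size", "weight_kg"]
)
writer.writeheader()
writer.writerows(long_rows)
long_csv = out.getvalue()
```

In a layout like this, site, hybrid, and day can each be fed to a model as a separate input, and a model for a single size category only needs a filter on the `size` column.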



Despite the small amount of data, the curves produced by the models matched the experimental data to within about 5%. Of the few dozen algorithms considered, RandomForest gave good results.
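
The article does not say how the ~5% difference was measured. A common choice, assumed here, is the mean absolute percentage difference between predicted and measured yield. The sketch below shows that calculation; the function name and the numbers are placeholders, not the study's data.

```python
def mean_abs_pct_diff(measured, predicted):
    """Mean of |predicted - measured| / |measured|, expressed as a percentage."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("measured values must be non-zero")
        total += abs(p - m) / abs(m)
    return 100.0 * total / len(measured)


# Placeholder cumulative yields (kg) at four harvest dates.
measured = [10.0, 20.0, 40.0, 50.0]
predicted = [11.0, 19.0, 42.0, 50.0]
score = mean_abs_pct_diff(measured, predicted)
```

A percentage metric weights early, low-yield harvests as heavily as late ones, so it is worth checking the curve visually as well.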

A curve for the missing experiment was also produced, and it predicted the yield accurately.
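
As a rough illustration of predicting a curve for a combination that was never grown, the sketch below trains scikit-learn's `RandomForestRegressor` on synthetic data and predicts a held-out site/hybrid pair. The article's models were built in Braintoy mlOS, so the library, the logistic-shaped synthetic curve, the one-hot encoding, and every name and number here are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
SITES = ["A", "B"]
HYBRIDS = ["H1", "H2", "H3"]
DAYS = np.arange(50, 121, 10)  # hypothetical harvest days after planting


def synthetic_yield(site, hybrid, day):
    """Made-up logistic growth curve; not the study's data."""
    ceiling = 40.0 + 5.0 * SITES.index(site) + 4.0 * HYBRIDS.index(hybrid)
    return ceiling / (1.0 + np.exp(-(day - 80.0) / 10.0))


def encode(site, hybrid, day):
    """One-hot encode site and hybrid, then append the day as a number."""
    return ([float(site == s) for s in SITES]
            + [float(hybrid == h) for h in HYBRIDS]
            + [float(day)])


held_out = ("B", "H3")  # the combination treated as the missing experiment

X_train, y_train = [], []
for site in SITES:
    for hybrid in HYBRIDS:
        if (site, hybrid) == held_out:
            continue  # leave this experiment out of training
        for day in DAYS:
            X_train.append(encode(site, hybrid, day))
            noise = rng.normal(0.0, 0.5)
            y_train.append(synthetic_yield(site, hybrid, day) + noise)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(np.array(X_train), np.array(y_train))

# Predicted growth curve for the held-out site/hybrid pair.
X_new = np.array([encode(*held_out, day) for day in DAYS])
predicted_curve = model.predict(X_new)
```

A random forest averages yields it has already seen, so it cannot predict outside the range of its training data. The held-out pair is predictable in this sketch only because site B and hybrid H3 each appear in other training rows.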

The scored data from mlOS was exported to Power BI, a popular visualization tool.

Image: Plant Yield Results

The practical meaning of the parameters was discussed again, this time with a view to forecasting yield for investment evaluation. The model was then refined to improve its predictions.

Image: Improved Plant Yield Results



Plants and animals make good test subjects because they show predictable natural behaviour. Machine learning gives good results when applied to natural domains!


Diego Raffa

Diego Raffa is a Data Scientist with a background in energy, academia, and research. Trained as a chemist, he carried out research in chemistry and physics after his Ph.D. He then took his expertise to the energy industry, where he became a leader in the simulation and optimization of petroleum reservoirs. That work required dealing with big data, which led him to the world of data science. Diego believes that knowledge grows when shared. He is passionate about applying AI and ML to solve problems in diverse industries.
