Approximately 9.2% of the world’s population faces hunger. Our team at ADC has researched how data science can help address this highly impactful topic. In collaboration with Zero Hunger Lab, our Agriculture & Food Practice worked on improving forecasts of food insecurity. Using data from satellite imagery and semi-supervised machine learning techniques, we found promising evidence of improved model performance.
Can Machine Learning (ML) techniques address food insecurity?
Food insecurity is a critical global issue affecting millions, particularly in developing countries. Adequate access to safe and nutritious food is essential for individuals to lead healthy and productive lives. Unfortunately, numerous regions still face significant challenges in ensuring food security for their populations. Kenya is one of those regions that grapples with persistent food insecurity.
In the quest to combat hunger and improve food security around the world, advancements in technology are playing a crucial role. One such advancement is the use of machine learning techniques to enhance food security forecasts. If accurate forecasts can be provided months in advance, policymakers and food aid organisations can plan their aid accordingly. In this way, the disastrous effects of food crises can be minimised.
In collaboration with Zero Hunger Lab, a research institute at Tilburg University, we have researched different techniques to boost the performance of existing models. The largest performance boost came from adding semi-supervised machine learning algorithms. By leveraging both labeled and unlabeled data, this approach has shown promising results in accurately predicting food security status in Kenya’s diverse districts.
Using explanatory variables to enhance IPC phase prediction
The Integrated Food Security Phase Classification (IPC) system is a widely used framework for assessing and classifying food security conditions. It categorises districts into five phases based on their food security status, ranging from minimal to catastrophic. IPC reports are generated only every three to four months, which is not ideal because our explanatory variables are available at a much higher frequency.
To predict the IPC phase of a district, we have looked at explanatory variables in line with the three pillars of food security: food availability, food accessibility (economic and physical), and food stability. For example, we have extracted data from satellite imagery to account for food production, mapped economic conditions and infrastructure for accessibility, and used data on (violent) conflicts for stability.
Employing semi-supervised machine learning
To make predictions from this large dataset, we applied machine learning (ML), a subfield of AI. ML is commonly divided into two main types: supervised and unsupervised.
Supervised ML requires labeled data: observations (in our case, months) for which we know the state of food security. Unsupervised ML, in contrast, looks for patterns and structure in unlabeled data, so it does not need information on the state of food security. However, this lack of information can hurt predictive performance. Finally, there is a combination of the two: semi-supervised machine learning, which utilises both labeled and unlabeled data.
As stated previously, IPC reports are issued only every three to four months, whereas our explanatory variables are also available for the months in between. Consequently, only the months in which a report is released carry a label; the intervening months remain unlabeled. Disregarding the information in these unlabeled data points and concentrating solely on the labeled data would be a missed opportunity. Therefore, our objective is to use both labeled and unlabeled data by employing a semi-supervised machine learning approach.
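To make the idea concrete, here is a minimal sketch of one semi-supervised technique, self-training, using scikit-learn's `SelfTrainingClassifier`. It is an illustration only and not necessarily the method used in our research: the data is synthetic, the three features and the binary target are hypothetical stand-ins for the real explanatory variables and the five IPC phases, and the helper names `make_toy_panel` and `fit_self_training` are invented for this example. Unlabeled months are marked with `-1`, which is the convention scikit-learn expects.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier


def make_toy_panel(n_months=120, seed=0):
    """Build synthetic monthly data where only every third month has a label.

    Everything here is hypothetical: three random features stand in for the
    explanatory variables, and a binary target stands in for the IPC phase.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_months, 3))
    y_true = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # scikit-learn marks unlabeled samples with -1.
    y_partial = np.full(n_months, -1)
    y_partial[::3] = y_true[::3]  # a "report" arrives every third month
    return X, y_true, y_partial


def fit_self_training(X, y_partial, threshold=0.8, seed=0):
    """Fit a self-training wrapper around a random forest.

    The wrapper trains on the labeled months, pseudo-labels the unlabeled
    months it is confident about (probability >= threshold), and refits.
    """
    base = RandomForestClassifier(n_estimators=50, random_state=seed)
    model = SelfTrainingClassifier(base, threshold=threshold)
    model.fit(X, y_partial)
    return model


if __name__ == "__main__":
    X, y_true, y_partial = make_toy_panel()
    model = fit_self_training(X, y_partial)
    n_labeled = int((y_partial != -1).sum())
    n_used = int((model.transduction_ != -1).sum())
    print(f"labeled months: {n_labeled}, months used after self-training: {n_used}")
```

After fitting, `model.transduction_` holds the original labels plus any pseudo-labels assigned during training, which shows how many of the intervening months actually contributed to the final model.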
A positive outlook on food insecurity forecasting
By incorporating the information from the months without IPC scores, the predictive models demonstrated a noticeable increase in performance. The F1 score of the forecasts, the harmonic mean of precision and recall, improved substantially compared with models trained solely on labeled data. This enhancement in predictive power has significant implications for decision-makers, aid organisations, and policymakers involved in food security planning and interventions.
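For readers unfamiliar with the metric, the sketch below computes precision, recall and F1 for a single "positive" class from scratch. The labels are made up for illustration and are not results from our study, and the function name `precision_recall_f1` is our own. With more than two classes, as with the five IPC phases, a per-class score like this would still need to be averaged across classes; which averaging to use is a modelling choice not covered here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels: 1 = food insecure, 0 = food secure.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
    precision, recall, f1 = precision_recall_f1(y_true, y_pred)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case there are 2 true positives, 1 false positive and 2 false negatives, so precision is 2/3, recall is 1/2 and F1 is 4/7. Because the harmonic mean is pulled towards the lower of the two values, a model cannot reach a high F1 by being strong on precision alone or on recall alone.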
As the volume of available data continues to grow, we advise researchers and institutions not to rely solely on conventional supervised models. Instead, they should harness the potential of unlabeled data through semi-supervised machine learning techniques. This additional data has the potential to provide the model with valuable insights, enhancing its forecasting capabilities.
Beyond research, the findings can be applied directly to client projects. For example, we can help the agriculture sector leverage semi-supervised techniques to optimise crop planning and distribution strategies. As we continue to explore new technologies, projects like this remind us of the positive impact that data science can have in addressing social issues and improving the welfare of communities and nations.