Data Science in CGIAR
At the nexus of food, land, and water systems, we often work with large volumes of complex data in agricultural research. Our domain knowledge of food systems is essential for understanding the context and interpreting analysis results. But our usual research tools and existing data may not be enough to drive systems transformation and address the climate crisis. CGIAR scientists increasingly need to write their own programs to analyze data with advanced analytical methods (often machine learning approaches), visualize complex relationships, and share analyses across disciplines.
For the 2020 CGIAR Virtual Convention for Big Data in Agriculture, we conducted an online survey to collect machine learning use cases from across CGIAR, asking respondents which analytical methods they use, on what kinds of data, and for what purposes. The figure above is a visual summary of survey responses from all CGIAR Centers.
First, a wide range of analytical methods is in use, including random forests, regression, classification, and neural networks. Our colleagues said they need these methods to analyze large volumes of data from satellite remote sensing, household surveys, agronomic trials, and soil and weather. This is probably not surprising: as many new and diverse data sources emerge, researchers are also experimenting with new analytical methods to analyze them, find patterns, and develop new insights. What was really interesting in this survey was the diversity of applications. Altogether, we identified more than a hundred use cases, ranging from crop yield prediction and pest monitoring to modeling farmers’ behavior under climate change. The data science approach is already widely used to answer complex research questions, and our scientists recognize the potential of big data in agriculture. To maintain this momentum and keep up with new techniques and tools, we will need to continuously explore new approaches and learn from more pilots and use cases.
Launch of the CGIAR-Coursera Data Science Academy
One of the core functions of Communities of Practice (CoPs) is capacity building. In the past, we organized many on-site training workshops on different technical topics. Because the pandemic prevents us from convening such in-person workshops, we adjusted our plans to offer online capacity building instead. In partnership with Coursera, the CGIAR Platform for Big Data in Agriculture and CGIAR-CSI jointly launched a new pilot program, the CGIAR Data Science Academy, in October 2020.
Based on a set of pre-announced selection criteria (e.g., participation in the 2020 Convention, data science challenges, and CoP activities, as well as recommendations from Centers), we selected 25 CGIAR scientists for this pilot and enrolled them in Coursera’s Data Science Academy.
Under this program, trainees can take any course from Coursera’s vast catalog, share what they learn with their cohort, and participate in guided projects that apply their skills to real-world use cases. Since the launch, the cohort has collectively completed 422 lessons across 49 courses, and responses so far have been overwhelmingly positive. The most popular courses come from the curated data science topics, such as “Deep Learning with TensorFlow,” “AI in Production,” and “Statistics for Machine Learning,” but trainees are also encouraged to take popular courses on topics of personal interest, such as “Learning How to Learn,” “Enjoyable Econometrics,” and “Music Production.” At a recent cohort meeting, trainees excitedly shared comments like “I am happy to finally take time to learn much needed Python programming skills,” “I enjoy learning outside of my usual comfort zone,” and “I happily traded my evening Netflix time with Coursera!”
Let us know if you’re interested in joining a future cohort of this program.