Ground Reference Data Collection Guide for Machine Learning Applications

All artificial intelligence (AI) and machine learning (ML) applications in Earth observation require high-quality ground reference data. These are accurate observations of features on the ground that serve as labels for what an overhead image (e.g., satellite remote sensing data) represents. This is especially true for agricultural applications. In crop fields, for example, inconsistent, mislabeled, or inaccurately collected ground reference data can lead to misclassified crop types and biased productivity estimates.

For this reason, in October 2019, we partnered with Radiant Earth Foundation to jointly convene a workshop during the 2019 CGIAR Big Data Convention. We discussed the status of georeferenced data collection in CGIAR and reviewed Radiant Earth’s draft data collection guideline. Following the workshop, we, as a community, agreed to promote the use of the guideline in CGIAR.

To gather feedback from the broader geospatial community, we jointly hosted a webinar on April 21, 2020. Hamed Alemohammad (Chief Data Scientist at Radiant Earth Foundation) introduced the latest version of the guideline. Kai Sonder (CIMMYT) then presented three recent examples from ICRISAT, IFPRI, and CIMMYT that used field-collected ground reference data to analyze crop types, crop phenology, and crop breeding trials at scale. Kai also highlighted the importance of adopting this guideline to improve the value of field-collected data in research.

The latest version of the guideline is available for review and feedback at https://github.com/radiantearth/ground-referencing-guide.