It Takes a Community

Geospatial Data Harmonization
Jawoo giving a presentation on CGIAR’s geospatial data management at IFLRC-VII

What’s so special about geospatial data?

The seventh International Food Legume Research Conference (IFLRC-VII) was held in Marrakech, Morocco, on 6-8 May 2018. ICRISAT organized a workshop and special session on the Harmonization of Data Management Systems in CGIAR. I was invited to represent CGIAR-CSI and present a case on geospatial data management, alongside other participants presenting cases on breeding data, socioeconomic data, GEMS (genomics, environment, management, and socioeconomic), and big data.

On geospatial data and how we manage them, I made a few remarks in the presentation, listing some unique challenges in our work with geospatial data and how we are addressing them as a community. In addition to the usual issues (e.g., dealing with large data files, unfamiliar data formats, and the eternal temptation to build tools rather than invest in data science), I made a point about our unique need for community effort in geospatial science, and I'd like to reiterate it here.

It takes a community

Over the years, I have come to believe that geospatial science is probably the discipline where community-level effort is needed the most. Why? Because, even more than in other agricultural disciplines, we consistently need insiders' knowledge to make progress. We can't have all the data we need in our own hands. Finding the data is usually the first step, and open access/open data is certainly helping. Through academic publications and technical documentation, we then learn what is behind the pixels and polygons, but usually this is not enough. Oftentimes, what you see in geospatial data is not what you will actually see on the ground. Large datasets, even remote sensing-based datasets with plausibly rendered maps and convincingly visualized charts, are developed using a multitude of interpolations, predictions, gap-filling, proxies, threshold-based classifications, and so-called massaging (yes, it's a technical term). You always have to pay close attention to these details before using a dataset. Due to the nature of our work, a complete description of all the methods and assumptions we make, and large-scale verification of data quality through the conventional peer-review process, are difficult (if not impossible). Hence, the best way to find the right choice for a specific need, and to advance your product's applicability and overall quality, is to consult your spatial colleagues: those who know the data best, who have already used it, and who may have already figured out the strengths and weaknesses of the possible options. That's where we, CGIAR-CSI, can be a valuable resource, and that's why we exist.

Join the community!

As of this writing, we have about 120 members in the CGIAR-CSI community across all 15 centers (plus ITC). Beyond being a network, we organize and develop many activities together to accelerate our members' work. For example, in 2018, through the generous support of the CGIAR Platform for Big Data in Agriculture, we will be able to organize a series of training workshops to strengthen our programming skills (through which we will develop training materials), provide shared services on key datasets and analytics tools, provide our partners with mini-grants to facilitate the development of key open geospatial datasets, and organize mapping competitions to help advance the quality of crop distribution maps. So as not to miss any of these opportunities, be sure to join the community!