Data Science Evaluation in Ecology

Sergio Marconi, 12 June 2018

Studying biodiversity across scales is a hard task, and data collected on the ground are most likely not enough on their own. Satellite and airborne images can provide continuous information about the patterns and processes driving ecological systems. Using those data allows forests to be studied in detail at much larger scales than is currently possible. The problem, however, is that we are still far from knowing how to “read” remote sensing information. Hundreds of studies have tried to develop such methods, but each on a different dataset, making it hard to compare their relative performance.

Collaborative data analysis challenges are effective instruments for improving methods that convert data into useful information. Although such challenges are fairly common in other data-driven sciences (e.g. www.kaggle.com), they are rare in ecology and, as a result, most ecologists are unaware of, and have had few opportunities to participate in, data science competitions. With this in mind, we developed a Data Science Evaluation in the framework of the National Institute of Standards and Technology (NIST) Data Science Evaluation Series (DSE). We used remote sensing data from the NEON Airborne Observatory Platform, together with data collected in the field by the NEON Terrestrial Observatory Platform and by our group. The result is a unique standard dataset, publicly available on Zenodo (https://zenodo.org/record/1206101#.W8JJLKeZO_I).

We identified three sets of tasks to begin with: 1) Segmentation: Identifying individual trees in remote sensing images; 2) Alignment: Aligning ground data with remote sensing data; and 3) Classification: Classifying trees into species.
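
To make the segmentation task more concrete, here is a minimal sketch of how a predicted tree crown could be compared with a crown delineated in the field. It assumes that crowns are represented as axis-aligned rectangles and that overlap is scored with the Jaccard index (intersection over union). Both choices, along with the names Box, area and jaccard, are illustrative assumptions and not a description of how the challenge actually scored submissions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical crown outline: an axis-aligned rectangle in map units."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(box: Box) -> float:
    # Negative widths or heights mean the box is empty, so clamp at zero.
    return max(0.0, box.xmax - box.xmin) * max(0.0, box.ymax - box.ymin)


def jaccard(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, between 0 and 1."""
    overlap = Box(
        max(a.xmin, b.xmin),
        max(a.ymin, b.ymin),
        min(a.xmax, b.xmax),
        min(a.ymax, b.ymax),
    )
    inter_area = area(overlap)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Made-up example: a field-delineated crown and a prediction shifted sideways.
field_crown = Box(0.0, 0.0, 4.0, 4.0)
predicted_crown = Box(2.0, 0.0, 6.0, 4.0)
score = jaccard(field_crown, predicted_crown)
print(f"Jaccard overlap: {score:.3f}")
```

A score of 1 would mean the predicted crown matches the field crown exactly, and 0 would mean they do not overlap at all. Real crowns are irregular polygons, so a practical version would need a geometry library rather than rectangles.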

Teams could participate in all of them or just in the tasks they were most interested in. Details of the different tasks and links to the data are available at the challenge website: https://www.ecodse.org. The pilot round was received with great excitement and in a very collaborative spirit by all participants, who agreed to contribute to a collection of papers and to make their code publicly available.

We are planning to run this "competition" annually; the next round is expected to open in January 2019, and we will provide data from additional NEON forested sites. If you are interested, sign up at https://www.ecodse.org. Once you sign up on the website you will receive an email with some additional details. If you have any questions, feel free to respond to that email or check out the FAQ to see if they have already been answered.

This challenge is sponsored by the National Institute of Standards and Technology as part of its Data Science Evaluation series and is also partially supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through grant GBMF4563. It uses data from the National Ecological Observatory Network in addition to data collected by the organizers. The challenge is being organized by the Data Science Research lab, the Weecology lab, and Stephanie Bohlman’s lab, all at the University of Florida.

References

Marconi S, Graves SJ, Gong D, Nia MS, Le Bras M, Dorr BJ, Fontana P, Gearhart J, Greenberg C, Harris DJ, Kumar SA, Nishant A, Prarabdh J, Rege SU, Bohlman SA, White EP, Wang DZ. (2018) A data science challenge for converting airborne remote sensing data into ecological information. PeerJ Preprints 6:e26966v1 https://doi.org/10.7287/peerj.preprints.26966v1