A one-platform approach to processing citizen science data with climateR
Nayoung Hur
Lynker
Source: vignettes/mros-climateR.Rmd
Rain or snow?
Mountain Rain or Snow is a citizen science project, funded by NASA’s Citizen Science for Earth Systems Program (CSESP), that aims to improve predictions of precipitation phase. Citizen scientists across the continental United States answer the question ‘What is falling from the sky?’ by reporting precipitation as rain, mixed, or snow using a mobile app. Over several years, the project has collected around 40,000 observations (Figure 1). Precipitation phase is easy for human observers to identify but challenging for atmospheric models to predict, so this dataset provides a valuable basis for improving such models.
A data assimilation challenge
Each observation requires rigorous data processing to produce modeled air, dew point, and wet-bulb temperatures and relative humidity at the observation point. Additionally, as part of the project’s mission to improve satellite-based algorithms, each observation point is assigned a probability of liquid precipitation (pLP) from IMERG, a gridded NASA product.
Each raw observation consists of a timestamp, the location of the report (latitude/longitude), and the reported precipitation phase. Ancillary information, such as elevation and station data from meteorological networks near the observation, is a critical input for the temperature modeling.
Previous processing workflows collected data from various platforms and providers and then brought them into R for further analysis. This meant retrieving elevation and pLP data for each observation point from an external platform, then exporting those data and binding them to a dataframe. This process required intermediate file storage and code unique to each data provider, which had to be maintained separately.
To ensure workflow reproducibility and simplify the processing chain, the Mountain Rain or Snow team integrated climateR to organize this workflow. climateR has a large (and growing) catalog of data providers. A benefit of this approach is that it accommodates future changes in the processing of phase observations. New data products or model output can be quickly subset and included without adding dependencies to the codebase or writing code to robustly access large, gridded files. All processing is now kept in a single language (R), with seamless retrieval of elevation and pLP data from external providers and integration with the original dataframe. A summary of the process is illustrated in Figure 2.
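The in-memory pattern described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the project's code: the coordinates and timestamps are made up, and `fetch_elevation`, `fetch_plp`, and `add_ancillary` are hypothetical stand-ins. In the real workflow the two retrieval functions would wrap calls to climateR, whose arguments are not shown here.

```r
# Hypothetical observations holding the three raw fields described above:
# a timestamp, a location (latitude/longitude), and the reported phase.
obs <- data.frame(
  time  = as.POSIXct(c("2023-01-15 14:30:00", "2023-02-02 09:05:00"), tz = "UTC"),
  lat   = c(39.53, 44.06),
  lon   = c(-119.81, -121.31),
  phase = c("snow", "rain"),
  stringsAsFactors = FALSE
)

# Stand-in retrieval functions (assumptions, not climateR calls).
# Each returns one value per observation; NA marks "not yet retrieved".
fetch_elevation <- function(lon, lat) rep(NA_real_, length(lon))
fetch_plp <- function(lon, lat, time) rep(NA_real_, length(lon))

# Append the ancillary columns to the original dataframe in memory,
# so no intermediate files are written between retrieval and analysis.
add_ancillary <- function(obs, elev_fun, plp_fun) {
  obs$elevation_m <- elev_fun(obs$lon, obs$lat)
  obs$plp <- plp_fun(obs$lon, obs$lat, obs$time)
  obs
}

obs_full <- add_ancillary(obs, fetch_elevation, fetch_plp)
str(obs_full)
```

Passing the retrieval functions as arguments keeps provider-specific code in one place, so a new data product can be added by supplying another function rather than changing the binding step.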
More about the data
Elevation data for an observation point are extracted from the USGS 3DEP 1/3 arc-second (10-meter) dataset, the highest resolution USGS DEM available (see the get3DEP function). The IMERG pLP data are accessed through the dap function, which allows for consistent data retrieval from NASA’s Goddard Earth Sciences Data and Information Services Center (GES DISC).
Both functions from climateR provide solutions that may be unique to the Mountain Rain or Snow project, but integrating multiple data products is a common problem that many projects face. In addition, reproducibility is core to practicing good science, yet ordered workflows are often difficult to establish. A single-platform approach to data collection and processing with climateR addresses both issues.
Acknowledgments
Thank you to Dillon Ragar (Lynker) and Rachel Bash (Lynker) for their review of this article.
As mentioned in the article, Mountain Rain or Snow is funded by NASA’s Citizen Science for Earth Systems Program. Co-PIs for Mountain Rain or Snow are Dr. Keith Jennings (Lynker), Meghan Collins (DRI, UNR), and Dr. Monica Arienzo (DRI, UNR).