In August 2016 NOAA made the National Water Model operational aimed towards impact-based forecasting of weather and water events. The NWM runs on NOAA’s centralized Weather and Climate Operational Supercomputing System (WCOSS) utilizing the community WRF-Hydro modeling system. It delivers streamflow forecasts for the 2.7 million USGS NHDPlusv2 river reaches as well as 1-kilometer gridded analyses for a range of hydrologic variables across the CONUS.

The nwm package offers methods to access and subset this data in a quick fashion and returns processed data ready for analysis in the R environment.

What are the challenges?

Given the size of CONUS and the resolution of both point and grid data, the sheer size of output is enormous. Each day, approximately 400 GB of data relating to streamflow, terrain and atmospheric processes is produced. This data is stored for a 48 hour rolling window on the NOAA NOMADS server, and for a 40 day rolling window on the HydroShare THREDDS server hosted at RENCI. These outputs come from four unique model configurations that cycle on different intervals, using different forcing data, and generate different output.

The first challenge this package hopes to address is how people can better to understand and interact with the NWM output.

This is addressed with the look function which prints the meta data and description of each model configuration and output type.

The second challenge is how to this data can be queried and subset to an Area of Interest.

The is addressed through a dependency on the AOI package to define an area of interest; the getFilelist function to define the paths to the desired data sets, and the downloadNWM function to get the data and process it for analysis.

The third challenge is how to make these processes fast, efficient, and useful in the R environment.

This is handeled on the backed of the package making use of the parralization capabilities of foreach and doParallel packages along with the subsetting capabilities of the OPeNDAP framework

This document will walk users through the use of this package, but first we need to install nwm!

Challenge #1:

Understanding the configurations, output types, data, units, and nomenclature.

The NWM runs on four unique configurations - each cycling on a different time intervals, producing forecasts out to different lengths, and including varying suites of parameters.

Specifically these are analysis & assimulation, short range, medium range, and long range. To get a better understanding of these configurations users can call on the look() function. Below is an example for the short_range configuration.

look("short_range")
#> You are viewing metadata for the `short_range` configuration:
#> Forced with meteorological data from the HRRR and RAP models, the Short Range Forecast configuration cycles hourly and produces hourly deterministic forecasts of streamflow and hydrologic states out to 18 hours. The model is initialized with a restart file from the Analysis and Assimilation configuration and does not cycle on its own states. 
#> 
#> Valid Types include:
#> channel, reservoir, land, terrain, forcing 
#> 
#> Valid `t` values include:
#> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
#> 
#> Valid `f` values include:
#> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18

The output of this call is sent to the R console and provides (1) a description of the configuration, (2) the output types offered, (3) the time when these forcasts are made (t), and (4) how far out - or forward - each goes (f).

For example a forecast made at noon for 3PM would have a t value of 12 and an f value of 3.

At the most general level, each model configuration generates three sets of outputs.

Channel output are point values generated at the outlets of each of the 2.7 million NHDPusV2 reaches in the CONUS

land outputs are gridded data set to a 1 km grid

forcing data are those parameters used to generate the channel and land forecasts, these are also 1 km gridded data.

look("medium_range", 'channel')
#> You are viewing metadata for the `medium_range` configuration and `channel` type:
#> The Medium Range Forecast configuration is executed four times per day, is forced with GFS model output and extends out to 10 days. It produces 3-hourly deterministic output and is initialized with the restart file from the Analysis and Assimilation configuration. 
#> 
#> Valid `channel` parameters include:
#>  PARAM        UNITS    DESCRIPTION                             
#>  "streamflow" "m3/sec" "Streamflow"                            
#>  "nudge"      "m3/sec" "Streamflow data assimilation increment"
#>  "velocity"   "m/s"    "Stream velocity"                       
#>  "q_lateral"  "m3/sec" "Channel inflow"                        
#> Valid `t` values include:
#> 0, 6, 12, 18
#> 
#> Valid `f` values include:
#> 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 147, 150, 153, 156, 159, 162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192, 195, 198, 201, 204, 207, 210, 213, 216, 219, 222, 225, 228, 231, 234, 237, 240

Great so now using the look function you have the tools to check the parameters, cycle time and forecast duration of any configuration, type combination - now we can actually get some data!

Challenge #2:

Querying data and subsetting to an AOI

Defining Area of Interest in the nwm package is based on the AOI package which was developed to help find, define, and refine AOI spatial objects. AOIs can be defined by a state, county or clip area generated from a location and bounding box. To learn more about this package please consult the package documentation.

Here is a basic example querying a 100 square mile AOI centered on ‘Colorado Springs, Colorado’. The AOI check() function can be used to view its boundaries:

getAOI(clip = list("Colorado Springs", 10, 10)) %>% check()
#> $AOI
#> class       : SpatialPolygons 
#> features    : 1 
#> extent      : -104.9184, -104.7323, 38.76149, 38.90642  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0+no_defs 
#> 
#> $map

We can use our AOI to subset and download the NHD reaches:

system.time({
  nhd <-getNHD(AOI)
})
#>    user  system elapsed 
#>   0.325   0.019   2.459

head(nhd)
#> class       : SpatialLinesDataFrame 
#> features    : 6 
#> extent      : -104.7981, -104.7656, 38.76338, 38.80435  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
#> variables   : 91
#> names       :                          id, ogc_fid,   comid,               fdate, resolution, gnis_id,            gnis_name, lengthkm,      reachcode,        flowdir, wbareacomi,       ftype, fcode, streamleve, streamorde, ... 
#> min values  : nhdflowline_network.1739134, 1739134, 1529805, 1999-06-22 23:00:00,     Medium,  193371, East Fork Sand Creek,    0.348, 11020003000189, With Digitized,          0, StreamRiver, 46003,          3,          4, ... 
#> max values  : nhdflowline_network.1739241, 1739241, 1529851, 2009-07-22 23:00:00,     Medium,  193719,       Fountain Creek,    3.354, 11020003000271, With Digitized,          0, StreamRiver, 46006,          4,          5, ...

Here we see that that there are 91 flow lines (so 91 points) where channel forecasts can be found from the NWM. To view these flow lines we can chain leaflet::addPolylines() to the check(AOI) call. We can also quickly label each river by its COMID using the popup options, and symbolize each reach by its order using weight.


check(AOI)$m %>% addPolylines(data = nhd, popup = paste0("COMID: ", nhd$comid), weight  = nhd$streamorde)

With an area selected, we next need to define the file list we need to subset. This requires defining a model configuration (config), the date(s), the time of forecast(t), and the time forward from each t, (f).

In this example we are getting the medium range, channel, forecasts made on the July 12, 2018 at 12AM, 6AM, and 12PM UTC time.

Remember the look() function can be useful in paramatrizing this query:

files = getFilelist(config = "medium_range", date = "20180712", type = "channel", t = c(0,6,12), f = 3)

print(files)
#> [1] "http://thredds.hydroshare.org/thredds/dodsC/nwm/medium_range/20180712/nwm.t00z.medium_range.channel_rt.f003.conus.nc"
#> [2] "http://thredds.hydroshare.org/thredds/dodsC/nwm/medium_range/20180712/nwm.t06z.medium_range.channel_rt.f003.conus.nc"
#> [3] "http://thredds.hydroshare.org/thredds/dodsC/nwm/medium_range/20180712/nwm.t12z.medium_range.channel_rt.f003.conus.nc"

With a known AOI, and a list of files we can use the downloadNWM function to specifiy the parameter we want to grab from these files subset to the region. In this basic example

xx = system.time({
  flows <-downloadNWM(AOI, files, "streamflow")
})

This operation took 2.357 seconds to extract dim(flows$streamflow)[1] flow recods for the AOI>

Alternitvly we could grab the velocity values from the same file list simple by changing the parameter value to ‘velocity’:

xx = system.time({
  vel <-downloadNWM(AOI, files, "velocity")
})
head(flows$streamflow, 8)
#>    COMIDS            DateTime streamflow
#> 1 1529589 2018-07-12 03:00:00    0.00000
#> 2 1529589 2018-07-12 09:00:00    0.00000
#> 3 1529589 2018-07-12 15:00:00    0.00000
#> 4 1529591 2018-07-12 03:00:00   30.72379
#> 5 1529591 2018-07-12 09:00:00   30.72379
#> 6 1529591 2018-07-12 15:00:00   27.54547
#> 7 1529593 2018-07-12 03:00:00    0.00000
#> 8 1529593 2018-07-12 09:00:00    0.00000
max = flows$streamflow[order(flows$streamflow$streamflow, decreasing = T),]
head(max)
#>      COMIDS            DateTime streamflow
#> 191 1529817 2018-07-12 09:00:00   133.8427
#> 212 1529847 2018-07-12 09:00:00   133.1364
#> 203 1529833 2018-07-12 09:00:00   132.4301
#> 218 1529853 2018-07-12 09:00:00   132.4301
#> 224 1529859 2018-07-12 09:00:00   127.8392
#> 227 1529861 2018-07-12 09:00:00   126.4266
{par(mfrow = c(1,2))
vizFlows(AOI = AOI, data = flows$streamflow, num = 10, max = TRUE)
vizFlows(AOI = AOI, nhd = nhd,  data = flows$streamflow, num = 10, max = FALSE)}

library(RColorBrewer)

files = getFilelist(config = "medium_range", date = "20180711", type = "land", t = 6, f = seq(3,30,3))

system.time({
  d <-downloadNWM(AOI, files, "accet")
})
#>    user  system elapsed 
#>   0.442   0.199   2.951

rasterVis::levelplot(d$accet,
                     main = "National Water Model: Accumulated ET", 
                     names.attr = as.character(nwm::getGridTime(d$accet)),
                     col.regions = colorRampPalette(brewer.pal(9,"YlOrRd")))


check(AOI)$m %>% addRasterImage(x = d$accet[[1]], opacity = .8)

library(ggplot2)
loc = "Garden of the Gods, Colorado Springs"

pt = AOI::geocode(loc, pt = T)

vals = data.frame(date = getGridTime(d$accet), accet = c(raster::extract(d$accet, y =  pt$pt)))
#> Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
#> Raster

plot(x = vals$date, 
     y = vals$accet, 
     type = 'o', 
     pch = 16, 
     ylab = "ET", 
     xlab = "Time (UTC)",
     main = loc)