With a weight grid, zonal metrics can be computed. The three primary approaches use slightly different processes:

- exactextractr leverages compiled C++ libraries and in-memory raster and sf objects. It works polygon-by-polygon to compute coverage fractions, and the weight table is computed within the function.
- intersectr utilizes a NetCDF filepath and calculates all polygons timestep-by-timestep using data.table. A weight grid must be supplied.
- zonal works from a NetCDF or tif filepath and calculates all polygons and all timesteps simultaneously using data.table. A weight grid must also be supplied.

The performance of these three approaches is compared below over a large domain, (A) when there are many thousands of polygons, and (B) when there are a few large polygon aggregation units. First, a quick sketch of how a weight grid drives an areal average.
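To make the weight grid's role concrete, here is a minimal, package-agnostic sketch using hypothetical cell, poly, and frac columns; conceptually, this join-and-summarize is the operation a supplied weight grid feeds in both intersectr and zonal:

library(data.table)

# Hypothetical weight grid: which polygon each grid cell touches,
# and what fraction of the cell's area is covered
w <- data.table(cell = c(1, 2, 2, 3),
                poly = c("A", "A", "B", "B"),
                frac = c(1.0, 0.4, 0.6, 1.0))

# One timestep of gridded values, keyed by cell id
v <- data.table(cell = 1:3, value = c(10, 20, 30))

# Coverage-weighted mean per polygon
merge(w, v, by = "cell")[, .(mean = sum(frac * value) / sum(frac)), by = poly]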
The intersectr workflow for defining inputs to execute_intersection is wrapped into a prep function below:
intersectr_prep = function(file, geom, ID, variable){
  # Requires ncmeta, RNetCDF, intersectr, sf, and dplyr
  
  # Identify the coordinate variables tied to the requested data variable
  nc_coord_vars <- nc_coord_var(file)
  nc_coord_vars <- filter(nc_coord_vars, variable == !!variable)
  
  # Read the X/Y coordinate vectors from the NetCDF file
  nc       <- open.nc(file)
  X_coords <- var.get.nc(nc, nc_coord_vars$X, unpack = TRUE)
  Y_coords <- var.get.nc(nc, nc_coord_vars$Y, unpack = TRUE)
  close.nc(nc)
  
  # Translate the NetCDF grid mapping into a projection definition
  nc_prj <- nc_gm_to_prj(nc_grid_mapping_atts(file))
    
  # Build cell geometries for the portion of the grid covering geom
  cell_geometry = create_cell_geometry(X_coords = X_coords,
                         Y_coords = Y_coords,
                         prj = nc_prj,
                         geom = geom, 
                         buffer_dist = 0.1, # Degrees
                         regularize = TRUE)
    
  data_source_cells <- st_sf(dplyr::select(cell_geometry, grid_ids))
  target_polygons   <- st_sf(dplyr::select(geom, !!ID))
  st_agr(data_source_cells) <- "constant"
  st_agr(target_polygons)   <- "constant"
  
  # Area-based weights relating source cells to target polygons
  area_weights = calculate_area_intersection_weights(data_source_cells, 
                                                     target_polygons, 
                                                     allow_lonlat = TRUE)
  
  return(list(grid = cell_geometry, 
              w = area_weights, 
              x = nc_coord_vars$X, 
              y = nc_coord_vars$Y, 
              t = nc_coord_vars$T))
}

The exactextractr workflow for computing aggregate means over a raster stack is wrapped below:
for a raster stack are wrapped below:
exactextract_process = function(file, geom, ID){
  # Cap execution at 180 seconds; on timeout, return NULL instead of erroring
  R.utils::withTimeout(
    exactextractr::exact_extract(raster::stack(file), 
                                 geom, 
                                 stack_apply = TRUE, 
                                 fun = "mean", 
                                 append_cols = ID,
                                 progress = FALSE),
  timeout = 180, onTimeout = "silent")
}

Spoiler Alert: This method can take an extremely long time when the polygon count is very high. As such, we limit the execution time to 180 seconds (three minutes). If a benchmark time reads 180 seconds, it means the process was killed and never completed.
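As a quick, standalone illustration (a toy example, not part of the benchmark), withTimeout() with onTimeout = "silent" abandons evaluation and returns NULL once the limit is hit:

# Busy work that exceeds the 1-second limit
res <- R.utils::withTimeout({
  for (i in 1:1e9) x <- sqrt(i)
  "done"
}, timeout = 1, onTimeout = "silent")

is.null(res)  # TRUE: the expression was interrupted before returning "done"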
The zonal workflow builds a weight grid with weighting_grid() and executes the areal averages with execute_zonal(); a minimal sketch of the pattern is shown below.
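As a sketch (assuming file is a gridded data source and geom a set of sf polygons, both defined shortly), the pattern mirrors the benchmark calls used later in this post:

# One-shot: weights are computed internally and applied in a single call
out <- execute_zonal(file, geom, "ID")

# Two-step: precompute the weight grid once, then reuse it across calls
w   <- weighting_grid(file, geom, ID = "ID")
out <- execute_zonal(file, w = w)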
library(zonal)

The gridded data and aggregate units we are working with can be seen below:
file = 'pet_2020.nc'
(s = terra::rast(file))

## class       : SpatRaster 
## dimensions  : 585, 1386, 366  (nrow, ncol, nlyr)
## resolution  : 0.04167, 0.04167  (x, y)
## extent      : -124.8, -67.04, 25.05, 49.42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : pet_2020.nc 
## varname     : potential_evapotranspiration (pet) 
## names       : poten~43829, poten~43830, poten~43831, poten~43832, poten~43833, poten~43834, ... 
## unit        :          mm,          mm,          mm,          mm,          mm,          mm, ...
Looking at the grid we can see it consists of 810,810 grid cells, each with a 0.0417 degree by 0.0417 degree resolution. Additionally, there are 366 unique time slices in the NetCDF file.
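These counts can be verified directly from the SpatRaster object:

# Cells per layer and number of layers reported above
terra::ncell(s)  # 585 * 1386 = 810810
terra::nlyr(s)   # 366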
Our first example uses a hydrofabric developed for the Northeast USA.
geom <- read_sf('hydrofabric.gpkg', "catchments")
paint(geom)

## sf [18041, 4] 
## active geometry column: geom (POLYGON)
## crs: 5070 (NAD83 / Conus Albers)
## crs unit: metre 
## ID        chr cat-1 cat-2 cat-4 cat-5 cat-6 cat-7
## area_sqkm dbl 12.457576 267.083595 8.319214 9.278138 60.577~
## toID      chr nex-2 nex-3 nex-5 nex-6 nex-7 nex-8
## geom      sfc POLY 2,024B POLY 9,064B POLY 1,656B POLY 1,81~
In total we have 18,041 aggregation units to summarize over the 366 time steps. Both zonal and intersectr are designed to precompute a weight grid. Therefore we time how long it takes to do this using each method:
int_time_huc01 = system.time({
  intersectr_input = intersectr_prep(file, 
                                     geom, 
                                     ID = "ID", 
                                     variable = 'potential_evapotranspiration')
})
zonal_time_huc01 = system.time({
  zonal_w = weighting_grid(file, 
                           geom, 
                           ID = "ID")
})

Next, we benchmark the time it takes to do the following:

- run the intersectr workflow with precomputed weights
- run the exactextractr workflow
- run the zonal workflow
- run the zonal workflow with precomputed weights
huc01_bnch <- bench::mark(
  iterations = 1, check = FALSE, time_unit = 's',
  intersectr_staged_weights = execute_intersection(nc_file = file,
                               variable_name = 'potential_evapotranspiration',
                               intersection_weights = intersectr_input$w,
                               cell_geometry = intersectr_input$grid, 
                               x_var = intersectr_input$x,
                               y_var = intersectr_input$y,
                               t_var = intersectr_input$t, 
                               start_datetime = NULL, 
                               end_datetime = NULL),
  exactextractr            = exactextract_process(file, geom, "ID"),
  zonal_full               = execute_zonal(file, geom, "ID"),
  zonal_staged_weights     = execute_zonal(file, w = zonal_w)
)
Overall, when the polygon count is very high (~20,000), the zonal
approach performs best with both precomputed and internally computed
weights. Precomputing the weights saves a significant amount of memory
and ~4 seconds of total run time. These timings suggest that 20 years
of daily GridMet data could be computed for the ~20,000 catchments in
about 11 minutes (~30 seconds × 20 years + ~15 seconds). The
intersectr approach requires substantially more memory despite the
precomputation of weights and takes about 3 times as long as zonal.
Lastly, the exactextractr method timed out at the 180-second upper
limit we prescribed.
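For reference, the back-of-envelope arithmetic behind that estimate is:

# ~30 seconds per year of daily data with staged weights,
# plus a one-time ~15 seconds to build the weight grid
(30 * 20 + 15) / 60  # ~10.3 minutes, i.e. roughly the quoted ~11 minutes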
Our second example looks at the timings for an aggregation over a large area with a few aggregation units. The gridded data is the same as in example 1, and the aggregate units can be seen below:
(florida <- AOI::aoi_get(state = "FL", county = "all"))

## Simple feature collection with 67 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.63 ymin: 24.5 xmax: -80.03 ymax: 31
## Geodetic CRS:  NAD83
## First 10 features:
##     statefp countyfp countyns       affgeoid geoid         name            namelsad stusps
## 65       12      061 00307624 0500000US12061 12061 Indian River Indian River County     FL
## 97       12      037 00306911 0500000US12037 12037     Franklin     Franklin County     FL
## 105      12      115 00295741 0500000US12115 12115     Sarasota     Sarasota County     FL
## 165      12      077 00308549 0500000US12077 12077      Liberty      Liberty County     FL
## 192      12      003 00306920 0500000US12003 12003        Baker        Baker County     FL
## 198      12      083 00306922 0500000US12083 12083       Marion       Marion County     FL
##     state_name lsad      aland     awater state_name.1 state_abbr jurisdiction_type
## 65     Florida   06 1302272805  295509938      Florida         FL             state
## 97     Florida   06 1411498965 2270440522      Florida         FL             state
## 105    Florida   06 1440039085 1086628535      Florida         FL             state
## 165    Florida   06 2164099094   19582444      Florida         FL             state
## 192    Florida   06 1515738972    9686120      Florida         FL             state
## 198    Florida   06 4113993941  192281837      Florida         FL             state
##                           geometry
## 65  MULTIPOLYGON (((-80.88 27.8...
## 97  MULTIPOLYGON (((-85.21 29.7...
## 105 MULTIPOLYGON (((-82.65 27.3...
## 165 MULTIPOLYGON (((-85.12 30.2...
## 192 MULTIPOLYGON (((-82.46 30.5...
## 198 MULTIPOLYGON (((-82.53 29.2...
##  [ reached 'max' / getOption("max.print") -- omitted 4 rows ]
The same functions and timing from example 1 are computed:
int_time_fl = system.time({
  intersectr_input_florida = intersectr_prep(file, florida, ID = "geoid", variable = 'potential_evapotranspiration')
})
zonal_time_fl = system.time({
  zonal_w_florida = weighting_grid(file, florida, ID = "geoid")
})
fl_bnch <- bench::mark(
  iterations = 1, check = FALSE, time_unit = 's',
  intersectr_staged_weights = execute_intersection(nc_file = file,
                               variable_name = 'potential_evapotranspiration',
                               intersection_weights = intersectr_input_florida$w,
                               cell_geometry = intersectr_input_florida$grid, 
                               x_var = intersectr_input_florida$x,
                               y_var = intersectr_input_florida$y,
                               t_var = intersectr_input_florida$t, 
                               start_datetime = NULL, 
                               end_datetime = NULL),
  exactextractr            = exactextract_process(file, florida, "geoid"),
  zonal_full               = execute_zonal(file, florida, "geoid"),
  zonal_staged_weights     = execute_zonal(file, w = zonal_w_florida)
)