What is a hydrofabric?

The first question generally raised is, “what is a hydrofabric?” To date, the term has been a bit nebulous, used to describe artifacts as narrow as a set of cartographic lines all the way to encompassing the entire spatial data architecture needed to map and model the flow of water and flood extents.

Here, an “Ngen Hydrofabric” will include the following four parts:

  1. The blue lines (flowpaths) and per-flowline catchment features

  2. The flowline/catchment topologies

  3. Attributes to support routing and runoff modeling

  4. The software and data models needed to make the creation of these open, reproducible, and flexible.

In this breakdown, features define the computational elements for hydrologic and hydraulic modeling; topologies link data together for space/time processing (modular elements that act as a whole); attributes provide the information for model execution (physics-based, conceptual, and ML/AI model formulations); and software and data models develop community standards, reproducibility, and flexibility to support analysis at scale.

Figure 1

Who cares about a hydrofabric?

Discretizing the land surface into computational elements is fundamental to all modeling tasks. Without it, distributed and lumped models have no way to apply the needed model formulations or computer science applications to achieve meaningful results. Therefore, anyone who cares about the science and application of water resource modeling should care about the underlying data, as it drives the locations where forecasts are made, the attributes that inform a model, and the spatial elements in which formulations are valid.

However, describing the earth's surface, particularly at continental scales, is a tricky task. Automated techniques can get us a long way toward a useful representation, but the modeling task at hand and local knowledge should also inform an authoritative product. Over time, local knowledge has been collected in a number of places but never centralized. Further, one-off products (like the NHDPlus) have been used to guide all modeling tasks, even in cases where their resolution or representation is not well suited.

The aim of NOAA's work in this space is to develop a federal reference fabric to support all flavors of modeling, and a national instance of that reference fabric to support heterogeneous model application.

Equally important are the software tools to support flexibility and community uptake; the data models to support interoperability, community engagement, and long-term stability; and a reference dataset with the quality assurances that, when one uses the product, they are getting a well-vetted resource that will play nicely with the growing Ngen framework.

Can I build my own?

Ngen aims to provide a framework in which heterogeneous models can be used to achieve the best possible results in the highest number of places. In the framework, the hydrofabric is a malleable product that can be modified to support specific applications. However, the starting point for all variations of this product should stem from a consistent, quality reference dataset.

The central role of a reference dataset cannot be overstated, as it not only allows people to avoid replication (and possible error) but also provides a source through which all variations of the data can be linked together.

For example, while there might be specific hydrofabric traits needed for runoff modeling, hydraulic routing, and inundation mapping, if all of these groups built their own product from scratch, linking them together to provide actionable information would require a number of conflation processes that may or may not be easy, let alone feasible.

Instead, if all groups were to start from a common reference system, and track the origins of that system, data conflation is simple, meaning time can be spent on science rather than brute-force data integration.
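As a toy illustration of this idea (all IDs and products below are made up), two independently built products that both retain a reference identifier can be linked with a simple table join rather than a geometric conflation exercise:

```r
# Two hypothetical derived products, each tracking the reference feature
# (a made-up "comid") its elements originated from.
runoff_fabric  <- data.frame(runoff_id  = c("r1", "r2"), comid = c(101, 102))
routing_fabric <- data.frame(routing_id = c("h1", "h2"), comid = c(102, 101))

# Because both carry the reference identifier, linking them is a table join.
crosswalk <- merge(runoff_fabric, routing_fabric, by = "comid")
crosswalk
#>   comid runoff_id routing_id
#> 1   101        r1         h2
#> 2   102        r2         h1
```

Without the shared reference column, linking these two products would require matching their geometries directly.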

As part of NOAA’s efforts to build Ngen, this reference fabric is being built with partners at the USGS, the private sector, and the Internet of Water.


In addition to the reference fabric, software that uses this reference to support scale-dependent modeling tasks is in development. The foundational layer of the Ngen model instance used as the National Water Model will be based on an authoritative instance of a hydrofabric that will dictate where forecasts are made, where model formulations are run, how the model engine passes data through time and space, and how model-as-a-service visualizes, subsets, and disseminates data.

So, can you build your own? For now, all of these products are under heavy development and not close to final. The anticipated goal for finishing these products in a beta release is fiscal year 2023. When this time comes, you will be able to generate your own modeling-task-based data product using the NOAA reference fabric and software tools, and/or utilize subsets of the fabric used for the operational modeling task.

In either case, building or finding an Ngen-ready hydrofabric requires the hyAggregate R package, which can be installed from GitHub:

remotes::install_github("NOAA-OWP/hyAggregate")

The goal of hyAggregate is to find and/or develop aggregated hydrologic and hydraulic networks to inform modeling tasks reliant on uniform length (flowpath) and area (catchment) distributions.

What's to follow

In this document we illustrate the tool set for working with the established reference fabric to create Ngen-ready data products. We focus on VPU 01 (east coast USA), as it is the most heavily tested. The workflow described here will work on any reference fabric artifact once they are produced, validated, and published to ScienceBase.

The following steps walk you through the concepts and tools for building an Ngen-ready dataset, what the outputs look like, and how you might interact with them. hyAggregate is part of a larger family of projects and packages aiming to support federal (USGS/NOAA) water modeling efforts. The whole suite of development tools can be installed with:

remotes::install_github("NOAA-OWP/hydrofabric")

Attaching this library, similar to the tidyverse, installs and loads a canon of software designed to manipulate, modify, describe, process, and quantify hydrologic networks and land surface attributes:

library(hydrofabric)

It includes the following:

Repo Purpose
USGS-R/nhdplusTools Tools for network manipulation
dblodgett-usgs/hyRefactor Tools for network refactoring
NOAA-OWP/hyAggregate Tools for distribution-based network aggregation and Ngen file creation
mikejohnson51/opendap.catalog Tools for accessing remote data resources for parameter and attribute estimation
NOAA-OWP/zonal Tools for rapid catchment parameter summarization

A National Reference Fabric

hyAggregate relies on data products within the Geospatial Fabric for National Hydrologic Modeling, version 2.0 project, the general outline of which can be seen below:

Figure 2

In the first row of Figure 2, there are three reference products.

  1. An updated network attributes table that provides attributes for the network features in the data model of NHDPlusV2, but with substantial improvements based on contributions from the USGS, NOAA OWP, NCAR and others.

  2. A set of reference catchment geometries in which geometric errors and artifacts in the NHDPlus CatchmentSP layer are corrected/removed.

  3. A set of reference flowline geometries where headwater flowlines have been replaced with the NHDPlus Burn lines.

The CONUS reference files for these datasets can be downloaded here, respectively (attributes, catchments, flowpaths).

A National Refactored Fabric

In the second row of Figure 2, the reference products are refactored based on a minimum flowpath criterion. This process is facilitated by the hyRefactor R package. Refactoring includes (1) splitting large or long catchments in the reference data to create a more uniform catchment size distribution and (2) collapsing catchment topology by removing very small inter- and intra-confluence segments and merging very small headwaters. The goal of refactoring is not to reduce the fidelity of the network, but to move it to a more uniform/coherent version of the network.
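To make the “splitting” half of this concrete, here is a minimal sketch (not the hyRefactor implementation, which operates on actual flowline geometry) of dividing an over-long flowpath so that no piece exceeds a maximum length:

```r
# Minimal sketch: split a flowpath length (meters) into the fewest
# equal-length pieces that each respect a maximum length.
split_lengths <- function(len_m, max_m = 10000) {
  n <- ceiling(len_m / max_m)  # fewest pieces that satisfy the maximum
  rep(len_m / n, n)            # equal-length pieces
}

split_lengths(25000)
#> [1] 8333.333 8333.333 8333.333
```

The real workflow additionally ensures splits land on sensible network locations and that the associated catchments are divided along with their flowpaths.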

As with all functions from here on out, refactoring is a parameter-based workflow, and the selected parameters will impact the resulting network. For the national refactored fabric, the following were selected:

Parameter Purpose Elected Value
split_flines_meters the maximum flowpath length desired in the output. 10,000
collapse_flines_meters the minimum length of inter-confluence flowpath desired in the output. 1,000
collapse_flines_main_meters the minimum length of between-confluence flowpaths. 1,000

The refactored output is shared under the Refactored Parent item of the above ScienceBase resource (available here).



NOTE:

There is a difference between the use of flowline and flowpath. Per the HY_Features standard, a catchment can have one primary flowpath (1:1) but multiple flowlines (1:many).

There is also a difference between the concept of a catchment, which is the holistic unit of hydrology, and a catchment divide, which is the edge bounded by an inflow and outflow node.

For the development of the reference fabric and modeling task outputs, we seek to define a set of divides and corresponding flowpaths from the reference flowlines and catchments.



Getting Reference & Refactored Data

All reference, refactored, and Ngen hydrofabric archives live on ScienceBase. They can be accessed with the web interface or downloaded programmatically.

The hyAggregate::get_reference_fabric() utility will download the most current geofabric for a Vector Processing Unit (VPU). Options include downloading the “refactored” (default) or “reference” data. If the requested file already exists, the file path is returned. Here we find the local path to the reference fabric for VPU = 01 in the ./data directory and explore the layers contained within:

VPU = "01"
ref_file = get_reference_fabric(VPU = VPU, type = "reference", dir = "data")
st_layers(ref_file)
#> Driver: GPKG 
#> Available layers:
#>            layer_name geometry_type features fields
#> 1  reference_flowline   Line String    65080     31
#> 2 reference_catchment       Polygon    65968      6
#> 3                  WB Multi Polygon       85     21
#> 4                POIs         Point     3185     12

Within the reference fabric artifacts - the reference_flowline and reference_catchment layers are those associated with the reference data (row 1 in figure 2). (NOTE: In some artifacts, these are called nhd_flowlines and nhd_catchments per this issue)

We can also request the data products associated with the national refactor:

file = get_reference_fabric(VPU = VPU, type = "refactored", dir = "data")
st_layers(file)
#> Driver: GPKG 
#> Available layers:
#>             layer_name geometry_type features fields
#> 1 refactored_flowpaths   Line String    40195      7
#> 2   refactored_divides       Polygon    40195      4
#> 3          mapped_POIs         Point     3372     14
#> 4         lookup_table            NA    63717      4

Within the refactored fabric artifacts - the refactored_flowpaths and refactored_divides layers are the output of the refactoring process (row 2 in figure 2). The remaining layers are those central to the USGS gfv2.0 modeling task (row 3 in figure 2). (NOTE: In some artifacts, these are called reconciled and divides per this issue)

NOAA NextGen Modeling Task

The NOAA Next Generation Water Resource Modeling Framework (Ngen) is a specific modeling task. hyAggregate houses the workflow(s) for generating the needed outputs for the NOAA NextGen modeling task, starting from the refactored_flowpaths and refactored_divides layers.

It operates under the assumption that there is no “one model to rule them all” and that different model formulations (e.g. topmodel, wrf-hydro, LSTM) will work better in different locations. Formulations are then modules that can execute at the scale of the catchment. The aim of the hydrofabric is to encapsulate the higher level notion of a catchment (see the HY_features specification) such that a variety of hydraulic and hydrologic models can execute and exchange information.

Because the aim is to model the runoff process in a heterogeneous way, there is an interest in aligning the scale of the catchment artifacts with the scale of the hydrologic processes being simulated. The scale identified for this initial run is 3–15 square kilometers, with an ideal size of 10 km².

As seen below, this is not even close to the area/length distributions found in either the reference or refactored artifacts.

Process

To overcome the mismatched scale between the reference/refactored fabrics and the Ngen modeling task, we need to aggregate the catchments to a user-defined threshold/distribution. Ngen is therefore a distribution-based modeling task that seeks to align the scale of process with the scale of catchment representation.
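The core idea can be sketched with a toy example (made-up areas, and a far simpler rule than hyAggregate actually uses): walk a chain of catchments, merging each into the current group until a minimum area is met:

```r
# Toy sketch: accumulate catchments along a chain into groups that each
# meet a minimum area. hyAggregate's real algorithm is network-aware;
# this only illustrates the "aggregate to a distribution" idea.
areas    <- c(1.2, 0.8, 6.5, 2.1, 9.0)  # km^2, upstream to downstream (made up)
min_area <- 3

groups <- integer(length(areas)); g <- 1; acc <- 0
for (i in seq_along(areas)) {
  acc       <- acc + areas[i]
  groups[i] <- g
  if (acc >= min_area) { g <- g + 1; acc <- 0 }
}

tapply(areas, groups, sum)  # aggregated areas now all meet the minimum
#>    1    2
#>  8.5 11.1
```

The real workflow must also honor confluences, preserve outlets of interest, and merge the corresponding flowpath geometries, which is why a dedicated package is needed.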

hyAggregate allows users to prescribe an ideal catchment size, a minimum catchment area, and a minimum flowpath length. The default parameters used for Ngen are an ideal catchment size of 10 km², a minimum catchment size of 3 km², and a minimum flowpath length of 1 km. A refactored_fabric can be aggregated to this distribution using hyAggregate:

ngen_v1 =  aggregate_network_to_distribution(gf = get_reference_fabric(VPU = VPU, dir = "data"),
                             outfile = glue("data/01_ngen_{VPU}.gpkg"), 
                             nexus_topology = FALSE) 
#> Warning: data/01_ngen_01.gpkg already exists and overwrite is FALSE

st_layers(ngen_v1)
#> Driver: GPKG 
#> Available layers:
#>            layer_name geometry_type features fields
#> 1 aggregate_flowpaths                  17739     10
#> 2   aggregate_divides                  17739      2

You’ll see that aggregating the network to this scale requires losing resolution of the network. In its current iteration, Ngen requires a 1:1 flowpath-to-catchment relationship. In time, the ability to support 1:many relationships will be developed so that the flowpath and/or catchment network can be densified in areas where it is needed. Until then, the 1:1 requirement means that for each catchment a primary mainstem must be elected, and it is this mainstem only that is reflected in the final aggregated product.

The base hyAggregate function creates a geopackage with a set of catchment and flowpath features. Below we see the desired distribution has been enforced on the resulting hydrofabric. A more complete discussion of what happens within the aggregation can be found in the code and will be documented later on.

Below is a spatial view of this process. The gray catchments with white outlines are the “refactored_divides”, while the yellow lines are the “refactored_flowpaths”. The red, hollow catchments are the “aggregated_divides”, while the blue lines are the “aggregated_flowpaths”. This image hopefully illustrates the level of manipulation that occurs in the network as a result of the requested parameters. The visible white edges are those that get dissolved, and the yellow flowpaths are those that get dropped. In cases where catchments are dissolved, the associated flowpaths and topologies are modified to reflect the changes.

Distribution and mapping

All core hydrofabric data is distributed as a geopackage segmented by Vector Processing Unit. Geopackages are spatial databases that can be used with many software suites, as well as programming languages that call GDAL (e.g. R, Python, Rust). Once read in, the data layers can be mapped with base products, packages (in R: sf, ggplot2, leaflet, mapview; in Python: fiona, geopandas, geoplot, matplotlib, etc.), and GUIs (QGIS).

In R, here are a few basic examples:

Base Plot

agg = read_sf(ngen_v1, "aggregate_flowpaths") 
plot(agg$geom)

Crop and plot

#Define area and project to same Coordinate Reference System
AOI = AOI::aoi_get("Boston") |> 
  st_transform(st_crs(agg))

boston_flow = st_intersection(agg, AOI)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

plot(boston_flow$geom)
plot(AOI$geometry, add = TRUE)

Interactive Plot

mapview::mapview(boston_flow)

In QGIS, double clicking the gpkg file will allow you to select which layers to load.

Topology

The flowpaths/divides produced in the aggregation provide the features that discretize the landscape and river network into the computational elements that will be used. The next thing we need is a description of how these features are connected.

To date, the NHDPlus topology (and many other flow network models) relies on flowpath-to-flowpath connectivity that describes the network in terms of how flowpaths connect to other flowpaths (and therefore which divides connect to other divides). The reference and refactored products retain this flowline-to-flowline connectivity, as does the Ngen aggregation when nexus_topology = FALSE.

topology = read_sf(ngen_v1, 'aggregate_flowpaths') |> 
  st_drop_geometry() |> 
  select(id, toid)

head(topology)
#> # A tibble: 6 × 2
#>      id  toid
#>   <int> <dbl>
#> 1 25946 25944
#> 2 25944 25939
#> 3 25943 25942
#> 4 25942 25939
#> 5 25940 25939
#> 6 25941 25939
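
With this id/toid structure, downstream navigation is a simple lookup chain. Below is a minimal sketch using a hand-built table that mirrors the head() output above (the real table has one row per aggregated flowpath):

```r
# Hand-built id/toid table mirroring the printed head() above.
topology <- data.frame(
  id   = c(25946, 25944, 25943, 25942, 25940, 25941),
  toid = c(25944, 25939, 25942, 25939, 25939, 25939)
)

# Walk downstream from a starting id until no outlet is found in the table.
navigate_downstream <- function(topo, start) {
  path <- start
  nxt  <- topo$toid[match(start, topo$id)]
  while (!is.na(nxt)) {
    path <- c(path, nxt)
    nxt  <- topo$toid[match(nxt, topo$id)]
  }
  path
}

navigate_downstream(topology, 25946)
#> [1] 25946 25944 25939
```

The walk stops at 25939 because it appears only as a toid here; in the full VPU table the chain would continue to the network outlet.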