Impact of Dams on Streamflow - R
This working example is part 4 of the Internet of Water Workshop on Geospatial data and web services in R. The first three sections can be found here. # Introduction
This working example is part 4 of the Internet of Water Workshop on Geospatial data and web services in R. The first three sections can be found here.
The primary goal of this hands-on tutorial is to introduce a handful of geospatial web services for conducting scientific studies. For this purpose, we’re going to take a look at the effects of building dams on stream flow. Here are some example peer-reviewed papers on this topic.
We set the area of interest (AOI) for this study to Texas and study dams that have been built in the 1995-2005 period.
# Define AOI
texas = aoi_get(state = "TX")
# Instantiate plot
ggplot() +
# Add Texas POLYGON
geom_sf(data = texas, color = "red", fill = "blue") +
# Apply theme
theme_light()
We use the National Water Information System (NWIS) service to check stream flow availability in Texas, for the period 10 years before and after the 1985-2015 period of interest.
# Query NWIS site data
nwis = whatNWISdata(
# State to search
stateCd = "TX",
# Date range
startDt = "1985-01-01", endDt = "2015-01-01",
# Daily Values
outputDataTypeCd = "dv",
# Stream flow
parameterCd = "00060",
# Mean statistic
statCd = '00003')
glimpse(nwis)
#> Rows: 650
#> Columns: 24
#> $ agency_cd <chr> "USGS", "USGS", "USGS", "USGS", "USGS", "…
#> $ site_no <chr> "07227420", "07227500", "07227890", "0722…
#> $ station_nm <chr> "Cramer Ck at US Hwy 54 nr Dalhart, TX", …
#> $ site_tp_cd <chr> "ST", "ST", "ST", "ST", "ST", "ST", "ST",…
#> $ dec_lat_va <dbl> 35.75125, 35.47033, 35.72078, 35.66476, 3…
#> $ dec_long_va <dbl> -102.89317, -101.87963, -101.66253, -101.…
#> $ coord_acy_cd <chr> "S", "F", "1", "F", "F", "F", "F", "F", "…
#> $ dec_coord_datum_cd <chr> "NAD83", "NAD83", "NAD83", "NAD83", "NAD8…
#> $ alt_va <dbl> 3962.24, 2989.16, 2933.19, 2835.84, 2301.…
#> $ alt_acy_va <dbl> 0.01, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10,…
#> $ alt_datum_cd <chr> "NGVD29", "NGVD29", "NAVD88", "NGVD29", "…
#> $ huc_cd <chr> "11090102", "11090105", "11090105", "1109…
#> $ data_type_cd <chr> "dv", "dv", "dv", "dv", "dv", "dv", "dv",…
#> $ parm_cd <chr> "00060", "00060", "00060", "00060", "0006…
#> $ stat_cd <chr> "00003", "00003", "00003", "00003", "0000…
#> $ ts_id <chr> "131754", "131763", "131775", "131784", "…
#> $ loc_web_ds <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ medium_grp_cd <chr> "wat", "wat", "wat", "wat", "wat", "wat",…
#> $ parm_grp_cd <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ srs_id <chr> "1645423", "1645423", "1645423", "1645423…
#> $ access_cd <chr> "0", "0", "0", "0", "0", "0", "0", "0", "…
#> $ begin_date <date> 2007-10-01, 1938-04-01, 2010-02-26, 1974…
#> $ end_date <date> 2020-12-15, 2022-07-18, 2022-07-18, 1989…
#> $ count_nu <dbl> 4825, 30784, 4522, 5692, 30790, 20912, 22…
We can see that there are 650 stream flow gauges in Texas that fit our criteria. Now, let’s filter these stations a step further to include only those stations that have around 30 years of daily stream flow data with drainage area of larger than 10 km2 that have been impacted by human activities.
Since upstream drainage area is not part of the returned NWIS values, we need to join our gauge data to another dataset that contains this information - for example the GAGESII dataset!
The nhdplusTools packages provides access to subset the gagesII data for our AOI. We can then further refine the outputs by HDCN class, sqkm, and record count.
sites =
# Read data for AOI
get_gagesII(AOI = texas) %>%
# Find those that the reference classification is NA and the drainage area is greater then 10
filter(hcdn_2009 == "", drain_sqkm > 10) %>%
# Join to the site IDs returned from NWIS
inner_join(nwis, by = c("staid" = "site_no")) %>%
# Find those that have at least 30 years of data
filter(count_nu > 30*365) |>
# Send to a projected CRS
st_transform(5070)
We can see that there are 280 stream flow gauge stations in Texas that fit our criteria.
# Instantiate plot
ggplot() +
# Add Texas POLYGON
geom_sf(data = texas, color = "red", fill = "blue") +
# Add sites to map
geom_sf(data = sites, pch = 8, color = "white") +
# Apply theme
theme_light()
Next, we need to retrieve the dams in Texas that have been built between 1995 and 2005. For this exercise, their is not a function based web service for getting the latest (2019) National Inventory of Dams (NID) data from the US Army Corps. However, the USACE exposes the data as a CSV file online meaning we can read it directly into our work session:
dams =
# Read data from URL
read.csv('https://nid.usace.army.mil/api/nation/csv', skip = 1) %>%
# Find those within year range, the state of Texas, and of a given size
filter(between(Year.Completed, 1995, 2005), State == "Texas") %>%
# convert table to sf object
st_as_sf(coords = c('Longitude', 'Latitude'), crs = 4326) %>%
# Keep only relevant columns and rename
select(name = Dam.Name,
nidid = NID.ID,
storage = NID.Storage..Acre.Ft.,
height = Dam.Height..Ft.,
completed = Year.Completed) %>%
# Send to a projected transformation
st_transform(5070)
# Instantiate plot
ggplot() +
# Add Texas POLYGON
geom_sf(data = texas, color = "red", fill = "blue") +
# Add sites to map
geom_sf(data = sites, pch = 8, color = "white") +
# Add dams to map
geom_sf(data = dams, pch = 16, color = "darkgreen") +
# Apply theme
theme_light()
As is evident from the plot above, there are many gauge locations that don’t have any dams in their vicinity. One way to eliminate these stations is using a spatial query based on a search radius. We can determine an estimation for our search radius based on the upstream drainage area distribution of the stream flow gauges.
ggplot(data = sites, aes(x = drain_sqkm)) +
# Generate a histogram
geom_histogram() +
# Add vertical (v) line
geom_vline(xintercept = 10000, color = "red") +
# Add labels
labs(title = "Drainage Area (km2) Histogram")
We see that most stations have a drainage area of less than 10,000 km2. Now, we define a function that carries out an efficient spatial mapping to find the stations and dams that have at least one (other) feature within a 10-km radius.
distance_filter = function(dams, gages, dist_m){
# calculate the pairwise distance between gages and dams
# Find boolean condition if distance is greater then dist_m
# Extract the row/col indexes
map = which(st_distance(dams, gages) < units::as_units(dist_m, "m"), arr.ind = TRUE)
# use the index values to extract IDs
map = data.frame(NID = dams$nidid[map[,1]],
NWIS = gages$staid[map[,2]])
# use IDs to filter dams and gages
return(list(dams = filter(dams, nidid %in% map$NID),
sites = filter(gages, staid %in% map$NWIS)))
}
# Execute function to get a list of features within proximety of 10000m
proxy = distance_filter(dams, gages = sites, dist_m = 10000)
# How many of each feature type?
lengths(proxy)
#> dams sites
#> 6 39
# Instantiate plot
ggplot() +
# Add Texas POLYGON
geom_sf(data = texas, color = "red", fill = "blue") +
# Add sites to map
geom_sf(data = proxy$sites, pch = 8, color = "white") +
# Add dams to map
geom_sf(data = proxy$dams, pch = 16, color = "darkgreen") +
# Apply theme
theme_light()
So far, we obtained only the stations that have at least one dam in their 10-km radius, but we didn’t find if those dams are in their upstream or downstream of the associated gauge. We use the Hydro Network-Linked Data Index (NLDI) web service to obtain the upstream flowlines of the stream flow gauges up to 10 km. Note that this 10 km is the distance along the flowlines.
crosswalk = list()
proxy$dams$dam_comid = NA
proxy$sites$usptreamDam = NA
# Discover NHDPlus catchment comid of each dam
for(i in 1:nrow(proxy$dams)){
proxy$dams$dam_comid[i] = findNLDI(location = st_transform(proxy$dams[i,], 4326))$comid
}
#
for(i in 1:nrow(proxy$sites)){
# Find upstream comids of each gauge within 10km
UT = findNLDI(nwis = proxy$site$staid[i],
nav = "UT",
distance_km = 10)$UT_flowlines$nhdplus_comid
# Mark upstream (us) dams by COMID matching
us = proxy$dams$nidid[which(proxy$dams$dam_comid %in% UT)]
# If no dams upstream report NA, else report dam ID
us = if(length(us) > 0){
us
} else {
NA
}
# Save index of nwis to nid relationships
crosswalk[[i]] = data.frame(staid = proxy$site$staid[i], nidid = us)
}
# Merge crosswalk with dams
crosswalk = inner_join(bind_rows(crosswalk), proxy$dams, by = "nidid")
# Merge crosswalk with gages
crosswalk = inner_join(crosswalk, proxy$sites, by = "staid")
# Reduce crosswalk
crosswalk = select(crosswalk, c("staid", dam_comid, nidid, storage, height, name, completed, begin_date, end_date))
# View
datatable(crosswalk)
Upon finalizing the stations that satisfy our criteria, we use NWIS to obtain the daily stream flow data.
We can use this data to make a series of plots showing the annual mean stream flow values overlaid with the years a dam was constructed. We can do this iteritively for each identified gauge:
# Empty list to add plots to...
p = list()
# For each unique site ID
for(i in unique(crosswalk$staid)){
# year of completion minus 1
years_of_construction = filter(crosswalk, staid == i)$completed - 1
# Find the 15 year range
y = range(years_of_construction - 15, years_of_construction + 15)
# Extact complete NWIS record for streamflow at each site
flows = readNWISdv(site = i, parameterCd = "00060") |>
# Rename columns
renameNWISColumns() |>
# Strip the Year from each data and group by that year
group_by(year = as.numeric(format(Date, '%Y'))) |>
# Determine the mean flow in each year and convert to CMS
summarise(meanQ = mean(Flow, na.rm = TRUE) * 0.028316846592,
# Count number of observations in each year
n = n()) |>
# Only keep data for years within our range and that have 300+ records
filter(between(year, min(y), max(y)), n > 300) |>
# Join to all years (add NAs for years not observered or cut)
right_join(data.frame(year = min(y):max(y)), by = "year")
# make plot and add to p list
p[[length(p) + 1]] =
ggplot(data = flows) +
geom_line(aes(x = year, y = meanQ )) +
geom_vline(xintercept = years_of_construction, color = "red") +
labs(title = i, x = "Year", y = "Flow (cfs)")
}
# Combine and render plots:
do.call("grid.arrange", c(p, ncol = 2))
We can see that based on the available data and visual inspection, the first station shows a noticeable difference before and after the dam construction. Next we take a closer look at this station, USGS-07312200 to find if there is a visible reason for why these dams were added in the land cover.
To do this we identify the up and downstream networks (dataRetrival), and find the associated catchments (nhdplusTools)
# Use NLDI to find up and down stream flowline of the dam COMID within 30 km
region = findNLDI(comid = 13730353, nav = c("UT", "DD"), distance_km = 30)
# Used the returned data to extract catchment realizations of the NHDPlus
catchments = get_nhdplus(comid = c(region$DD_flowlines$nhdplus_comid,
region$UT_flowlines$nhdplus_comid),
realization = "catchment")
We can then use dap
(opendap.catalog) to subset a cloud
native TIF file with vsicurl
, and plot the returned
data:
# Subset cloud-native TIF file of NLCD
lc = dap("/vsicurl/https://storage.googleapis.com/feddata-r/nlcd/2019_Land_Cover_L48.tif",
AOI = catchments)
# Transform catchments to the projection of the NLCD
catchments_proj = st_transform(catchments, crs(lc))
# Mask land cover data with the catchments
masked_data = mask(lc, vect(catchments_proj))
# Plot!
{
plot(masked_data)
plot(dams$geometry, add = TRUE, pch = 25, bg = "red")
plot(catchments_proj$geometry, add = TRUE)
}
Last, we might be interested in the percent of land cover up and down stream of the dam(s). We can find these with the data already queried:
# Downstream catchments
ds_cat = filter(catchments_proj,
featureid %in% region$DD_flowlines$nhdplus_comid)
# Mask Land cover to Downstream catchments, and compute land cover frequency
ds_frac = freq(mask(lc, vect(ds_cat))) %>%
arrange(-count) %>%
select(value, count) %>%
mutate(DD_sqkm = count * prod(res(lc)) / 1e6,
DD_percentage = round(DD_sqkm / sum(DD_sqkm), 2),
DD_sqkm = NULL, count = NULL)
# Upstream catchments
us_cat = filter(catchments_proj,
featureid %in% region$UT_flowlines$nhdplus_comid)
# Mask Land cover to Upstream catchments, and compute land cover frequency
us_frac = freq(mask(lc, vect(us_cat))) %>%
arrange(-count) %>%
select(value, count) %>%
mutate(UT_sqkm = count * prod(res(lc)) / 1e6,
UT_percentage = round(UT_sqkm / sum(UT_sqkm), 2),
UT_sqkm = NULL, count = NULL,)
# View:
full_join(ds_frac, us_frac) %>%
mutate(`delta(DD - UT)` = DD_percentage - UT_percentage) %>%
filter(`delta(DD - UT)` != 0) %>%
datatable() %>%
formatStyle(
'delta(DD - UT)',
color = styleInterval(0, c('red', 'green'))
)
Notably we see that the upstream area has significantly more grassland then downstream, while the downstream catchments have significantly more cropland. The implications is that these dams were constructed to support the transition of grassland to agricultural land.
While a small example, this workflow shows how the details learned today can be pieced together to answer complex and meaningful questions in a programmatic, reproducible, and lightweight way.