On September 26, 2016 at 11:47 a.m. U.S. Central Daylight Time (16:47 UTC), the Cedar and Wapsipinicon rivers in Iowa surged, producing a flood wave that breached the river banks. The water level of the Cedar River measured ~20 feet (8 feet above flood stage) near the city of Cedar Rapids.
The water level continued to rise until it peaked at ~22 feet on September 27. This event had been exceeded only once before, in June 2008, when thousands of people were encouraged to evacuate from Cedar Rapids, the second-most-populous city in Iowa.
In this lab we are interested in the impacts in Palo, Iowa, because it is upstream of Cedar Rapids, contains a large amount of farmland, and does not have a forecast location to provide warning.
We will use the `raster` package and our understanding of raster data and categorization to create flood images using multiband Landsat imagery, thresholding, and classification methods.
```r
library(raster)     # Raster data handling
library(tidyverse)  # Data manipulation
library(getlandsat) # Keyless Landsat data (2013-2017)
library(sf)         # Vector data processing
library(mapview)    # Rapid interactive visualization
```
Almost all remote sensing / image analysis begins with the same basic steps:
Identifying an area of interest (AOI)
Identifying and downloading the relevant images or products
Analyzing the raster products
First, we need to identify an AOI. We want to be able to extract the flood extents for Palo, Iowa and its surroundings.
Using your `uscities.csv`, filter to Palo, Iowa and create a 5 kilometer buffer. To do this, from the Palo, Iowa feature:

1. generate a 5 kilometer (5,000 meter) buffer around the point using `st_buffer`
2. find the bounding box of the buffered region using `st_bbox`
3. convert that bbox object to an `sfc`, and then to an `sf`, object
This region defines the AOI for this analysis.
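The steps above can be sketched in R as follows. This is a minimal sketch, not the required solution: the file path, the `uscities.csv` column names (`city`, `state_name`, `lng`, `lat`), and the choice of EPSG:5070 (one meter-based CRS covering Iowa) are all assumptions.

```r
library(sf)
library(tidyverse)

# Isolate Palo, Iowa and convert it to an sf point (column names assumed)
palo = read_csv("data/uscities.csv") %>%
  filter(city == "Palo", state_name == "Iowa") %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326)

# Buffer 5,000 m in a projected CRS, then build the AOI from the bounding box
aoi = palo %>%
  st_transform(5070) %>%  # CONUS Albers (meters); an assumption, any meter-based CRS works
  st_buffer(5000) %>%     # 5 km buffer
  st_bbox() %>%           # bounding box of the buffered region
  st_as_sfc() %>%         # bbox -> sfc polygon
  st_as_sf()              # sfc -> sf object
```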
The work for this question will be done in an R script! The first step takes a while and only needs to be done once!
The second part of any raster analysis is identifying the imagery/products you will use. For our analysis we will be using data from Landsat 8. Landsat 8 is the newest generation of the Landsat satellites and provides a useful resource for water detection. The OLI sensor aboard Landsat 8 has nine bands for capturing the spectral response of the earth’s surface at discrete wavelengths along the electromagnetic spectrum.
Amazon hosts all Landsat 8 data in a bucket associated with the OpenData Registry Initiative. The `getlandsat` R package provides a nice interface to this product for images taken between 2013 and 2017. Current efforts to extend this resource are underway using the SAT-API.
To find our images, we first need to list all available scenes, filter to those that meet our criteria (date and bounding box), and isolate the scene's unique identifier.
To do this:

1. Run `getlandsat::lsat_scenes()` and assign it to an object. `lsat_scenes()` will download and extract a CSV of ALL scenes archived on the AWS bucket between 2013-2017 (n = 2,070,448).
2. Filter the scenes to those covering your AOI by checking the `xmin`, `xmax`, `ymin`, and `ymax` values of your AOI bounding box (`st_bbox`) against the scene extents.
3. Filter to the flood date. `acquisitionDate` comes as a `POSIXct`, which includes hour:minutes:seconds. Cast it to a date object with `as.Date`, and check it against `as.Date("2016-09-26")`.
4. Save your filtered scene meta data with `write.csv`:

```r
write.csv(YOUR_FILTERED_DATA, file = "data/palo-flood-scene.csv")
```
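A sketch of the filtering, under stated assumptions: the column names (`min_lat`, `max_lat`, `min_lon`, `max_lon`, `acquisitionDate`) come from the AWS scene list, and `aoi` is assumed to be the AOI object built in the previous section.

```r
library(getlandsat)
library(tidyverse)
library(sf)

scenes = lsat_scenes()  # downloads the full scene list (slow; run once)

bb = st_bbox(st_transform(aoi, 4326))  # AOI bounds in lat/lng

# Keep scenes whose footprint contains the AOI, on the flood date
down = scenes %>%
  filter(min_lat <= bb["ymin"], max_lat >= bb["ymax"],
         min_lon <= bb["xmin"], max_lon >= bb["xmax"],
         as.Date(acquisitionDate) == as.Date("2016-09-26"))

write.csv(down, file = "data/palo-flood-scene.csv", row.names = FALSE)
```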
The work for this step will be done in your Rmd!
Now that you have identified the meta data for your ideal image set, we need to download and cache the data on our computers.
Caching essentially means we will download the data to a standardized location known to the downloading utility.
Before any data is downloaded, the utility checks whether it already exists. If it does, the path to the file is returned; if it does not, the data is downloaded.
The `getlandsat` package provides a nice caching system. To download and cache image data we need to do the following:

1. Read in your filtered scene meta data (`read_csv`).
2. Pass the scene's download URL to `lsat_scene_files` to list all files available for that scene.
We can do this using `grepl` like last week! However, we will expand on these pattern matching techniques using multiple patterns and a constraint:

- Multiple patterns can be combined with the OR operator `|`.
- Such a pattern can be built with `paste0` and the `collapse` argument, using `|` as the separator:

```r
paste0('B', 1:6, ".TIFF", collapse = "|")
## [1] "B1.TIFF|B2.TIFF|B3.TIFF|B4.TIFF|B5.TIFF|B6.TIFF"
```
When we look at the possible files for each scene, we see that for each band there is a .TIF file and a .TIF.ovr file.
An `.ovr` file is used for storing the pyramid layers of a raster dataset and exists when the pyramids are built with ArcGIS 10 or later.
Since we are not using ArcGIS, we do not want the .ovr files. So we can constrain our pattern search using the `$` sign.
In regular expression matching (like `grep`/`grepl`), the `$` only returns a string if the pattern preceding the `$` occurs at the end of the line. This pattern will be useful when you want to search for files with a specific extension later on…

```r
paste0('B', 1:6, ".TIFF$", collapse = "|")
## [1] "B1.TIFF$|B2.TIFF$|B3.TIFF$|B4.TIFF$|B5.TIFF$|B6.TIFF$"
```
Use the above pattern to filter the potential files retrieved with `lsat_scene_files`.
Once filtered, arrange the file names (so that band 1 is first and band 6 is last) and pull off the file variable.
The result should be a vector of file paths to download/cache.
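The listing and filtering steps might be sketched as below. The `file` column name comes from the text above; `download_url` (the scene list column holding the scene URL) is an assumption.

```r
library(getlandsat)
library(tidyverse)

meta = read_csv("data/palo-flood-scene.csv")  # saved in the last section

files = lsat_scene_files(meta$download_url) %>%   # all files for this scene
  filter(grepl(paste0("B", 1:6, ".TIFF$", collapse = "|"), file)) %>%
  arrange(file) %>%  # band 1 first ... band 6 last
  pull(file)         # character vector of file paths
```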
Great! Now that we have the URLs of the files we want, we need to download them to our cache location. Again, we will use the built-in caching features of `getlandsat`.
`lsat_image` will take a file name (like those you created in step 2) and download the file to your cache. If the download has already occurred, the path to the cached image is returned without re-downloading the data.
Right now we have 6 files we want, one for each band. So we want to apply `lsat_image` over this vector of files to return a set of local file paths. For this we can use the `apply` family in base R.
- `lapply` returns a list of elements resulting from a specified function (`FUN`) applied to all inputs.
- `sapply` is a wrapper of `lapply` that returns a vector, matrix or, if `simplify = "array"`, an array rather than a list.
- `vapply` is similar to `sapply`, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

We want to apply the `lsat_image` function over our URL paths to return a vector of local file paths, so we can use `sapply` where the `files` are the input and `FUN = lsat_image`:

```r
st = sapply(files, lsat_image)
```
With a set of local file paths, you can create a raster stack! Remember your files need to be in the order of the bands (double check step 2).

```r
b = stack(st)
```

Once the stack is made, set the layer names to band1, band2, band3, etc. (Hint: use `paste0("band", 1:6)`.)
What are the dimensions of your stacked image? What is the CRS? What is the cell resolution?
We only want to analyze our image for the regions surrounding Palo (our AOI). Transform your AOI to the CRS of the landsat stack and use it to crop your raster stack.
What are the dimensions of your cropped image stack? What is the CRS? What is the cell resolution?
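A sketch of building, naming, and cropping the stack, assuming `st` holds the cached file paths from `sapply` and `aoi` is the AOI object from earlier (both names are placeholders):

```r
library(raster)
library(sf)
library(tidyverse)

b = stack(st) %>%
  setNames(paste0("band", 1:6))

# Match the AOI to the CRS of the imagery, then crop the stack to it
aoi_proj = st_transform(aoi, projection(b))
b_crop   = crop(b, aoi_proj)

dim(b_crop)  # rows, columns, layers
crs(b_crop)  # coordinate reference system
res(b_crop)  # cell resolution
```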
Awesome! We have now (1) identified, (2) downloaded, and (3) cached our images.
We have loaded them as a multiband raster object in R and cropped the domain to our AOI. Let's make a few RGB plots to see what these images reveal.
115 A-C covers the notion of spectral signatures (bands) and spectral combinations in greater detail, as well as common tasks like atmospheric correction. We don't need to worry much about correction here since we are using an L1T product (check the `processingLevel` in the metadata).
Standard cameras replicate what's seen with the human eye by capturing light in the red, green, and blue wavelengths and applying red, green, and blue filters (channels) to generate a natural-looking RGB image.
With a multispectral Landsat 8 image, we have more information to work with and different wavelengths/combinations can help isolate particular features.
For example, the Near Infrared (NIR) wavelength is commonly used to analyze vegetation health because vegetation reflects strongly in this portion of the electromagnetic spectrum. Alternatively, the Shortwave Infrared (SWIR) bands are useful for discerning what is wet and dry.
When working with Landsat imagery, a logical first step is to load an image into an image analysis program (like ENVI) to visualize what's in the scene. We can do the same thing in R using the `plotRGB` function and selecting which band should populate each channel.
Below is a summary of the Landsat 8 bands, names, wavelengths, and resolutions captured by each sensor.
Go ahead and rename your raster stack with the names of the bands (e.g. "coastal", "red", "green", …)
There are many online examples (and classes like 115) that discuss the applications of the available bands in Landsat. What we want to do here is simply show how loading different combinations into the RGB channels make different features stand out.
You will make four unique combinations:
A useful reference of popular band combinations is here.
Replicate the images produced above by applying a stretch to improve clarity and/or contrast. Try both `stretch = "lin"` and `stretch = "hist"` to find the one you like best.
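For example, assuming a cropped six-band stack named `b_crop` (a placeholder name), a natural color and a color-infrared composite might look like:

```r
library(raster)

# Natural color: red, green, blue channels from Landsat 8 bands 4, 3, 2
plotRGB(b_crop, r = 4, g = 3, b = 2, stretch = "lin")

# Color infrared: NIR in the red channel makes healthy vegetation stand out
plotRGB(b_crop, r = 5, g = 4, b = 3, stretch = "hist")
```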
Describe the purpose of applying a color stretch.
Accurate assessment of surface water features (like flooding) has been made possible by remote sensing technology. Index methods are commonly used for surface water estimation by applying a threshold value.
For this lab we will look at 5 unique thresholding methods for delineating surface water features from different combinations of Landsat bands listed below:
| Name | Abbreviation | Formula | Water Threshold |
|---|---|---|---|
| Normalized Difference Vegetation Index | NDVI | \[ \frac{NIR - Red}{NIR + Red} \] | Cells less than 0 |
| Normalized Difference Water Index | NDWI | \[ \frac{Green - NIR}{Green + NIR} \] | Cells greater than 0 |
| Modified Normalized Difference Water Index | MNDWI | \[ \frac{Green - SWIR1}{Green + SWIR1} \] | Cells greater than 0 |
| Water Ratio Index | WRI | \[ \frac{Green + Red}{NIR + SWIR1} \] | Cells greater than 1 |
| Simple Water Index | SWI | \[ \frac{1}{\sqrt{Blue - SWIR1}} \] | Cells less than 5 |
Apply each formula to your cropped stack and plot the resulting index rasters with a diverging color palette (e.g. `colorRampPalette(c("blue", "white", "red"))(256)`). Describe the 5 images. How are they similar and where do they deviate?
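The five indices can be computed with raster algebra. This sketch assumes the stack layers were renamed as suggested above, so that `blue`, `green`, `red`, `nir`, and `swir1` are valid layer names in a stack called `b_crop` (all names are assumptions):

```r
library(raster)
library(tidyverse)

ndvi  = (b_crop$nir   - b_crop$red)   / (b_crop$nir   + b_crop$red)
ndwi  = (b_crop$green - b_crop$nir)   / (b_crop$green + b_crop$nir)
mndwi = (b_crop$green - b_crop$swir1) / (b_crop$green + b_crop$swir1)
wri   = (b_crop$green + b_crop$red)   / (b_crop$nir   + b_crop$swir1)
swi   = 1 / sqrt(b_crop$blue - b_crop$swir1)

idx = stack(ndvi, ndwi, mndwi, wri, swi) %>%
  setNames(c("NDVI", "NDWI", "MNDWI", "WRI", "SWI"))

plot(idx, col = colorRampPalette(c("blue", "white", "red"))(256))
```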
Here we will extract the flood extents from each of the above rasters using the thresholds defined in the above table.
For this, we will use the `calc` function and apply a custom formula to each index raster from step 1 that applies the threshold such that flooded cells are 1 and non-flooded cells are 0.
Once completed, stack the binary ([0,1]) layers into a new stack, set the names to meaningful descriptions, and plot the stack so that floods are blue and the background is white.
Perform one more `calc` pass making sure that all NA values are set to zero.
An alternative way to identify similar features in a continuous field is through supervised or unsupervised classification. Supervised classification groups values (cells) based on user-supplied "truth" locations. Since flood events are fast-occurring, there are rarely truth points for a live event. Instead, developers rely on libraries of flood spectral signatures.
Unsupervised classification finds statistically significant groupings within the data. In these clustering algorithms, the user specifies the number of classes and the categorization is created based on the patterns in the data.
For this lab we will use a simple k-means algorithm to group raster cells with similar spectral properties.
1. Anytime we want to produce a consistent/reproducible result from a random process in R, we need to set a seed. Do so using `set.seed`.
2. Extract the values of your multiband raster stack with `getValues` and check their `dim`. What do the dimensions of the extracted values tell you about how the data was extracted?
3. Remove NA values from your extracted data with `na.omit` for safety.
Use the k-means clustering algorithm from the `stats` package to cluster the extracted raster data into a specified number of clusters `k` (centers). Start with 12.
Once the `kmeans` algorithm runs, the output will be a list of components. One of these is `cluster`, which provides a vector of integers from (1:`k`) indicating the cluster to which each cell was allocated.
Create a new raster object by copying one of the original bands, for example:

```r
kmeans_raster = s2$ndvi
```

Then set the values of this new raster to the `cluster` vector returned by `kmeans`.
Try a few different clusters (k) to see how the map changes.
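The clustering steps above can be sketched as follows, assuming a cropped stack `b_crop` (a placeholder name). Note the final assignment assumes the cropped stack contains no NA cells, so the cluster vector and the raster have the same length:

```r
library(raster)

set.seed(1)  # any fixed seed makes the clustering reproducible

v = getValues(b_crop)  # one row per cell, one column per band
dim(v)

vals = na.omit(v)                  # drop rows containing NA, for safety
km   = kmeans(vals, centers = 12)  # 12 spectral clusters

# Copy a band and replace its values with the cluster assignments
# (assumes no NA cells were dropped, so lengths match)
kmeans_raster = b_crop[[1]]
values(kmeans_raster) = km$cluster
plot(kmeans_raster)
```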
Great! You now have a categorical raster with categories 1:k. The issue is we don't know which value corresponds to the flood water.
To identify the flood category programmatically, generate a `table` crossing the values of one of your binary flood rasters with the values of your `kmeans_raster`.
To do this, you will use the `table` function and pass it the values from a binary flood raster and the values from your `kmeans_raster`. Here the following occurs:
- `table` builds a contingency table counting the number of times each combination of factor levels in the input vector(s) occurs. This gives us a table quantifying how many cells with a value 1 are aligned with each of the k classes, and how many cells with a value 0 are aligned with each of the k classes.
- If you pass the binary flood values as the first argument to `table`, then the unique values (0, 1) will be the rows. They will always be sorted, meaning you know the flooded cells will be in the second row.
- `which.max()` returns the index of the maximum value in a vector.

Combine this information to identify the cluster in the kmeans data that coincides with the most flooded cells in the binary mask.
Once you know this value, use `calc` to extract the flood mask in a similar way to the thresholding you did above.
Finally, use `addLayer` to add this new layer to your flood raster stack.
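Pulled together, the identification step might look like this. All object names are placeholders: `f1` stands in for one of your binary flood rasters, `kmeans_raster` for the clustered raster, and `floods` for the five-layer binary stack.

```r
library(raster)

# Rows = binary flood values (0, 1); columns = kmeans classes (1:k)
tbl = table(values(f1), values(kmeans_raster))

# The second row counts flooded cells; find the class they fall in most often
flood_class = which.max(tbl[2, ])

# Convert that class to a binary flood mask and append it to the stack
km_flood = calc(kmeans_raster, function(x) ifelse(x == flood_class, 1, 0))
floods   = addLayer(floods, km_flood)  # now a six-layer flood stack
```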
Awesome! You have now created a flood raster using 6 different methods. Our last goal is to identify how they compare.
1. Use `cellStats` to determine the sum of each layer. Since flooded cells have a value of 1, the sum of an entire layer is equivalent to the number of flooded cells. You can then use the resolution of the cells to convert counts to a flooded area. Print these values as a kable table.
2. Sum the six binary flood layers into a single raster with `calc`. The higher the count in each pixel, the more certain we can be about its flooded state. For example, if a cell has a value of 6, every method identified the cell as flooded; if it has a value of 2, then two of the methods identified the cell as flooded. Plot your summed flood map using the blues9 color palette.
3. Explore the summed layer interactively with `mapview`. Zoom and pan around the map, noting that a pixel value is displayed in the upper right hand corner. Why are some of the cell values not an even number?
Congratulations! You have successfully carried out a complete flood analysis from data acquisition through evaluation. This kind of work goes on regularly and is part of several national efforts (NOAA, USGS, FirstStreet, FEMA) to generate flood inundation libraries that contribute to better extraction and classification of real-time flood events, resource allocation during events, and damage assessments after events.
Here we used Landsat imagery, but the same process could be applied to drone footage, MODIS data, or other private satellite imagery.
Your evaluation was based purely on the raster data structure and your ability to conceptualize rasters as vectors of data with dimensional structure. You applied simple mathematical operators (+, /, -) to the raster bands, and a kmeans clustering algorithm to the data matrix of the multiband raster - all within ~100 lines of code!
Check out this drone footage taken during the flooding in Palo, Iowa. In particular, we see a bit of residential flooding at minute 2:17 (screenshot below).
Our goal is to see if our flood classifications were able to capture this event.
Use mapview
to generate a slippy map of the Palo, Iowa bbox.
Find the location shown in the above image using context clues and different base maps. Once you do, do the following:
Create an `sfc` object from the latitude and longitude of the mouse coordinates at the impacted location: use the `st_point` constructor to create an `sfg`; convert it to an `sfc` with the appropriate lat/lng CRS; and then transform it to the CRS of the flood rasters.
Use `raster::extract` to extract the binary flood values at that location from the six-layer flood map stack.
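A sketch of the extraction, with placeholder coordinates standing in for the mouse position you read off the map (`floods` is again a placeholder for your six-layer stack):

```r
library(sf)
library(raster)
library(tidyverse)

# Placeholder lng/lat; replace with the coordinates of the impacted site
pt = st_sfc(st_point(c(-91.79, 42.06)), crs = 4326) %>%
  st_transform(projection(floods))

raster::extract(floods, st_coordinates(pt))  # one 0/1 value per method
```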
How many of the maps captured the flooding at that location?
Total: 150 points (170 points total)
You will submit a URL to your web page deployed with GitHub pages.
To do this:
https://USERNAME.github.io/geog-13-labs/lab-05.html
Submit this URL in the appropriate Gauchospace dropbox. Also take a moment to update your personal webpage with this link and some bullet points of what you learned. While not graded as part of this lab, it will be your final!