Background

On September 26, 2016 at 11:47 a.m. U.S. Central Daylight Time (16:47 UTC), the Cedar and Wapsipinicon rivers in Iowa surged, producing a flood wave that breached the river banks. The water level of the Cedar River measured ~20 feet (8 feet above flood stage) near the city of Cedar Rapids.

The water level continued to rise until it peaked at ~22 feet on September 27. This level had been exceeded only once before, in June 2008, when thousands of people were encouraged to evacuate from Cedar Rapids, the second-most-populous city in Iowa.

In this lab we are interested in the impacts in Palo, Iowa, because it is upstream of Cedar Rapids, contains a large amount of farmland, and does not have a forecast location to provide warning.

We will use the raster package and our understanding of raster data and categorization to create flood images using multiband Landsat imagery, thresholding, and classification methods.

Libraries

library(raster) # Raster Data handling
library(tidyverse) # Data Manipulation
library(getlandsat) # keyless Landsat data (2013-2017)
library(sf) # Vector data processing
library(mapview) # Rapid Interactive visualization

Almost all remote sensing / image analysis begins with the same basic steps:

  1. Identifying an area of interest (AOI)

  2. Identifying and downloading the relevant images or products

  3. Analyzing the raster products

Question 1

First we need to identify an AOI. We want to be able to extract the flood extents for Palo, Iowa, and its surroundings.

Using your uscities.csv, filter to Palo, Iowa and create a 5 kilometer buffer.

To do this:

  • Read in the csv data
  • Structure it as a spatial (sf) feature by defining the coordinate fields and CRS
  • Filter to only include Palo, Iowa
  • Transform the projection into a CRS appropriate for meter-based measurements

From the Palo, Iowa feature:

  • generate a 5 kilometer (5,000 meter) buffer around the point using st_buffer

  • find the bounding box of the buffered region using st_bbox

  • Convert that bbox object into an sfc and then an sf object

This region defines the AOI for this analysis.
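
A minimal sketch of these steps is below. It assumes the file lives at data/uscities.csv, that the simplemaps column names are city, state_name, lat, and lng, and that EPSG:5070 (a meter-based CONUS Albers projection) is one acceptable CRS choice.

aoi = read_csv("data/uscities.csv") %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%  # define coordinate fields and CRS
  filter(city == "Palo", state_name == "Iowa") %>%    # isolate Palo, Iowa
  st_transform(5070) %>%                              # meter-based CRS (assumed EPSG:5070)
  st_buffer(5000) %>%                                 # 5 kilometer buffer
  st_bbox() %>%                                       # bounding box of the buffered point
  st_as_sfc() %>%                                     # bbox -> sfc
  st_as_sf()                                          # sfc -> sf (the AOI)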

Question 2

Step 1

The work for this question will be done in an R script! The first step takes a while and only needs to be done once!

The second part of any raster analysis is identifying the imagery/products you will use. For our analysis we will be using data from Landsat 8. Landsat 8 is the newest generation of the Landsat satellites and provides a useful resource for water detection. The OLI sensor aboard Landsat 8 has nine bands for capturing the spectral response of the earth’s surface at discrete wavelengths along the electromagnetic spectrum.

Amazon hosts all Landsat 8 data in a public bucket associated with the Registry of Open Data on AWS.

The getlandsat R package provides a nice interface to this product for images taken between 2013 and 2017. Current efforts to extend this resource are underway using the SAT-API.


To find our images, we first need to list all available scenes, filter to those that meet our criteria (date and bounding box), and isolate the scene's unique identifier.

To do this:

  • load all available scenes with getlandsat::lsat_scenes() and assign it to an object
    • getlandsat::lsat_scenes() will download and extract a CSV of ALL scenes archived on the AWS bucket between 2013 and 2017 (n = 2,070,448)
    • While reading in this much data takes some time (~30 seconds), it only needs to be done when you are searching for scenes. That is why it should NOT be included in your Rmd.
  • Once it's loaded, filter the scenes to those that are suitable for our AOI on 2016-09-26
    • transform your bbox to EPSG:4326, and create a new bounding box object (st_bbox)
    • Use >= and <= to filter based on your xmin, xmax, ymin, ymax values
    • The acquisitionDate comes as a POSIXct value that includes hours:minutes:seconds. Cast it to a date object with as.Date, and check it against as.Date("2016-09-26")
  • Once you have identified the metadata for your scene, save it to a csv file in your data folder using write.csv:
write.csv(YOUR_FILTERED_DATA, file = "data/palo-flood-scene.csv")
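
A minimal sketch of this search is below (run in your R script, not your Rmd). The scene-list column names (min_lat, max_lat, min_lon, max_lon) are assumptions based on the AWS scene list, and aoi/bb are the AOI object and its bounding box from Question 1.

scenes = getlandsat::lsat_scenes()           # slow; run once, outside the Rmd

bb = st_transform(aoi, 4326) %>% st_bbox()   # AOI bounding box in EPSG:4326

down = scenes %>%
  filter(min_lat <= bb["ymin"], max_lat >= bb["ymax"],   # scene spans the AOI latitudes
         min_lon <= bb["xmin"], max_lon >= bb["xmax"],   # scene spans the AOI longitudes
         as.Date(acquisitionDate) == as.Date("2016-09-26"))

# save `down` with the write.csv call shown above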

Step 2

The work for this step will be done in your Rmd!

Now that you have identified the metadata for your ideal image set, we need to download and cache the data on our computers.

Caching essentially means we will download the data to a standardized location known to the downloading utility.

Before any data is downloaded, the utility will check if it already exists. If it does, the path to the file is returned; if it does not, the data is downloaded.

The getlandsat package provides a nice caching system. To download and cache image data we need to do the following:

  • Read in the csv of your image metadata from the file saved in your data directory (read_csv)
  • Pass the download URL variable to lsat_scene_files
    • This will list all available files for that scene
    • We will then filter these to only include the TIF files for bands 1-6.

We can do this using grepl like last week! However, we will expand on pattern matching techniques using multiple patterns and a constraint:

  • This time we will search for a set of patterns (B1.TIF, B2.TIF, B3.TIF, … B6.TIF) rather than a single one (like “C” or “R”)
    • We can search for multiple patterns by separating them with the “OR” operator |
    • This effectively checks for THIS pattern or THAT pattern (THIS|THAT)
  • We can create this pattern with paste0 and the collapse argument
    • Specifically, we want to collapse our patterns on the | separator:
paste0('B', 1:6, ".TIF", collapse = "|")
## [1] "B1.TIF|B2.TIF|B3.TIF|B4.TIF|B5.TIF|B6.TIF"

When we look at the possible files for each scene, we see that for each band there is a .TIF file and a .TIF.ovr file.

An .ovr file is used for storing the pyramid layers for a raster dataset and exists when the pyramids are built with ArcGIS 10 or later.

Since we are not using ArcGIS, we do not want the .ovr file.

So we can constrain our pattern search using the $ sign.

In regular expression matching (like grep/grepl), the $ anchor matches only when the pattern preceding it occurs at the end of the string.

This pattern will be useful to you when you want to search for files with a specific extension later on…

paste0('B', 1:6, ".TIF$", collapse = "|")
## [1] "B1.TIF$|B2.TIF$|B3.TIF$|B4.TIF$|B5.TIF$|B6.TIF$"

Use the above pattern to filter the potential files retrieved with lsat_scene_files.


Once filtered, arrange the file names (so that band 1 is first and band 6 is last) and pull off the file variable.

The result should be a vector of file paths to download/cache.
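
A minimal sketch of this step is below; download_url is assumed to be the URL column in your saved metadata, and file is the column returned by lsat_scene_files.

meta = read_csv("data/palo-flood-scene.csv")

files = lsat_scene_files(meta$download_url) %>%
  filter(grepl(paste0("B", 1:6, ".TIF$", collapse = "|"), file)) %>%  # bands 1-6, .TIF only
  arrange(file) %>%                                                   # band 1 first ... band 6 last
  pull(file)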

Step 3

Great! Now that we have the URLs of the files we want, we need to download them to our cache location. Again, we will use the built-in caching features of getlandsat.

lsat_image will take a file name - like those you created in step 2 - and download the file to your cache. If the download has already occurred, the path to the cached image is returned without re-downloading the data.

Right now we have 6 files we want, 1 for each band, so we want to apply lsat_image over this vector of files to return a set of local file paths. For this we can use the apply family in base R.

  • lapply returns a list of elements resulting from a specified function (FUN) applied to all inputs.

  • sapply is a wrapper of lapply that returns a vector, matrix or, if simplify = "array", an array rather than a list

  • vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

We want to apply the lsat_image function over our URL paths to return a vector of local file paths… so we can use sapply, where the files are the input and FUN = lsat_image.

st = sapply(files, lsat_image)

With a set of local file paths, you can create a raster stack! Remember your files need to be in the order of the bands (double check step 2).

b = stack(st)

Once the stack is made, set the names to band1, band2, band3, etc. (Hint: use paste0("band", 1:6))
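
For example, assuming the stack from above is named b; dim, crs, and res will help answer the questions that follow:

names(b) = paste0("band", 1:6)

dim(b)  # rows, columns, layers
crs(b)  # coordinate reference system
res(b)  # cell resolution (x, y)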

What are the dimensions of your stacked image? What is the CRS? What is the cell resolution?

Step 4

We only want to analyze our image for the regions surrounding Palo (our AOI). Transform your AOI to the CRS of the Landsat stack and use it to crop your raster stack.
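
A minimal sketch of this step, assuming the AOI from Question 1 is named aoi and the stack is b (crop relies on raster's support for sf objects):

cropper = st_transform(aoi, projection(b))   # AOI in the CRS of the Landsat stack
b_crop  = crop(b, cropper)                   # crop the stack to the AOI extent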

What are the dimensions of your cropped image stack? What is the CRS? What is the cell resolution?

Question 3

Awesome! We have now (1) identified, (2) downloaded, and (3) cached our images.

We have loaded them as a multiband raster object in R and cropped the domain to our AOI. Let's make a few RGB plots to see what these images reveal.

115 A-C covers the notion of spectral signatures (bands) and spectral combinations in greater detail, as well as common tasks like atmospheric correction. We don't need to worry much about correction here since we are using an L1T product (check the processingLevel in the metadata).

Step 1

Standard cameras replicate what's seen by the human eye by capturing light in the red, green, and blue wavelengths and applying red, green, and blue filters (channels) to generate a natural-looking RGB image.

With a multispectral Landsat 8 image, we have more information to work with and different wavelengths/combinations can help isolate particular features.

For example, the Near Infrared (NIR) wavelength is commonly used to analyze vegetation health because vegetation reflects strongly in this portion of the electromagnetic spectrum. Alternatively, the Shortwave Infrared (SWIR) bands are useful for discerning what is wet and dry.

When working with Landsat imagery, a logical first step is to load an image into an image analysis program (like ENVI) to visualize what's in the scene. We can do the same thing with R using the plotRGB function and selecting which band should populate each channel.

Below is a summary of the Landsat 8 bands, names, wavelengths, and resolutions captured by each sensor.

Go ahead and rename your raster stack with the names of the band (e.g. “coastal”, “red”, “green”, …)


There are many online examples (and classes like 115) that discuss the applications of the available bands in Landsat. What we want to do here is simply show how loading different combinations into the RGB channels makes different features stand out.

You will make four unique combinations:

  • R-G-B (natural color)
  • NIR-R-G (color infrared)
  • NIR-SWIR1-R (false color water focus)
  • Your choice

A useful reference of popular band combinations is here.
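
A minimal sketch of these plots, assuming the cropped and renamed stack is b_crop and that bands 1-6 follow the Landsat 8 OLI order (coastal, blue, green, red, nir, swir1):

plotRGB(b_crop, r = 4, g = 3, b = 2)   # R-G-B (natural color)
plotRGB(b_crop, r = 5, g = 4, b = 3)   # NIR-R-G (color infrared)
plotRGB(b_crop, r = 5, g = 6, b = 4)   # NIR-SWIR1-R (false color water focus)
plotRGB(b_crop, r = 6, g = 5, b = 4)   # your choice (here, SWIR1-NIR-R)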

Step 2

Replicate the images produced above by applying a stretch to improve clarity and/or contrast. Try both stretch="lin" and stretch="hist" to find the one you like best.
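
For example, a stretch can be added to the natural color plot like this (assuming the same b_crop stack):

plotRGB(b_crop, r = 4, g = 3, b = 2, stretch = "lin")    # linear stretch
plotRGB(b_crop, r = 4, g = 3, b = 2, stretch = "hist")   # histogram equalization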

Describe the purpose of applying a color stretch.

Question 4

Accurate assessment of surface water features (like flooding) has been made possible by remote sensing technology. Index methods combined with a threshold value are commonly used for surface water estimation.

For this lab we will look at 5 unique thresholding methods for delineating surface water features from different combinations of Landsat bands listed below:

| Name | Abbreviation | Formula | Water Threshold |
|------|--------------|---------|-----------------|
| Normalized Difference Vegetation Index | NDVI | \( \frac{NIR - Red}{NIR + Red} \) | Cells less than 0 |
| Normalized Difference Water Index | NDWI | \( \frac{Green - NIR}{Green + NIR} \) | Cells greater than 0 |
| Modified Normalized Difference Water Index | MNDWI | \( \frac{Green - SWIR1}{Green + SWIR1} \) | Cells greater than 0 |
| Water Ratio Index | WRI | \( \frac{Green + Red}{NIR + SWIR1} \) | Cells greater than 1 |
| Simple Water Index | SWI | \( \frac{1}{\sqrt{Blue - SWIR1}} \) | Cells less than 5 |

Step 1

Raster Algebra

  • Create 5 new rasters using the formulas for NDVI, NDWI, MNDWI, WRI and SWI
  • Combine those new rasters into a stacked object
  • Set the names of your new stack to useful values
  • Plot the new stack, using the following palette (colorRampPalette(c("blue", "white", "red"))(256))
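
A minimal sketch of the raster algebra, assuming the cropped stack is named b_crop and its layers have been renamed blue, green, red, nir, and swir1:

ndvi  = (b_crop$nir - b_crop$red) / (b_crop$nir + b_crop$red)
ndwi  = (b_crop$green - b_crop$nir) / (b_crop$green + b_crop$nir)
mndwi = (b_crop$green - b_crop$swir1) / (b_crop$green + b_crop$swir1)
wri   = (b_crop$green + b_crop$red) / (b_crop$nir + b_crop$swir1)
swi   = 1 / sqrt(b_crop$blue - b_crop$swir1)   # NaN where blue < swir1

idx = stack(ndvi, ndwi, mndwi, wri, swi) %>%
  setNames(c("NDVI", "NDWI", "MNDWI", "WRI", "SWI"))

plot(idx, col = colorRampPalette(c("blue", "white", "red"))(256))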

Describe the 5 images. How are they similar and where do they deviate?

Step 2

Raster Thresholding

Here we will extract the flood extents from each of the above rasters using the thresholds defined in the above table.

For this, we will use the calc function and apply a custom formula for each calculated field from step 1 that applies the threshold in a way that flooded cells are 1 and non-flooded cells are 0.

Once completed, stack the binary ([0,1]) files into a new stack, set the names to meaningful descriptions, and plot the stack so that floods are blue, and background is white.

Finally, perform one more calc pass to make sure that all NA values are set to zero.
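
A minimal sketch for one layer (NDVI, where water is any cell less than 0); the same pattern applies to the other indices with their own thresholds:

ndvi_flood = calc(idx$NDVI, function(x) ifelse(x < 0, 1, 0))       # 1 = flooded, 0 = not flooded
ndvi_flood = calc(ndvi_flood, function(x) ifelse(is.na(x), 0, x))  # set NA cells to 0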

Question 5

An alternative way to identify similar features in a continuous field is through supervised or unsupervised classification. Supervised classification groups values (cells) based on user-supplied “truth” locations. Since flood events are fast-occurring, there are rarely truth points for a live event. Instead, developers rely on libraries of flood spectral signatures.

Unsupervised classification finds statistically significant groupings within the data. In these clustering algorithms, the user specifies the number of classes and the categorization is created based on the patterns in the data.

For this lab we will use a simple k-means algorithm to group raster cells with similar spectral properties.

Step 1

Anytime we want to produce a consistent/reproducible result from a random process in R, we need to set a seed. Do so using set.seed.

Step 2

  • Extract the values from your 6-band raster stack with getValues
  • Check the dimensions of the extracted values with dim

What do the dimensions of the extracted values tell you about how the data was extracted?

  • Remove NA values from your extracted data with na.omit for safety

  • Use the kmeans clustering algorithm from the stats package to cluster the extracted raster data to a specified number of clusters k (centers). Start with 12.

  • Once the kmeans algorithm runs, the output will be a list of components. One of these is cluster which provides a vector of integers from (1:k) indicating the cluster to which each cell was allocated.

  • Create a new raster object by copying one of the original bands. For example:

kmeans_raster = s2$ndvi
  • Set the values of the copied raster to the cluster vector from the output kmeans object. For example:
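
# assuming the kmeans output from above is stored in an object named km;
# this works when na.omit dropped no rows (otherwise the cluster vector
# will be shorter than the number of raster cells)
values(kmeans_raster) = km$cluster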

Try a few different clusters (k) to see how the map changes.

Great! You now have a categorical raster with categories 1:k. The issue is we don’t know the value that corresponds to the flood water.

Step 3

To identify the flood category programmatically, generate a table crossing the values of one of your binary flood rasters with the values of your kmeans_raster. To do this, you will use the table function and pass it the values from a binary flood raster and the values from your kmeans_raster. Here the following occurs:

  • table builds a contingency table counting the number of times each combination of factor levels in the input vector(s) occurs. This will give us a table quantifying how many cells with a value 1 are aligned with each of the k classes, and how many cells with a value 0 are aligned with each of the k classes.

  • If you pass the binary flood values as the first argument to table then the unique values (0,1) will be the rows. They will always be sorted meaning you know the flooded cells will be in the second row.

  • which.max() returns the index of the maximum value in a vector.

  • combine this information to identify the cluster in the kmeans data that coincides with the most flooded cells in the binary mask.

  • Once you know this value, use calc to extract the flood mask in a similar way to the thresholding you did above.

  • Finally use addLayer to add this new layer to your flood raster stack.
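
A minimal sketch of this step, assuming the binary rasters are stacked in an object named flood (with an NDVI layer) and the categorical raster is kmeans_raster:

tab = table(values(flood$NDVI), values(kmeans_raster))   # rows: 0/1 flood, columns: clusters

flood_class = as.numeric(names(which.max(tab[2, ])))     # cluster aligned with the most flooded cells

kmeans_flood = calc(kmeans_raster, function(x) ifelse(x == flood_class, 1, 0))

flood = addLayer(flood, kmeans_flood)                    # add the k-means mask to the stack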

Question 6

Awesome! You have now created a flood raster using 6 different methods. Our last goal is to identify how they compare.

  • First we will calculate the total area of the flooded cells in each image. You can use cellStats to determine the sum of each layer. Since flooded cells have a value of 1, the sum of an entire band is equivalent to the number of flooded cells. You can then use the resolution of the cell to convert counts to a flooded area.

Print these values as a kable table
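
A minimal sketch, assuming the six-layer binary stack is named flood:

counts  = cellStats(flood, sum)        # number of flooded cells per layer
area_m2 = counts * prod(res(flood))    # cell count * cell area (square meters)

knitr::kable(data.frame(method = names(flood), area_m2 = area_m2))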

  • Second, we can visualize the uncertainty in our classifications by summing the entire stack using calc. The higher the count in each pixel, the more certain we can be about its flooded state. For example, if a cell has a value of 6, every method identified the cell as flooded; if it has a value of 2, only two of the methods identified the cell as flooded.

Plot your flood map using the blues9 color palette
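
A minimal sketch of the uncertainty map, again assuming the stack is named flood:

flood_sum = calc(flood, sum)    # per-cell count of methods that flagged flooding
plot(flood_sum, col = blues9)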

  • Third, once you have a summed raster layer, copy it as a new layer and set all 0 values to NA. Then map the raster with mapview. Zoom and pan around the interactive map, noting that the pixel value is displayed in the upper right-hand corner.

Why are some of the cell values not an even number?

Congratulations! You have successfully carried out a complete flood analysis from data acquisition through evaluation. This kind of work goes on regularly and is part of several national efforts (NOAA, USGS, FirstStreet, FEMA) to generate flood inundation libraries that contribute to better extraction and classification of real-time flood events, resource allocation during events, and damage assessment after events.

Here we used Landsat imagery, but the same process could be implemented on drone footage, MODIS data, or other private satellite imagery.

Your evaluation was based purely on the raster data structure and your ability to conceptualize rasters as vectors of data with dimensional structure. You applied simple mathematical operators (+, /, -) to the raster bands, and a kmeans clustering algorithm to the data matrix of the multiband raster - all within ~100 lines of code!

Extra Credit

Check out this drone footage taken during the flooding in Palo, Iowa. In particular, we see a bit of residential flooding at minute 2:17 (screenshot below).

Our goal is to see if our flood classifications were able to capture this event.


Use mapview to generate a slippy map of the Palo, Iowa bbox. Find the location shown in the above image using context clues and different base maps. Once you do, do the following:

  • Create an sfc object from the latitude and longitude of the mouse coordinates at the impacted location

  • Use the st_point constructor to create an sfg; convert it to an sfc with the appropriate lat/lng CRS; and then transform to the CRS of the flood rasters

  • Use raster::extract to extract the binary flood values at that location from the six layer flood map stack
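
A minimal sketch of these steps; the lng/lat values are placeholders for the mouse coordinates you record from mapview, and the six-layer flood stack is assumed to be named flood:

lng = -91.78   # placeholder: replace with the recorded longitude
lat = 42.06    # placeholder: replace with the recorded latitude

pt = st_sfc(st_point(c(lng, lat)), crs = 4326) %>%   # sfg -> sfc with a lat/lng CRS
  st_transform(projection(flood))                    # match the flood raster CRS

raster::extract(flood, st_coordinates(pt))           # binary flood values at that point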

How many of the maps captured the flooding at that location?


Rubric

  • Question 1: AOI definition (10)
  • Question 2: Data Acquisition (20)
  • Question 3: Image Creation (20)
  • Question 4: Thresholding (35)
  • Question 5: Classification (35)
  • Question 6: Summary (20)
  • Extra Credit: Pixel Evaluation (20)
  • Well Structured and appealing Rmd deployed as web page (10)

Total: 150 points (170 points possible with extra credit)

Submission

You will submit a URL to your web page deployed with GitHub pages.

To do this:

  • Knit your lab 5 document
  • Stage/commit/push your files
  • If you followed the naming conventions in the “Set Up” of lab 2, your lab 5 link will be available at:

https://USERNAME.github.io/geog-13-labs/lab-05.html

Submit this URL in the appropriate Gauchospace dropbox. Also take a moment to update your personal webpage with this link and some bullet points of what you learned. While not graded as part of this lab, it will be your final!