Lecture 26

Introduction to Raster Data

Recap: `sf` and spatial concepts

Spatial phenomena can be thought of discrete objects with clear boundaries or as continuous phenomenon that can be observed everywhere, with no natural boundaries.
We have described these as objects and fields (Kuhn, 2012)
Objects are usually represented by vector data consisting of:
- a geometry (simple features (sfc, sfg))
- some attribute information (data.frame)
In R these are unified as a sf object
Field data is typically represented by raster data
For this, we will begin our discussions using the terra package

`terra`

Like sf, terra is an implementation of standard raster data model
The model is used by all GIS platforms
Represented continuous data either as continuous or categorical values as a regular set of cells in a grid (matrix)
cells have: (1) resolution, (2) infered cell coordinate (centroid) (3) the coordinate and value apply to the entire cell area

Recap: R data structures

Vector:
- A vector can have dimensions
  - A 1D vector in a collection of values
  - A 2D vector is a matrix
  - A 3D vector is an array
List: a collection of objects
data.frame: a list with requirement of equal length column (vectors)
data.frames and lists (sfc) defined our vector model
Arrays will define our raster model

Spatial Extent

One last topic with respect to vector data (that will carry us into raster) is the idea of an extent:

(ny <- AOI::aoi_get(state = "NY") |> 
   st_transform(5070) |> 
   dplyr::select(name))
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 1319502 ymin: 2149150 xmax: 1997508 ymax: 2658543
#> Projected CRS: NAD83 / Conus Albers
#>       name                       geometry
#> 1 New York MULTIPOLYGON (((1661335 263...

In geometry, the minimum bounding box for a point set (stored as POINT, POLYLINE, POLYGON) in N dimensions is “…the box with the smallest measure within which all the points lie.”

We can extract bounding box coordinates with `st_bbox`

returns: an object of class bbox of length 4.

(bb = st_bbox(ny))
#>    xmin    ymin    xmax    ymax 
#> 1319502 2149150 1997508 2658543

class(bb)
#> [1] "bbox"

typeof(bb)
#> [1] "double"

There is a method for creating an sfc form a `bbox` object

(bb = st_as_sfc(bb))
#> Geometry set for 1 feature 
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 1319502 ymin: 2149150 xmax: 1997508 ymax: 2658543
#> Projected CRS: NAD83 / Conus Albers
#> POLYGON ((1319502 2149150, 1997508 2149150, 199...

class(bb)
#> [1] "sfc_POLYGON" "sfc"

typeof(bb)
#> [1] "list"

Result:

plot(bb, border = rgb(0,0,1))
plot(ny, add = TRUE, col = rgb(1,0,0, .5))

Extents can be discritized in a number of ways:

grid = st_make_grid(bb)
plot(ny$geometry)
plot(grid, add = TRUE)

grid1km = st_make_grid(bb, cellsize = 10000)
plot(ny$geometry)
plot(grid1km, add = TRUE)

What makes up a regular tesselation

length(grid1km) # how many grid tiles
#> [1] 3468

mapview::npts(grid1km) # how many points?
#> [1] 17340

mapview::npts(grid1km) * 2 # how many X and Y?
#> [1] 34680

mapview::npts(grid1km) / length(grid) # how many points per tile?
#> [1] 173.4

sqrt(st_area(grid1km)[1]) # length of each tile?
#> 10000 [m]

st_bbox(grid1km) # extent of grid
#>    xmin    ymin    xmax    ymax 
#> 1319502 2149150 1999502 2659150

Alternative representation

Regular grids can also be indexed by their centroids

cent <- st_centroid(grid1km)

plot(ny$geometry)
plot(cent, add = TRUE, pch = 16, cex = .25)

length(cent) # how many grid tiles
#> [1] 3468

mapview::npts(grid1km) # how many points?
#> [1] 17340

mapview::npts(grid1km) * 2 # how many X and Y?
#> [1] 34680

Raster Model

The raster model is one of the earliest and most widely used data models within geographic information systems (Tomlin, 1990; Goodchild, 1992, Maguire, 1992).
Typically used to record, analyze and visualize data with a continuous nature such as elevation, temperature (“GIS”), or reflected or emitted electromagnetic radiation (“Remote Sensing”)
Quotes are used because you’ll find from a data perspective these differences are artificial and a product of the ESRI/ENVI/ERDAS divide

The term raster originated from the German word for screen, implying a series of orthogonality oriented parallel lines.
Digital raster objects most often take the form of a regularly spaced, grid-like pattern of rows and columns
Each element referred to as a cell, pixel, or grid point.

Many terms mean the same thing …

The entire raster is sometimes referred to as an “image”, “array”, “surface”, “matrix”, or “lattice” (Wise, 2000).
The all mean the same thing…
Cells of the raster are most often square, but may be rectangular (with differing resolutions in x and y directions) or other shapes that can be tessellated such as triangles and hexagons (Figure below from Peuquet, 1984).

Photos and Computers …

Aerial Imagery (really just a photo 😄)

What is stored in these cells?

Categorical Values (integer/factor)

Continuous Values (numeric)

Spectral Values

Either Color, or sensor

Why do we care?

Pixels are the base unit of raster data and have a resolution
This is the X and the Y dimension of each cell in the units of the CRS

Raster images seek to discritize the real world into cell-based values

Again either integer (categorical), continuous, or signal

Resolution drives image clarity (granulairty)

Higher resolution (smaller cells) = more detail, but bigger data!

All rasters have an extent!

This is the same extent as a bounding box
Can be described as 4 values (xmin,ymin,xmax,ymax)

Implicit Coordinates

Unlike vector data, the raster data model stores the coordinate of the grid cells indirectly
Coordinates are derived from the reference (Xmin,Ymin) the resolution, and the cell index (e.g. [100,150])
For example: If we want the coordinates of a value in the 3rd row and the 40th column of a raster matrix, we have to move from the origin (Xmin, Ymin) (3 x Xres) in x-direction and (40 x Yres) in y-direction

So, any image (.png, .tif, .gif) can be read as a raster…
The raster is defined by the extent and resolution of the cells
To be spatial, the extent (thus coordinates) must be grounded in an CRS

(img = terra::rast('images/17-raster-extent.png'))
#> class       : SpatRaster 
#> dimensions  : 788, 1067, 4  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 1067, 0, 788  (xmin, xmax, ymin, ymax)
#> coord. ref. :  
#> source      : 17-raster-extent.png 
#> names       : 17-rast~xtent_1, 17-rast~xtent_2, 17-rast~xtent_3, 17-rast~xtent_4

terra::plotRGB(img, r = 1, g = 2, b = 3)

terra I/O

terra is an R pacakge that binds to GDAL exposing the drivers for I/0

library(terra)
gdal()
#> [1] "3.8.5"

terra::gdal(drivers = TRUE) |> DT::datatable()

Local Raster Data

All raster data is read with rast()
Critically, rast() reads the data header, before calling data into memory
Only when an operation that needs the values is called, is the data streamed into memory

rast("data/foco-elev.tif")
#> class       : SpatRaster 
#> dimensions  : 725, 572, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30  (x, y)
#> extent      : -769695, -752535, 1978485, 2000235  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
#> source      : foco-elev.tif 
#> name        :    dem 
#> min value   : 146730 
#> max value   : 197781

Raster Data from the Web

GDAL exposes a number of Virtual System Interfaces (VSI) that allow for reading across various data protocols! These are implemented by prefixing a URL with the needed interface

http(s): /vsicurl/
s3: /vsis3/
zip directories: /vsizip/, /vsitar/
and more!

url <- 'https://raw.githubusercontent.com/mikejohnson51/csu-ess-330/a993e03df117a76c609ff4c018055f8c821a6de9/resources/foco-elev.tif'

rast(glue::glue("/vsicurl/{url}"))
#> class       : SpatRaster 
#> dimensions  : 750, 591, 1  (nrow, ncol, nlyr)
#> resolution  : 29.01279, 29.01279  (x, y)
#> extent      : -769693.5, -752546.9, 1978477, 2000236  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
#> source      : foco-elev.tif 
#> name        : foco-elev 
#> min value   :      1466 
#> max value   :      1975

CONUS DEM

The CONUS DEM is a 1 arc-second (30m) resolution elevation data set for the continental United States stored in an an s3 bucket (Amazon Cloud)
The s3 vsi function allows us to read the data directly from the cloud
The data is stored in a virtual raster format (VRT) which is a pointer to many data files…

rast('/vsis3/lynker-spatial/gridded-resources/dem.vrt')
#> class       : SpatRaster 
#> dimensions  : 114503, 163008, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30  (x, y)
#> extent      : -2470965, 2419275, 186285, 3621375  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
#> source      : dem.vrt 
#> name        : dem

Raster Data in R

A SpatRast represents single-layer (variable) raster data.

A SpatRast always stores the fundamental parameters that describe it. - The number of columns and rows, - The spatial extent - The Coordinate Reference System.

In addition, a SpatRast can store information about the file where raster values are stored (if there is such a file).

Here we construct an empty raster:

(r <- rast(ncol=20, nrow=20, xmax=-80, xmin=-120, ymin=20, ymax=60))
#> class       : SpatRaster 
#> dimensions  : 20, 20, 1  (nrow, ncol, nlyr)
#> resolution  : 2, 2  (x, y)
#> extent      : -120, -80, 20, 60  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)

Extract Diminisionality …

nrow(r)
#> [1] 20
ncol(r)
#> [1] 20
ncell(r)
#> [1] 400
nlyr(r)
#> [1] 1

Raster Values

values can be extracted with values()

head(values(r))
#>      lyr.1
#> [1,]   NaN
#> [2,]   NaN
#> [3,]   NaN
#> [4,]   NaN
#> [5,]   NaN
#> [6,]   NaN

Assigning a values as a vector

values(r) <- 1:ncell(r)

r
#> class       : SpatRaster 
#> dimensions  : 20, 20, 1  (nrow, ncol, nlyr)
#> resolution  : 2, 2  (x, y)
#> extent      : -120, -80, 20, 60  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#> source(s)   : memory
#> name        : lyr.1 
#> min value   :     1 
#> max value   :   400

head(values(r))
#>      lyr.1
#> [1,]     1
#> [2,]     2
#> [3,]     3
#> [4,]     4
#> [5,]     5
#> [6,]     6

plot(r)

Raster is S4

Compared to S3, the S4 object system is much stricter, and much closer to other Object Oriented systems.
What does this mean for us?

Data is structured with a “representation”
which is a list of slot (or attributes), giving their names and classes, accessed with @

str(r, max.level = 2)
#> S4 class 'SpatRaster' [package "terra"]
r@pntr
#> C++ object <0x128469540> of class 'SpatRaster' <0x10ee8ad30>
r@pntr$ncol
#> Class method definition for method ncol()
#> function () 
#> {
#>     " unsigned long ncol()  \n   docstring : ncol"
#>     .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x10e8211a0>, 
#>         dll = list(name = "Rcpp", path = "/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/Rcpp/libs/Rcpp.so", 
#>             dynamicLookup = TRUE, handle = <pointer: 0x79b993b0>, 
#>             info = <pointer: 0x10e8be530>, forceSymbols = FALSE), 
#>         numParameters = -1L), <pointer: 0x10ee8ad30>, <pointer: 0x10ee8edd0>, 
#>         .pointer)
#> }
#> <environment: 0x12cdc9b90>

Access via Function

ext(r)
#> SpatExtent : -120, -80, 20, 60 (xmin, xmax, ymin, ymax)
crs(r)
#> [1] "GEOGCRS[\"WGS 84 (CRS84)\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    USAGE[\n        SCOPE[\"unknown\"],\n        AREA[\"World\"],\n        BBOX[-90,-180,90,180]],\n    ID[\"OGC\",\"CRS84\"]]"
nrow(r)
#> [1] 20
ncol(r)
#> [1] 20
head(values(r))
#>      lyr.1
#> [1,]     1
#> [2,]     2
#> [3,]     3
#> [4,]     4
#> [5,]     5
#> [6,]     6

Assignment

Starting with this base …

url <- 'https://raw.githubusercontent.com/mikejohnson51/csu-ess-330/refs/heads/main/resources/foco-elev-cm.tif'

… construct a VSI prefixed URL that allows rast to read the data directly from our class Github Repo.

Once the date is read in, describe the structure of the data in a few sentences.
Nativly, the data has a horizonal unit of cm. Multiply the data by 0.0328084 to convert the data to feet.
Extract the values from the raster into a data.frame by using the dataframe=TRUE parameter in the values() function.
Use ggpubr to create a density plot of the elevation data in feet. Equit the plot with clear labs, and a title as well as a theme.
Save your plot as a png file with ggsave and submit to the Canvase Dropbox.