JSON, GeoJSON, and STAC: Cloud-Native Geospatial Discovery
Learning Objectives
By the end of this week, you will be able to:
Parse and validate JSON from APIs and cloud storage systems
Read and write GeoJSON for spatial data exchange in reproducible workflows
Query STAC catalogs to discover and access petabyte-scale remote sensing datasets
Download and organize imaging assets directly into analysis pipelines (avoiding manual downloads)
Apply these tools to real water resource problems (e.g., flood extent mapping, drought monitoring)
Why These Standards Matter for Water Data Science
Modern water resource analysis requires fast, reproducible access to cloud-hosted data:
JSON: Standard format for REST APIs (USGS, NOAA, NWIS, NASA)
GeoJSON: Enables spatial data to flow between databases, web services, and R
STAC: Discovers satellite imagery, model outputs, and time-series data without manual catalog browsing
Real scenario: Analyzing the 2016 Palo, Iowa flood requires Landsat imagery from Sept 24–29, 2016. With STAC + rstac, that’s a 3-line search query. Without it: hours of clicking through NASA/USGS websites.
JSON: The Universal Data Exchange Format
JSON (JavaScript Object Notation) is everywhere in modern data science:
APIs: USGS NWIS, NOAA, OpenWeather, Mapbox, Google Earth Engine
Cloud storage: AWS S3 metadata, Azure metadata, Google Cloud Storage
Key rule: Coordinates are [longitude, latitude] (not lat, lon!)
GeoJSON Feature Example
A single flood extent:
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[-91.8,42.06],[-91.79,42.07],[-91.78,42.06],[-91.8,42.06]]]},"properties":{"name":"Flooded Area in Palo","date":"2016-09-26","flood_depth_feet":18.5}}
Microsoft Planetary Computer hosts large datasets (Landsat, Sentinel, Copernicus, MODIS, etc.) and requires signed URLs for access. The rstac package handles this automatically:
# Sign all asset URLs for the first itemitem_signed <- search |>items_sign(rstac::sign_planetary_computer())item_signed$features[[1]]$assets$green$href # Now this URL is signed and ready for download#> [1] "https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2016/025/031/LC08_L2SP_025031_20160926_20200906_02_T1/LC08_L2SP_025031_20160926_20200906_02_T1_SR_B3.TIF?st=2026-04-14T21%3A20%3A17Z&se=2026-04-15T22%3A05%3A17Z&sp=rl&sv=2025-07-05&sr=c&skoid=9c8ff44a-6a2c-4dfb-b298-1c9212f64d9a&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2026-04-15T00%3A02%3A34Z&ske=2026-04-22T00%3A02%3A34Z&sks=b&skv=2025-07-05&sig=%2BOTGhGiKwIXrnO6kU4uaGmg5z1A9XFdY001GaFccOYs%3D"
This avoids keeping credentials in your code—much more secure!
Safe Download: Check File Size First
Before downloading large band files (100s of MB), inspect the file size with a HEAD request:
# Example: safe download patternfile_url <- item_signed[[1]]$assets$green$href# HEAD request: get metadata without downloadingh <- httr::HEAD(file_url)if (httr::status_code(h) ==200) { file_size_mb <-as.numeric(httr::headers(h)$`content-length`) /1e6cat("File size:", file_size_mb, "MB\n")# Only download if < 500 MBif (file_size_mb <500) { httr::GET(file_url, httr::write_disk("band_red.TIF", overwrite =TRUE))cat("Downloaded successfully\n") } else {cat("File too large; consider downloading selectively\n") }}
Since data is already in labs/data, load directly from the local directory:
# List downloaded Landsat band files (TIF format)raster_files <-list.files(path ="../labs/data",pattern ="*.TIF$",full.names =TRUE,recursive =TRUE)# Sort to ensure consistent band orderraster_files <-sort(raster_files)[1:6]# Stack into a single SpatRaster objectlandsat_stack <-rast(raster_files)band_names <-c('coastal', 'blue', 'green', 'red', 'nir08', 'swir16')landsat_stack <-setNames(landsat_stack, band_names)landsat_stack#> class : SpatRaster #> size : 7801, 7681, 6 (nrow, ncol, nlyr)#> resolution : 30, 30 (x, y)#> extent : 518085, 748515, 4506885, 4740915 (xmin, xmax, ymin, ymax)#> coord. ref. : WGS 84 / UTM zone 15N (EPSG:32615) #> sources : LC08_L2SP_025031_20160926_20200906_02_T1_SR_B1.TIF #> LC08_L2SP_025031_20160926_20200906_02_T1_SR_B2.TIF #> LC08_L2SP_025031_20160926_20200906_02_T1_SR_B3.TIF #> ... and 3 more sources#> names : coastal, blue, green, red, nir08, swir16
Complete Walkthrough: Crop to AOI
# Transform AOI to match raster CRS (UTM zone 15N)palo_utm <- bbox_palo %>%st_transform(crs(landsat_stack))# Crop the full scene to study area (saves memory, speeds processing)landsat_crop <-crop(landsat_stack, palo_utm)cat("\nCropped extent:\n")#> #> Cropped extent:print(ext(landsat_crop))#> SpatExtent : 598185, 601065, 4655805, 4659615 (xmin, xmax, ymin, ymax)cat("Cropped dimensions:", dim(landsat_crop), "\n")#> Cropped dimensions: 127 96 6
Complete Walkthrough: Visualize Band Combinations
Natural Color (RGB) — What the Human Eye Would See
plotRGB(landsat_crop, r =4, g =3, b =2, stretch ="lin", main ="Natural Color: Red-Green-Blue")
Visible: Water (dark), vegetation (green), urban (gray/tan), clouds (white)
Limitation: Flooding can look similar to vegetation in RGB alone.
Visualize: Color Infrared (Vegetation)
plotRGB(landsat_crop, r =5, g =4, b =3, stretch ="lin",main ="Color Infrared: NIR-Red-Green\n(Bright red = healthy vegetation)")
Visible: - Bright red/magenta = healthy vegetation (strong NIR reflectance) - Dark blue/black = water - Gray/tan = soil/urban
Why NIR matters: Vegetation reflects ~50% of NIR light; water absorbs it. This makes forests “glow” red.
Visualize: Water-Focused False Color
plotRGB(landsat_crop, r =5, g =6, b =4, stretch ="lin",main ="Water Focus: NIR-SWIR1-Red\n(Dark blue = open water/flooding)")
Visible: - Dark blue/black = open water, flooded areas (absorbs NIR + SWIR) - Bright colors = vegetation & land - Often shows flooding more clearly than natural color
Why SWIR1 matters: Water absorbs SWIR; vegetation reflects it. This combo best delineates water boundaries.
SWIR false color showing water features distinctly
Complete Walkthrough: NDVI
$ NDVI = $
# Extract individual bands for calculations# Calculate key water indicesndvi <- (landsat_crop$nir08 - landsat_crop$red) / (landsat_crop$nir08 + landsat_crop$red)plot(ndvi, col =colorRampPalette(c("blue", "white", "red"))(256))
Thresholding to Binary Flood Maps
NDVI < 0 typically indicates water. We can apply this threshold to create a binary flood map:
# Apply thresholds specific to each indexndvi_flood <-app(ndvi, function(x) ifelse(x <0, 1, 0)) # NDVI < 0 = water# Plot: blue = flooded, white = non-floodedplot(ndvi_flood, col =c("white", "blue"), main =c("NDVI-based"))