<- st_cast(st_union(aoi_get(state = "conus")), "MULTILINESTRING")
usa
nrow(cities)
#> [1] 31254
Week 3
Predicates, Simplification & Tesselations
Last week
We learned about the SRS (Spatial Reference Systems)
They can be geographic (3D, angular units)
Ellipsoid (squished sphere model)
geoid (mathematical model of gravitational field)
datum marks the relationship between the ellipsoid and geoid
can be local (NAD27)
can be global (WGS84, NAD83)
SRS can be projected (x-y axis, measurement units (m), false origin)
All PCS are based on GCS
Projections seek to place the 3D earth on 2D
Does this by defining a plane that can be conic, cylindrical or planar
“unfolding to this plane” creates distortion of shape, area, distance, or direction
The distortion is greater from point/lines of tangency or secanacy
Measures (GEOS Measures)
Measures are the questions we ask about the dimension of a geometry
How long is a line or polygon perimeter (unit)
What is the area of a polygon (unit2)
How far are two object from one another (unit)
Measures come from the GEOS library
Measures are in the units of the base projection
Measures can be Euclidean (PCS) or geodesic (GSC)
geodesic (Great Circle) distances can be more accurate by eliminating distortion
but are much slower to calculate
For example …
# Great Circle Distance in GCS
system.time({x <- st_distance(usa, cities)})
# user system elapsed
# 103.560 1.390 117.128
# Euclidean Distance on PCS
system.time({x <- st_distance(usa, cities, which = "Euclidean")})
# user system elapsed
# 2.422 0.019 2.494
Units
When possible measure operations report results with a units appropriate for the CRS:
<- st_read("data/co.shp")
co #> Reading layer `co' from data source
#> `/Users/mikejohnson/github/csu-ess-523c/slides/data/co.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 64 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -109.0602 ymin: 36.99246 xmax: -102.0415 ymax: 41.00342
#> Geodetic CRS: WGS 84
<- st_area(co[1,])
a attributes(a) |> unlist()
#> units.numerator1 units.numerator2 class
#> "m" "m" "units"
The units package can be used to convert between units:
::set_units(a, km^2) # result in square kilometers
units#> 3062.85 [km^2]
::set_units(a, ha) # result in hectares
units#> 306285 [ha]
and the results can be stripped of their attributes for cases where numeric values are needed (e.g. math operators and ggplot):
as.numeric(a)
#> [1] 3062849758
Topologic Diminsion
A POINT is shape with a dimension of 0 that occupies a single location in coordinate space.
# POINT defined as numeric vector
st_dimension(st_point(c(0,1))))
(#> [1] 0
A LINESTRING is shape that has a dimension of 1 (length)
# LINESTRING defined by matrix
st_dimension(st_linestring(matrix(1:4, nrow = 2))))
(#> [1] 1
A POLYGON is surface stored as a list of its exterior and interior rings. It has a dimension of 2. (area)
# POLYGON defined by LIST (interior and exterior rings)
st_dimension(st_polygon(list(matrix(c(1:4, 1,2), nrow = 3, byrow = TRUE)))))
(#> [1] 2
Predicates
Geometric Interiors, Boundaries and Exteriors
All geometries have interior, boundary and exterior regions.
The terms interior
and boundary
are used in the context of algebraic topology and manifold theory and not general topology
The OGC has define these states for the common geometry types in the simple features standard:
Dimension | Geometry Type | Interior (I) | Boundary (B) |
---|---|---|---|
Point, MultiPoint | 0 | Point, Points | Empty |
LineString, Line | 1 | Points that are left when the boundary points are removed. | Two end points. |
Polygon | 2 | Points within the rings. | Set of rings. |
Interior, Boundary and Exterior: POINTS
Interior, Boundary and Exterior: LINESTRING
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
Interior, Boundary and Exterior: POLYGON
Summary
Predicates
In the following, we are interested in the resulting geometry that occurs when 2 geometries are overlaid…
Overlap is a POINT: 0D
Overlap is a LINESTRING: 1D
Overlap is a POLYGON: 2D
No Overlap = FALSE
Dimensionally Extended 9-Intersection Model
The DE-9IM is a topological model and (standard) used to describe the spatial relations of two geometries
Used in geometry, point-set topology, geospatial topology
The DE-9IM matrix provides a way to classify geometry relations using the set {0,1,2,F} or {T,F}
With a {T,F} matrix domain, there are 512 possible relations that can be grouped into binary classification schemes.
About 10 of these, have been given a common name such as intersects, touches, and within.
When testing two geometries against a scheme, the result is a spatial predicate named by the scheme.
Provides the primary basis for queries and assertions in GIS and spatial databases (PostGIS).
The Matrix Model
The DE-9IM matrix is based on a 3x3 intersection matrix:
Where:
- dim is the dimension of the intersection and
- I is the interior
- B is the boundary
- E is the exterior
. . .
Empty sets are denoted as F
non-empty sets are denoted with the maximum dimension of the intersection {0,1,2}
A simpler (binary) version of this matrix can be created by mapping all non-empty intersections {0,1,2} to TRUE.
Where II
would state: “Does the Interior of”a” overlaps with the Interior of “b” in a way that produces a point (0), line (1), or polygon (2)”
Where IB
would state: “Does the Interior of”a” overlap with the Boundary of “b” in a way that produces a point (0), line (1), or polygon (2)”
Both matrix forms: - dimensional {0,1,2,F} - Boolean {T,F}
. . .
Can be serialize as a “DE-9IM string code” representing the matrix in a single string element (standardized format for data interchange)
The OGC has standardized the typical spatial predicates (Contains, Crosses, Intersects, Touches, etc.) as Boolean functions, and the DE-9IM model as a function that returns the DE-9IM code, with domain of {0,1,2,F}
Illustration
Reading from left-to-right and top-to-bottom, the DE-9IM(a,b) string code is ‘212101212’
Spatial Relations in R
- Geometry X is a 3 feature polygon colored in red
- Geometry Y is a 4 feature polygon colored in blue
st_relate(x,y)
#> [,1] [,2] [,3] [,4]
#> [1,] "212FF1FF2" "FF2FF1212" "212101212" "FF2FF1212"
#> [2,] "FF2FF1212" "212101212" "212101212" "FF2FF1212"
#> [3,] "FF2FF1212" "FF2FF1212" "212101212" "FF2FF1212"
st_relate(x,x)
#> [,1] [,2] [,3]
#> [1,] "2FFF1FFF2" "FF2F01212" "FF2F11212"
#> [2,] "FF2F01212" "2FFF1FFF2" "FF2FF1212"
#> [3,] "FF2F11212" "FF2FF1212" "2FFF1FFF2"
st_relate(y,y)
#> [,1] [,2] [,3] [,4]
#> [1,] "2FFF1FFF2" "FF2FF1212" "212101212" "FF2FF1212"
#> [2,] "FF2FF1212" "2FFF1FFF2" "212101212" "FF2FF1212"
#> [3,] "212101212" "212101212" "2FFF1FFF2" "FF2FF1212"
#> [4,] "FF2FF1212" "FF2FF1212" "FF2FF1212" "2FFF1FFF2"
Named Predicates
“named spatial predicates” have been defined for some common relations.
A few functions can be derived (expressed by masks) from DE-9IM include (* = wildcard):
Predicate | DE-9IM String Code | Description |
---|---|---|
Intersects | T*F**FFF* |
“Two geometries intersect if they share any portion of space” |
Overlaps | T*F**FFF* |
“Two geometries overlap if they share some but not all of the same space” |
Equals | T*F**FFF* |
“Two geometries are topologically equal if their interiors intersect and no part of the interior or boundary of one geometry intersects the exterior of the other” |
Disjoint | FF*FF***** |
“Two geometries are disjoint: they have no point in common. They form a set of disconnected geometries.” |
Touches | FT******* |
F**T***** |
Contains | T*****FF** |
“A contains B: geometry B lies in A, and the interiors intersect” |
Covers | T*****FF* |
*T****FF* |
Within | *T*****FF* |
**T****FF* |
Covered by | *T*****FF* |
**T****FF* |
Binary Predicates
Collectively, predicates define the type of relationship each 2D object has with another.
Of the ~ 512 unique relationships offered by the DE-9IM models a selection of ~ 10 have been named.
These are include in PostGIS/GEOS and are made accessible via R sf
Named predicates in R
sf
provides a set of functions that implement the DE-9IM model and named predicates. Each of these can return either a sparse or dense matrix.
sparse matrix
st_intersects(x,y)
#> Sparse geometry binary predicate list of length 3, where the predicate
#> was `intersects'
#> 1: 1, 3
#> 2: 2, 3
#> 3: 3
dense matrix
st_intersects(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE FALSE TRUE FALSE
#> [2,] FALSE TRUE TRUE FALSE
#> [3,] FALSE FALSE TRUE FALSE
st_disjoint(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE TRUE FALSE TRUE
#> [2,] TRUE FALSE FALSE TRUE
#> [3,] TRUE TRUE FALSE TRUE
st_touches(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE
st_within(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE
st_relates vs. predicate calls…
= filter(aoi_get(state = "all"), state_abbr %in% c("WA", "OR", "MT", "ID")) |>
states select(name)
= filter(states, name == "Washington") wa
plot(states$geometry)
mutate(states,
(deim9 = st_relate(states, wa),
touch = st_touches(states, wa, sparse = F)))
#> Simple feature collection with 4 features and 3 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.8485 ymin: 41.98818 xmax: -104.0397 ymax: 49.00244
#> Geodetic CRS: WGS 84
#> name geometry deim9 touch
#> 1 Idaho MULTIPOLYGON (((-111.0455 4... FF2F11212 TRUE
#> 2 Montana MULTIPOLYGON (((-109.7985 4... FF2FF1212 FALSE
#> 3 Oregon MULTIPOLYGON (((-117.22 44.... FF2F11212 TRUE
#> 4 Washington MULTIPOLYGON (((-121.5237 4... 2FFF1FFF2 FALSE
So…
binary predicates return conditional {T,F} relations based on predefined masks of the DE-9IM strings
Up to this point in class we have been using Boolean {T,F} masks to filter data.frames by columns:
= c("apple", "orange", "lemon", "watermelon") fruits
== "apple"
fruits #> [1] TRUE FALSE FALSE FALSE
%in% c("apple", "lemon")
fruits #> [1] TRUE FALSE TRUE FALSE
Filtering on data.frames
- We have used
dplyr::filter
to subset a data frame, retaining all rows that satisfy aboolean
condition.
mutate(states, equalsWA = (name == "Washington")) |>
st_drop_geometry()
#> name equalsWA
#> 1 Idaho FALSE
#> 2 Montana FALSE
#> 3 Oregon FALSE
#> 4 Washington TRUE
filter(states, name == "Washington")
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.8485 ymin: 45.54364 xmax: -116.9161 ymax: 49.00244
#> Geodetic CRS: WGS 84
#> name geometry
#> 1 Washington MULTIPOLYGON (((-121.5237 4...
Spatial Filtering
We can filter spatially, use
st_filter
as the function callHere the
boolean
condition is not passed (e.g. name == WA)But instead, is defined by a spatial predicate
The default is
st_intersects
but can be changed with the.predicate
argument:
mutate(states,
touch = st_touches(states, wa, sparse = FALSE))
#> Simple feature collection with 4 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.8485 ymin: 41.98818 xmax: -104.0397 ymax: 49.00244
#> Geodetic CRS: WGS 84
#> name geometry touch
#> 1 Idaho MULTIPOLYGON (((-111.0455 4... TRUE
#> 2 Montana MULTIPOLYGON (((-109.7985 4... FALSE
#> 3 Oregon MULTIPOLYGON (((-117.22 44.... TRUE
#> 4 Washington MULTIPOLYGON (((-121.5237 4... FALSE
st_filter(states, wa, .predicate = st_touches)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.7035 ymin: 41.98818 xmax: -111.0435 ymax: 49.00085
#> Geodetic CRS: WGS 84
#> name geometry
#> 1 Idaho MULTIPOLYGON (((-111.0455 4...
#> 2 Oregon MULTIPOLYGON (((-117.22 44....
Result
ggplot(states) +
geom_sf() +
geom_sf(data = wa, fill = "blue", alpha = .3) +
geom_sf(data = st_filter(states, wa, .predicate = st_touches), fill = "red", alpha = .5) +
theme_void()
Distance Example (additional parameter)
= read_csv("../labs/data/uscities.csv") |>
cities st_as_sf(coords = c("lng", "lat"), crs = 4326) |>
select(city, population, state_name) |>
st_transform(5070)
st_filter(cities,
filter(cities, city == "Fort Collins"),
.predicate = st_is_within_distance, 10000)
#> Simple feature collection with 2 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -760147.5 ymin: 1982249 xmax: -751658.3 ymax: 1984621
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 2 × 4
#> city population state_name geometry
#> * <chr> <dbl> <chr> <POINT [m]>
#> 1 Fort Collins 339256 Colorado (-760147.5 1984621)
#> 2 Timnath 8007 Colorado (-751658.3 1982249)
Spatial Data Science
Filtering is a nice way to reduce the dimensions of a single dataset
Often “… one table is not enough… ”
In these cases we want to combine - or join - data
Spatial Joining
- Joining two non-spatial datasets relies on a shared key that uniquely identifies each record in a table
Spatially joining data relies on shared geographic relations rather then a shared key
Like filter, these relations can be defined by a predicate
As with tabular data, mutating joins add data to the target object (x) from a source object (y).
st_join
In
sf
st_join
provides this joining capacityBy default,
st_join
performs a left join (Returns all records from x, and the matched records from y)
- It can also do inner joins by setting
left = FALSE
.
The default predicate for
st_join
(andst_filter
) isst_intersects
This can be changed with the join argument (see
?st_join
for details).
Every Starbucks in the World
What county has the most Starbucks in each state?
starbucks = readr::read_csv('../labs/data/directory.csv') |>
(filter(!is.na(Latitude), Country == "US") |>
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) |>
st_transform(5070))
#> Simple feature collection with 13608 features and 11 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -6223543 ymin: 278590.9 xmax: 2122577 ymax: 5307178
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 13,608 × 12
#> Brand `Store Number` `Store Name` `Ownership Type` `Street Address` City
#> * <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Starbucks 3513-125945 Safeway-Anc… Licensed 5600 Debarr Rd … Anch…
#> 2 Starbucks 74352-84449 Safeway-Anc… Licensed 1725 Abbott Rd Anch…
#> 3 Starbucks 12449-152385 Safeway - A… Licensed 1501 Huffman Rd Anch…
#> 4 Starbucks 24936-233524 100th & C S… Company Owned 320 W. 100th Av… Anch…
#> 5 Starbucks 8973-85630 Old Seward … Company Owned 1005 E Dimond B… Anch…
#> 6 Starbucks 72788-84447 Fred Meyer … Licensed 1000 E Northern… Anch…
#> 7 Starbucks 79549-106150 Safeway - A… Licensed 3101 PENLAND PK… Anch…
#> 8 Starbucks 75988-107245 ANC Main Te… Licensed HMSHost, 500 We… Anch…
#> 9 Starbucks 11426-99254 Tudor Rd an… Company Owned 110 W. Tudor Rd… Anch…
#> 10 Starbucks 20349-108249 Fred Meyer-… Licensed 7701 Debarr Road Anch…
#> # ℹ 13,598 more rows
#> # ℹ 6 more variables: `State/Province` <chr>, Country <chr>, Postcode <chr>,
#> # `Phone Number` <chr>, Timezone <chr>, geometry <POINT [m]>
= aoi_get(state = "conus", county = "all") |>
counties st_transform(5070) |>
select(name, state_name)
<- st_join(counties, starbucks) |>
topSB group_by(name, state_name) |>
summarise(n = n()) |>
group_by(state_name) |>
slice_max(n, n = 1) |>
ungroup()
- Neither joining nor filtering spatially alter the underlying geometry of the features
- In cases where we seek to alter a geometry based on another, we need clipping methods
Clipping
Clipping is a form of subsetting that involves changing the geometry of at least some features.
Clipping can only apply to features more complex than points: (lines, polygons and their ‘multi’ equivalents).
Spatial Subsetting
- By default the data.frame subsetting methods we’ve seen (e.g
[,]
) implementsst_intersection
= st_transform(wa, 5070)
wa = starbucks[wa,] #<<
wa_starbucks
ggplot() + geom_sf(data = wa) + geom_sf(data = wa_starbucks) + theme_void()
Simplification
Computational Complexity
In all the cases we have looked at, the number of
POINT
(e.g geometries, nodes, or vertices) define the complexity of the predicate or the clipComputational needs increase with the number of POINTS / NODES / VERTICES
Simplification is a process for generalization vector objects (lines and polygons)
Another reason for simplifying objects is to reduce the amount of memory, disk space and network bandwidth they consume
Other times the level of detail captured in a geometry is either not needed, or, even counter productive the the scale/purpose of an analysis
If you are cropping features to a national border, how much detail do you need? The more points in your border, the long the clip operation will take….
In cases where we want to reduce the complexity in a geometry we can use simplification algorithms
The most common algorithms are Ramer–Douglas–Peucker and Visvalingam-Whyatt.
Ramer–Douglas–Peucker
Mark the
first
andlast
points as keptFind the point,
p
that is the farthest from the first-last line segment. If there are no points between first and last we are done (the base case)If
p
is closer thantolerance
units to the line segment then everything between first and last can be discardedOtherwise, mark p as kept and repeat steps 1-4 using the points between first and p and between p and last (the call to recursion)
st_simplify
st_simplify
sf
providesst_simplify
, which uses the GEOS implementation of the Douglas-Peucker algorithm to reduce the vertex count.st_simplify
uses thedTolerance
to control the level of generalization in map units (see Douglas and Peucker 1973 for details).
= aoi_get(state = "conus") |>
usa st_union() |>
st_transform(5070)
= st_simplify(usa, dTolerance = 10000)
usa1000 = st_simplify(usa, dTolerance = 100000)
usa10000 = st_simplify(usa, dTolerance = 1000000) usa100000
A limitation with Douglas-Peucker (therefore st_simplify) is that it simplifies objects on a per-geometry basis.
This means the ‘topology’ is lost, resulting in overlapping and disconnected geometries.
= st_transform(states, 5070)
states = st_simplify(states, dTolerance = 20000)
simp_states plot(simp_states$geometry)
Visvalingam
- The Visvalingam algorithm overcomes some limitations of the Douglas-Peucker algorithm (Visvalingam and Whyatt 1993).
- it progressively removes points with the least-perceptible change.
- Simplification often allows the elimination of 95% or more points while retaining sufficient detail for visualization and often analysis
rmapshaper
In R, the rmapshaper
package implements the Visvalingam algorithm in the ms_simplify
function.
- The
ms_simplify
function is a wrapper around themapshaper
JavaScript library (created by lead viz experts at the NYT)
library(rmapshaper)
= ms_simplify(usa, keep = .1)
usa10 = ms_simplify(usa, keep = .05)
usa5 = ms_simplify(usa, keep = .01) usa1
= st_transform(states, 5070)
states = ms_simplify(states, keep = .05)
simp_states plot(simp_states$geometry)
In all cases, the number of points in a geometry can be calculated with mapview::npts()
= st_transform(states, 5070)
states = st_simplify(states, dTolerance = 20000)
simp_states_st = ms_simplify(states, keep = .05)
simp_states_ms
::npts(states)
mapview#> [1] 5930
::npts(simp_states_st)
mapview#> [1] 50
::npts(simp_states_ms)
mapview#> [1] 292
Tesselation
So far we have spent the last few days looking at simple feature objects
Where a feature as a geometry define by a set of structured POINT(s)
These points have precision and define location ({X Y CRS})
The geometry defines a bounds: either 0D, 1D or 2D
Each object is therefore bounded to those geometries implying a level of exactness.
Core Concepts of Spatial Data: (Kuhn 2012)
Core Concepts of Spatial Data: (Kuhn 2012)
One Base Concept: Location
Four Content Concepts: Field, Object, Network, Event
Two Quality Concepts: Granularity, Accuracy
Core Concepts of Spatial Data: (Kuhn 2012)
One Base Concept: Location (coordinates)
Four Content Concepts: Field (raster), Object (simple feature),
Network,EventTwo Quality Concepts: Granularity (simplification), Accuracy (taken for granted)
Object View
Objects describe individuals that have an identity (id) as well as spatial, temporal, and thematic properties.
Answers questions about properties and relations of objects.
Results from fixing theme, controlling time, and measuring space.
Features, such as surfaces, depend on objects (but are also objects)
Object View
Object implies boundedness
- boundaries may not be known or even knowable, but have limits.
Crude examples of such limits are the minimal bounding boxes used for indexing and querying objects in databases.
Many objects (particularly natural ones) do not have crisp boundaries (watersheds)
Differences between spatial information from multiple sources are often caused by more or less arbitrary delineations through context-dependent boundaries.
Many questions about objects and features can be answered without boundaries, using simple point representations (centroids) with thematic attributes.
Field View
Fields describe phenomena that have a scalar or vector attribute everywhere in a space of interest
- for example, air temperatures, rainfall, elevation, land cover
Field information answers the question what is here?, where here can be anywhere in the space considered.
Field-based spatial information can also represent attributes that are computed rather than measured, such as probabilities or densities.
Together, fields and objects are the two fundamental ways of structuring spatial information.
Objects can provide coverage:
Both objects and fields can cover space continuously - the primary difference is that objects prescribe bounds.
Counties are one form of object that covers the USA seamlessly
State objects are another …
Object Coverages
LANDSAT Path Row
- Serves tiles based on a path/row index
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
#> : Ignoring unknown aesthetics: label
MODIS Sinisoial Grid
- Serves tiles based on a path/row index
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
#> : Ignoring unknown aesthetics: label
Uber Hex Addressing
- Breaks the world into Hexagons…
what3word
- Breaks the world into 3m grids encoded with unique 3 word strings
Map Tiles / slippy maps / Pyramids
- Use XYZ where Z is a zoom level …
Our Data for today …
Southern Counties
= aoi_get(state = "south", county = "all") |>
south_counties st_transform(st_crs(cities))
Unioned to States using dplyr
= south_counties |>
south_states group_by(state_name) |>
summarise()
South County Centroids
= st_centroid(south_counties)
south_cent #> Warning: st_centroid assumes attributes are constant over geometries
Southern Coverage: County
Regular Tiles
One way to tile a surface is into regions of equal area
Tiles can be either square (rectilinear) or hexagonal
st_make_grid
generates a square or hexagonal grid covering the geometry of ansf
orsfc
objectThe return object of
st_make_grid
is a newsfc
objectGrids can be specified by cellsize or number of grid cells (n) in the X and Y direction
# Create a grid over the south with 70 rows and 50 columns
= st_make_grid(south_counties, n = c(70, 50)) |>
sq_grid st_as_sf() |>
mutate(id = 1:n())
Southern Coverage: Square
Hexagonal Grid
Hexagonal tessellations (honey combs) offer an alternative to square grids
They are created in the same way but by setting
square = FALSE
= st_make_grid(south_counties, n = c(70, 50), square = FALSE) |>
hex_grid st_as_sf() |>
mutate(id = 1:n())
Southern Coverage: Hexagon
Advantages Square Grids
Simple definition and data storage
- Only need the origin (lower left), cell size (XY) and grid dimensions
Easy to aggregate and dissaggregate (resample)
Analogous to raster data
Relationship between cells is given
Combining layers is easy with traditional matrix algebra
Advantages Square Grids
- Reduced Edge Effects
- Lower perimeter to area ratio
- minimizes the amount line length needed to create a lattice of cells with a given area
- All neighbors are identical
- No rook vs queen neighbors
- Better fit to curve surfaces (e.g. the earth)
Triangulations
An alternative to creating equal area tiles is to create triangulations from known anchor points
Triangulation requires a set of input points and seeks to partition the interior into a partition of triangles.
In GIS contexts you’ll hear:
- Thiessen Polygon
- Voronoi Regions
- Delunay Triangulation
- TIN (Triangular irregular networks)
- etc,..
Voronoi Polygons
Voronoi/Thiessen polygon boundaries define the area closest to each anchor point relative to all others
They are defined by the perpendicular bisectors of the lines between all points.
Voronoi Polygons
Voronoi Polygons
- Usefull for tasks such as:
- nearest neighbor search,
- facility location (optimization),
- largest empty areas,
- path planning…
- Also useful for simple interpolation of values such as rain gauges,
Often used in numerical models and simulations
st_voronoi
st_voronoi creates voronoi tesselation in
sf
It works over a
MULTIPOINT
collectionShould always be simplified after creation (st_cast())
If to be treated as an object for analysis, should be id’d
= st_union(south_cent)
south_cent_u
= st_voronoi(south_cent_u) |>
v_grid st_cast() |>
st_as_sf() |>
mutate(id = 1:n())
Southern Coverage: Voronoi
Southern Coverage: Voronoi
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
Delaunay triangulation
- A Delaunay triangulation for a given set of points (P) in a plane, is a triangulation DT(P), where no point is inside the circumcircle of any triangle in DT(P).
Delaunay triangulation
The Delaunay triangulation of a discrete POINT set corresponds to the dual graph of the Voronoi diagram.
The circumcenters (center of circles) of Delaunay triangles are the vertices of the Voronoi diagram.
Used in landscape evaluation and terrian modeling
st_triangulate
st_triangulate
creates Delaunay triangulation insf
It works over a
MULTIPOINT
collectionShould always be simplified after creation (
st_cast()
)If to be treated as an object for analysis, should be id’d
= st_triangulate(south_cent_u) |>
t_grid st_cast() |>
st_as_sf() |>
mutate(id = 1:n())
Southern Coverage: Triangles
Southern Coverage: Triangles
= st_intersection(t_grid, st_union(south_states))
t_grid #> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
Difference in object coverages:
Type | Elements | Mean Area (km2) | Standard Deviation Area (km2) | Coverage Area |
---|---|---|---|---|
triangulation | 2,828 | 832 | 557 | 2,352,288 |
voroni | 1,421 | 1,688 | 1,046 | 2,398,383 |
counties | 1,421 | 1,688 | 1,216 | 2,398,383 |
grid | 3,500 | 1,470 | 0 | 5,144,369 |
Hexagon | 3,789 | 1,425 | 0 | 5,398,416 |
Modifiable areal unit problem (MAUP)
The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests.
MAUP affects results when point-based measures are aggregated into districts.
The resulting summary values (e.g., totals or proportions) are influenced by both the shape and scale of the aggregation unit.
Real World Example
The power of GIS is the ability to integrate different layers and types of information
The scale of information can impact the analysis as can the grouping and zoning schemes chosen
The Modifiable Areal Unit Problem (MAUP) is an important issue for those who conduct spatial analysis using units of analysis at aggregations higher than incident level.
The MAUP occurs when the aggregate units of analysis are arbitrarily produced or not directly related to the underlying phenomena. A classic example of this problem is Gerrymandering.
Gerrymandering involves shaping and re-shaping voting districts based on the political affiliations of the resident citizenry.
Examples
Wrap up
Today we covered the following topics: - Spatial predicates and binary predicates - Spatial filtering and joining - Spatial simplification - Spatial tessellations / Coverages - Modifiable areal unit problem (MAUP)
Combined with your understanding of
- Geometry and topology
- Coordinate Reference Systems
- Their integration with R
We are well suited to move on to raster data (gridded coverage model!) next week
Congrats!!