Predicates and the DE-9IM
Vector Data Lab: Due this week
Lab Next Week: Lightning Talks, In Person (efficivly your final lab assignment)
Final Report Due: May 14th, 2025 5PM
Credit Statement: Due with Final Report
No more late work accepted after May 9th, 2025 5PM
All regrades and late work will be updated by this Friday May 2nd, 2025 5PM
There are 2 labs (vector + presentations), the 20% project, and 10% extra credit still remaining.
Reminder: The extra credit is to compile all of your labs into you personal webpage (complete with image and description). This is due May 14th, 2025 5PM as well.
Last week we learned about simple features, CRS’s, and the sf
package.
We coverd how the sf package fits within the tidyverse data science framework, while adding spatial capabilities.
Some of the first spatial priciples we covered were measures (st_area, st_length, st_distance) and their relation geometry types (POINT, LINESTRING, POLYGON, etc.) and CRSs.
Today, we will cover the DE-9IM model and how it can be used to describe the spatial relationships between geometries…
A POINT is shape with a dimension of 0 that occupies a single location in coordinate space.
A LINESTRING is shape that has a dimension of 1 (length)
A POLYGON is surface stored as a list of its exterior and interior rings. It has a dimension of 2. (area)
All geometries have interior, boundary and exterior regions.
The terms interior
and boundary
are used in the context of algebraic topology and manifold theory and not general topology
The OGC has define these states for the common geometry types in the simple features standard:
Dimension | Geometry Type | Interior (I) | Boundary (B) |
---|---|---|---|
Point, MultiPoint | 0 | Point, Points | Empty |
LineString, Line | 1 | Points that are left when the boundary points are removed. | Two end points. |
Polygon | 2 | Points within the rings. | Set of rings. |
In the following, we are interested in the resulting geometry that occurs when 2 geometries are overlaid…
The DE-9IM is a topological model and (standard) used to describe the spatial relations of two geometries
Used in geometry, point-set topology, geospatial topology
The DE-9IM matrix provides a way to classify geometry relations using the set {0,1,2,F} or {T,F}
With a {T,F} matrix domain, there are 512 possible relations ….
About 10 of these, have been given a common name such as intersects, touches, and within.
When testing two geometries against a scheme, the result is a spatial predicate named by the scheme.
Provides the primary basis for queries and assertions in GIS and spatial databases (PostGIS).
The DE-9IM matrix is based on a 3x3 intersection matrix:
Where:
Empty sets are denoted as F
non-empty sets are denoted with the maximum dimension of the intersection {0,1,2}
A simpler (binary) version of this matrix can be created by mapping all non-empty intersections {0,1,2} to TRUE.
Where II
would state: “Does the Interior of”a” overlaps with the Interior of “b” in a way that produces a point (0), line (1), or polygon (2)”
Where IB
would state: “Does the Interior of”a” overlap with the Boundary of “b” in a way that produces a point (0), line (1), or polygon (2)”
Both matrix forms: - dimensional {0,1,2,F} - Boolean {T,F}
Can be serialize as a “DE-9IM string code” representing the matrix in a single string element (standardized format for data interchange)
The OGC has standardized the typical spatial predicates (Contains, Crosses, Intersects, Touches, etc.) as Boolean functions, and the DE-9IM model as a function that returns the DE-9IM code, with domain of {0,1,2,F}
Reading from left-to-right and top-to-bottom, the DE-9IM(a,b) string code is ‘212101212’
“named spatial predicates” have been defined for some common relations.
A few functions can be derived (expressed by masks) from DE-9IM include (* = wildcard):
Predicate | DE-9IM String Code | Description |
---|---|---|
Intersects | T*F**FFF* |
“Two geometries intersect if they share any portion of space” |
Overlaps | T*F**FFF* |
“Two geometries overlap if they share some but not all of the same space” |
Equals | T*F**FFF* |
“Two geometries are topologically equal if their interiors intersect and no part of the interior or boundary of one geometry intersects the exterior of the other” |
Disjoint | FF*FF***** |
“Two geometries are disjoint: they have no point in common. They form a set of disconnected geometries.” |
Touches | FT******* |
F**T***** |
Contains | T*****FF** |
“A contains B: geometry B lies in A, and the interiors intersect” |
Covers | T*****FF* |
*T****FF* |
Within | *T*****FF* |
**T****FF* |
Covered by | *T*****FF* |
**T****FF* |
Collectively, predicates define the type of relationship each 2D object has with another.
Of the ~ 512 unique relationships offered by the DE-9IM models a selection of ~ 10 have been named.
These are include in PostGIS/GEOS and are made accessible via R sf
sf
provides a set of functions that implement the DE-9IM model and named predicates. Each of these can return either a sparse or dense matrix.st_intersects(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE FALSE TRUE FALSE
#> [2,] FALSE TRUE TRUE FALSE
#> [3,] FALSE FALSE TRUE FALSE
st_disjoint(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE TRUE FALSE TRUE
#> [2,] TRUE FALSE FALSE TRUE
#> [3,] TRUE TRUE FALSE TRUE
st_touches(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE
st_within(x, y, sparse = FALSE)
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE
binary predicates return conditional {T,F} relations based on predefined masks of the DE-9IM strings
Up to this point in class we have been using Boolean {T,F} masks to filter data.frames by columns:
We can filter spatially, use st_filter
as the function call
Here the boolean
condition is not passed (e.g. name == WA)
But instead, is defined by a spatial predicate
The default is st_intersects
but can be changed with the .predicate
argument:
states = AOI::aoi_get(state = c("WA", "OR", "ID", "MT"))
wa <- filter(states, state_abbr == "WA")
mutate(states,
touch_s = st_touches(states, wa),
touch_d = st_touches(states, wa, sparse = FALSE)) |>
select(state_name, contains('touch'))
#> Simple feature collection with 4 features and 3 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.8485 ymin: 41.98818 xmax: -104.0397 ymax: 49.00244
#> Geodetic CRS: WGS 84
#> state_name touch_s touch_d geometry
#> 1 Washington FALSE MULTIPOLYGON (((-121.5237 4...
#> 2 Oregon 1 TRUE MULTIPOLYGON (((-117.22 44....
#> 3 Idaho 1 TRUE MULTIPOLYGON (((-111.0455 4...
#> 4 Montana FALSE MULTIPOLYGON (((-109.7985 4...
st_filter(states, wa, .predicate = st_touches)
#> Simple feature collection with 2 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124.7035 ymin: 41.98818 xmax: -111.0435 ymax: 49.00085
#> Geodetic CRS: WGS 84
#> state_region state_division feature_code state_name state_abbr name
#> 1 4 9 1155107 Oregon OR Oregon
#> 2 4 8 1779783 Idaho ID Idaho
#> fip_class tiger_class combined_area_code metropolitan_area_code
#> 1 <NA> G4000 NA <NA>
#> 2 <NA> G4000 NA <NA>
#> functional_status land_area water_area fip_code
#> 1 A 248628414476 6170965739 41
#> 2 A 214049931578 2391569647 16
#> geometry
#> 1 MULTIPOLYGON (((-117.22 44....
#> 2 MULTIPOLYGON (((-111.0455 4...
Spatially joining data relies on shared geographic relations rather then a shared key
Like filter, these relations can be defined by a predicate
As with tabular data, mutating joins add data to the target object (x) from a source object (y).
In sf
st_join
provides this joining capacity
By default, st_join
performs a left join (Returns all records from x, and the matched records from y)
left = FALSE
.The default predicate for st_join
(and st_filter
) is st_intersects
This can be changed with the join argument (see ?st_join
for details).
What county has the most Starbucks in each state?
(starbucks = readr::read_csv('../labs/data/directory.csv') |>
filter(!is.na(Latitude), Country == "US") |>
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) |>
st_transform(5070))
#> Simple feature collection with 13608 features and 11 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -6223543 ymin: 278590.9 xmax: 2122577 ymax: 5307178
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 13,608 × 12
#> Brand `Store Number` `Store Name` `Ownership Type` `Street Address` City
#> * <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Starbucks 3513-125945 Safeway-Anc… Licensed 5600 Debarr Rd … Anch…
#> 2 Starbucks 74352-84449 Safeway-Anc… Licensed 1725 Abbott Rd Anch…
#> 3 Starbucks 12449-152385 Safeway - A… Licensed 1501 Huffman Rd Anch…
#> 4 Starbucks 24936-233524 100th & C S… Company Owned 320 W. 100th Av… Anch…
#> 5 Starbucks 8973-85630 Old Seward … Company Owned 1005 E Dimond B… Anch…
#> 6 Starbucks 72788-84447 Fred Meyer … Licensed 1000 E Northern… Anch…
#> 7 Starbucks 79549-106150 Safeway - A… Licensed 3101 PENLAND PK… Anch…
#> 8 Starbucks 75988-107245 ANC Main Te… Licensed HMSHost, 500 We… Anch…
#> 9 Starbucks 11426-99254 Tudor Rd an… Company Owned 110 W. Tudor Rd… Anch…
#> 10 Starbucks 20349-108249 Fred Meyer-… Licensed 7701 Debarr Road Anch…
#> # ℹ 13,598 more rows
#> # ℹ 6 more variables: `State/Province` <chr>, Country <chr>, Postcode <chr>,
#> # `Phone Number` <chr>, Timezone <chr>, geometry <POINT [m]>
Clipping is a form of subsetting that involves changing the geometry of at least some features.
Clipping can only apply to features more complex than points: (lines, polygons and their ‘multi’ equivalents).
[,]
) implements st_intersection
Today we covered the following topics: - Spatial predicates and binary predicates - Spatial filtering and joining
Combined with your understanding of
We are well suited to move on to raster data (gridded model) next time
You have been tasked by the USACE of identify the urban population living along the Mississippi River system
You are interested in the county level since most flood control measures and flood response efforts are enforced by county EMAs
As such, you will need to identify the counties that intersect the Mississippi River system and then sum the population of the cities within those counties.
Your final results should be a map of the counties that intersect the Mississippi River system, colored by the total urban population in each county.
To do this …
AOI::aoi_get()
to get the county boundaries for all counties in CONUS.st_filter
to identify the counties that intersect the Mississippi River systemsf
object in the correct CRSst_join
to identify the cities that intersect the counties that intersect the Mississippi River system