class: center, middle, inverse, title-slide # Geography 13 ## Lecture 20: K-means Clustering ### Mike Johnson --- <style type="text/css"> .remark-code{line-height: 2; font-size: 80%} </style> # Raster Data Model <img src="lec-img/18-raster.png" width="474" style="display: block; margin: auto;" /> --- # Raster Data (Values) ```r (v = c(1:8)) ``` ``` [1] 1 2 3 4 5 6 7 8 ``` -- ```r dim(v) = c(2,4) v ``` ``` [,1] [,2] [,3] [,4] [1,] 1 3 5 7 [2,] 2 4 6 8 ``` --- ```r dim(v) = c(2,2,2) v ``` ``` , , 1 [,1] [,2] [1,] 1 3 [2,] 2 4 , , 2 [,1] [,2] [1,] 5 7 [2,] 6 8 ``` --- # Map Algebra: 2 maps ```r AOI = USAboundaries::us_states() %>% filter(name == "California") ca = getTerraClim(AOI, c("tmax", "srad"), startDate = "2018-08-01") ``` ``` Spherical geometry (s2) switched off Spherical geometry (s2) switched on ``` ```r s = stack(ca) %>% setNames(c("tmax", "srad")) plot(s) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" /> --- # Local ```r tmax = s$tmax / 10 plot(tmax) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" /> --- # Focal ```r tmax_smooth = focal(tmax, w = matrix(1, 11, 11), mean) plot(tmax_smooth) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-9-1.png" width="432" style="display: block; margin: auto;" /> --- # Zonal ```r tmax_agg = aggregate(tmax, 10) plot(tmax_agg) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-10-1.png" width="432" style="display: block; margin: auto;" /> --- # Global ```r cellStats(tmax, fivenum) ``` ``` [1] 24.60276 28.95835 29.91061 31.33901 32.46972 ``` --- # Reclassification: Local operation via global perspective ```r # global f = cellStats(s$srad, fivenum) # matrix rcl = data.frame(f[1:4], f[2:5], 1:4) # local out = reclassify(s$srad, rcl, include.lowest = TRUE) plot(out, col = RColorBrewer::brewer.pal(4, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-12-1.png"
width="432" style="display: block; margin: auto;" /> --- ### Apply map algebra over the layers of a stack ```r reclassQuarts = function(x){ f = cellStats(x, fivenum) rcl = data.frame(f[1:4], f[2:5], 1:4) reclassify(x, rcl, include.lowest = TRUE) } out = stack(lapply(ca, reclassQuarts)) plot(stack(out), col = viridis::viridis(4)) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-13-1.png" width="432" style="display: block; margin: auto;" /> --- ### Naive Interaction between Classifications: Multiplicative - Relationship between tmax and solar radiation ```r out2 = out$terraclim_srad * out$terraclim_tmax plot(out2, col = viridis::viridis(16)) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-14-1.png" width="432" style="display: block; margin: auto;" /> --- ### Naive Interaction between Classifications: Bivariate - Geography 183 type cartography ```r plot(out$terraclim_srad, col = viridis::viridis(4, alpha = .5), legend = FALSE) plot(out$terraclim_tmax, col = viridis::inferno(4, alpha = .25), add = TRUE, legend = FALSE) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-15-1.png" width="432" style="display: block; margin: auto;" /> --- class: center, middle # Moving beyond "additive" associations --- # Clustering - Clustering is a broad set of techniques for finding subgroups of observations within a data set. - K-means - Hierarchical - Density-based (e.g., DBSCAN) - Model-based (e.g., Gaussian mixtures) -- - When we cluster observations, we want to place similar observations in the same category, and dissimilar observations in others. -- - The amount of variance allowed within each group depends on how many categories you select -- - Clustering is an **unsupervised** method, which means it seeks to find relationships among "n" observations without being trained by a response variable or a "truth" --- # K-means K-means clustering is the simplest and most commonly used clustering method for splitting a dataset into "k" groups.
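As a minimal sketch of the idea (toy one-dimensional data, not the wine set used later), base R's `kmeans` cleanly recovers two well-separated groups:

```r
# Toy sketch: two well-separated 1-D groups; kmeans() should split them 10/10
set.seed(1)
x  <- c(rnorm(10, mean = 0), rnorm(10, mean = 10))
km <- kmeans(matrix(x, ncol = 1), centers = 2)
table(km$cluster)  # two clusters of 10 points each
```

(`table(km$cluster)` tallies how many observations landed in each cluster; the cluster labels 1/2 are arbitrary.)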
--- # K-means - Valuable, particularly for geodata -- - The basis of much clustering and a lot of "machine learning" -- - Can be abused... -- <img src="lec-img/20-ml-really.png" width="432" style="display: block; margin: auto;" /> ^This study is morally questionable and methodologically questionable, as it supports a positive feedback loop. It is also based on the idea that arrests are a good proxy for crime... --- # Data Preparation To perform a cluster analysis your data should be prepared as follows: - **Rows** are observations and **columns** are variables (tidy data) -- - *All* missing values (`NA`, `NULL`) must be removed or estimated. -- - The data should be standardized (i.e., scaled) to make variables comparable. -- - Standardization consists of transforming the variables such that they have mean zero and standard deviation one. -- - In R, base `scale` does the following: `$$\frac{x - mean(x)}{sd(x)}$$` -- Scaling makes sense when you have multiple variables at different magnitudes (e.g., temperature and precipitation). --- # Toy Data We will use the wine dataset from the UCI Machine Learning Repository. The wine dataset contains the results of a chemical analysis of wines grown in a specific area of Italy. Three types of wine are represented in the 178 samples, with the results of 13 chemical analyses recorded for each sample.
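Before loading the data, a quick sanity check (a sketch with values chosen for easy arithmetic) that base `scale` implements the standardization formula shown above:

```r
# scale() should match (x - mean(x)) / sd(x) element-by-element
x      <- c(2, 4, 6, 8)
manual <- (x - mean(x)) / sd(x)
all.equal(as.numeric(scale(x)), manual)  # TRUE
```

(`scale` returns a matrix carrying `scaled:center` and `scaled:scale` attributes; `as.numeric` drops them for the comparison.)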
```r library(rattle.data) data = dplyr::select(wine, "Alcohol", "Malic", "Ash") dim(data) ``` ``` [1] 178 3 ``` ```r head(data) ``` ``` Alcohol Malic Ash 1 14.23 1.71 2.43 2 13.20 1.78 2.14 3 13.16 2.36 2.67 4 14.37 1.95 2.50 5 13.24 2.59 2.87 6 14.20 1.76 2.45 ``` --- # Preparing Data - We should always ensure there is no missing data (`NA` values are "infectious") ```r data <- na.omit(data) ``` -- - The data should be standardized so that the clustering algorithm is not influenced by an arbitrary variable unit -- - In our data we see Alcohol is a magnitude larger than Malic and Ash -- ```r (data <- scale(data)) ``` ``` Alcohol Malic Ash 1 1.51434077 -0.56066822 0.23139979 2 0.24559683 -0.49800856 -0.82566722 3 0.19632522 0.02117152 1.10621386 4 1.68679140 -0.34583508 0.48655389 5 0.29486844 0.22705328 1.83522559 6 1.47738706 -0.51591132 0.30430096 7 1.71142720 -0.41744613 0.30430096 8 1.30493643 -0.16680747 0.88751034 9 2.25341491 -0.62332789 -0.71631546 10 1.05857838 -0.88291793 -0.35180959 11 1.35420804 -0.15785609 -0.24245783 12 1.37884384 -0.76654998 -0.16955666 13 0.92308146 -0.54276546 0.15849862 14 2.15487169 -0.54276546 0.08559744 15 1.69910930 -0.41744613 0.04914686 16 0.77526663 -0.47115441 1.21556562 17 1.60056608 -0.37268923 1.28846679 18 1.02162467 -0.68598755 0.92396093 19 1.46506916 -0.66808479 0.41365272 20 0.78758453 0.68357369 0.70525741 21 1.30493643 -0.63227927 -0.31535901 22 -0.08698653 1.31017034 1.03331269 23 0.87380985 -0.42639751 -0.02375431 24 -0.18552975 -0.65913341 0.55945507 25 0.61513390 -0.47115441 0.88751034 26 0.06082829 -0.25632128 3.11099611 27 0.47963697 -0.50695994 0.92396093 28 0.36877585 -0.55171684 -0.82566722 29 1.07089628 -0.39059199 1.58007149 30 1.25566482 -0.58752236 -0.57051311 31 0.89844565 -0.74864721 1.21556562 32 0.71367712 -0.60542512 -0.02375431 33 0.83685614 -0.45325165 -0.02375431 34 0.93539936 -0.72179307 1.21556562 35 0.62745180 -0.48010579 1.03331269 36 0.59049809 -0.47115441 0.15849862 37 0.34414005
-0.62332789 1.72587383 38 0.06082829 -0.61437650 0.66880683 39 0.08546410 -0.74864721 -0.97146956 40 1.50202286 1.48024658 0.52300448 41 0.68904131 -0.56066822 -0.20600725 42 0.50427278 1.34597587 -0.89856839 43 1.08321419 -0.39954337 0.81460917 44 0.29486844 1.47129519 -0.27890842 45 0.06082829 -0.50695994 -0.97146956 46 1.48970496 1.52500348 0.26785038 47 1.69910930 1.12219135 -0.31535901 48 1.10784999 -0.58752236 -0.89856839 49 1.35420804 -0.28317542 0.12204803 50 1.15712160 -0.54276546 -0.35180959 51 0.06082829 -0.54276546 -1.19017308 52 1.02162467 -0.61437650 0.85105976 53 1.00930677 -0.52486270 0.19494920 54 0.94771726 -0.39059199 1.14266445 55 0.91076355 -0.59647374 -0.42471076 56 0.68904131 -0.54276546 0.34075155 57 1.50202286 -0.56961960 -0.24245783 58 0.35645795 -0.32793232 1.14266445 59 0.88612775 -0.81130688 0.48655389 60 -0.77678907 -1.24992453 -3.66881295 61 -0.82606067 -1.10670244 -0.31535901 62 -0.44420570 -0.87396654 -1.26307425 63 0.82453824 -0.97243173 -1.62758012 64 -0.77678907 -1.07984830 -0.75276604 65 -1.02314711 -0.79340412 0.59590565 66 -0.77678907 -1.00823725 0.70525741 67 0.13473571 -1.18726487 -2.42949302 68 -0.77678907 -1.04404278 -1.62758012 69 0.41804746 -1.24992453 -0.02375431 70 -0.97387550 -1.02614002 -2.24724008 71 -0.87533228 -0.65018203 -0.57051311 72 1.05857838 -0.73969583 1.10621386 73 0.60281600 -0.60542512 -0.46116135 74 -0.01307912 -0.59647374 0.85105976 75 -1.28182306 -1.11565382 -0.24245783 76 -1.65136013 -0.40849475 -1.62758012 77 0.03619249 -1.28573006 -2.39304243 78 -1.42963789 0.49559470 -0.49761194 79 -0.82606067 -1.20516763 -1.51822836 80 -0.37029829 1.37283001 0.12204803 81 -1.23255145 -1.26782729 -1.33597542 82 -0.34566248 -0.47115441 -0.60696370 83 -1.13400823 -1.07984830 0.52300448 84 0.06082829 1.36387863 -0.16955666 85 -1.42963789 -1.29468144 0.77815859 86 -0.40725200 -1.21411901 -0.46116135 87 -1.03546501 -0.65018203 -0.20600725 88 -1.66367803 -0.59647374 0.92396093 89 -1.67599593 -0.24736990 0.34075155 90 
-1.13400823 -0.90082069 -0.24245783 91 -1.13400823 -0.45325165 -0.16955666 92 -1.23255145 -0.73969583 0.19494920 93 -0.38261619 -0.72179307 -0.38826018 94 -0.87533228 0.44188642 -0.53406252 95 -1.70063174 -0.31002956 -0.31535901 96 -0.65361004 -0.73074445 -0.60696370 97 -1.46659160 -0.19366161 1.36136797 98 -0.87533228 -0.82920964 -1.40887660 99 -0.77678907 -1.13355658 -0.97146956 100 -0.87533228 0.74623336 -0.57051311 101 -1.13400823 -0.22946714 -2.42949302 102 -0.49347731 -0.89186931 -1.70048129 103 -0.81374277 0.10173395 0.34075155 104 -1.45427369 -0.55171684 -1.77338246 105 -0.60433843 -0.54276546 -1.40887660 106 -0.71519955 0.19124776 -0.35180959 107 -0.92460389 -0.54276546 -0.89856839 108 -0.34566248 -0.52486270 -0.31535901 109 -0.96155760 -0.93662621 -1.55467894 110 -1.71294964 -0.88291793 1.21556562 111 -1.89771818 1.25646206 -1.99208598 112 -0.59202053 0.08383119 -0.71631546 113 -1.52818111 0.30761571 2.01747852 114 -1.95930769 -1.42895215 0.48655389 115 -1.13400823 -0.84711240 0.48655389 116 -2.42738798 -0.73969583 -0.60696370 117 -1.45427369 -0.77550136 -1.37242601 118 -0.71519955 -0.65018203 -0.64341428 119 -0.28407297 0.97896926 -1.40887660 120 -1.23255145 0.97896926 -1.33597542 121 -1.91003608 0.05697705 0.19494920 122 -1.77453915 -0.25632128 3.14744670 123 -0.71519955 1.87410733 1.32491738 124 0.06082829 3.10044648 -0.86211780 125 -1.39268418 1.76669076 0.08559744 126 -1.14632613 -0.15785609 -0.71631546 127 -0.70288165 -0.72179307 -0.27890842 128 -1.49122740 -0.18471023 1.50717031 129 -0.77678907 -0.63227927 -0.24245783 130 -1.18327984 1.75773938 0.04914686 131 -0.17321185 -0.88291793 -0.16955666 132 -0.14857605 0.58510851 0.12204803 133 -0.23480136 -0.02358538 0.12204803 134 -0.37029829 1.08638583 -0.02375431 135 -0.60433843 -0.98138311 -0.42471076 136 -0.49347731 0.11068533 -0.60696370 137 -0.92460389 2.13369737 0.63235624 138 -0.57970263 2.84085644 0.99686210 139 0.60281600 1.12219135 -0.64341428 140 -0.19784766 0.55825437 0.88751034 141 
-0.08698653 0.42398365 1.21556562 142 0.44268327 0.20019914 -0.06020490 143 0.63976970 0.74623336 1.28846679 144 0.76294873 2.33957912 -0.06020490 145 -0.92460389 1.38178139 -0.60696370 146 0.19632522 1.10428859 -0.78921663 147 1.08321419 2.42014155 -0.49761194 148 -0.16089395 2.03523218 0.41365272 149 0.39341166 0.80889302 0.04914686 150 0.09778200 1.39968415 -0.02375431 151 0.61513390 0.70147646 0.92396093 152 -0.25943717 0.29866433 0.41365272 153 0.13473571 -0.39059199 1.39781855 154 0.28255053 0.86260131 -0.31535901 155 -0.51811312 -0.93662621 -0.97146956 156 0.20864312 2.55441226 -0.16955666 157 1.03394258 1.59661452 0.04914686 158 -0.67824585 0.62091403 0.99686210 159 1.64983769 -0.58752236 1.21556562 160 0.59049809 -0.59647374 0.99686210 161 -0.78910697 1.33702448 0.04914686 162 0.84917404 0.82679579 0.63235624 163 -0.18552975 0.83574717 0.77815859 164 -0.05003283 0.99687202 -0.06020490 165 0.96003516 0.37922675 -0.24245783 166 0.89844565 1.81144766 -0.38826018 167 0.55354439 1.22065654 0.85105976 168 -0.22248346 0.92526097 -0.24245783 169 0.71367712 0.21810190 1.17911504 170 0.49195487 2.02628080 1.79877500 171 -0.98619340 0.62091403 -0.16955666 172 -0.28407297 0.04802567 -0.31535901 173 1.42811545 0.15544223 0.41365272 174 0.87380985 2.96617577 0.30430096 175 0.49195487 1.40863553 0.41365272 176 0.33182214 1.73983662 -0.38826018 177 0.20864312 0.22705328 0.01269627 178 1.39116174 1.57871176 1.36136797 attr(,"scaled:center") Alcohol Malic Ash 13.000618 2.336348 2.366517 attr(,"scaled:scale") Alcohol Malic Ash 0.8118265 1.1171461 0.2743440 ``` --- # Distance Measures - The classification of "n" observations requires some method for computing the distance between each pair of observations. -- - The result of this distance calculation is known as a `distance` matrix. -- - There are **many** methods to calculate distance and the choice is a critical step in clustering. 
--- The classical methods for distance measures are Euclidean and Manhattan distances, defined as follows: $$ euclid = \sqrt{\sum(x_i - y_i)^2} $$ -- $$ man = \sum |x_i - y_i| $$ where `\(x\)` and `\(y\)` are two vectors of length `\(n\)` --- - For most common clustering software, the default distance measure is the `Euclidean` method -- - Depending on the type of data and the research questions, other dissimilarity measures might be preferred, and you should be aware of the options. -- - The `dist` function in the `stats` package computes the distance matrix -- - The default method is 'euclidean', with support for "*euclidean*", "*maximum*", "*manhattan*", "*canberra*", "*binary*" or "*minkowski*" calculations -- - Within R it is simple to compute and visualize the distance matrix using the functions `get_dist` and `fviz_dist` from the `factoextra` package. ```r dist <- dist(data) length(dist) ``` ``` [1] 15753 ``` --- - Visualization tools, like those found in `factoextra`, can help visualize a distance matrix -- ```r library(factoextra) fviz_dist(dist) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-21-1.png" width="432" style="display: block; margin: auto;" /> -- - This visualization begins to illustrate which wines have large dissimilarities (blue) and those that are fairly similar (red). - The red diagonal across the middle is the comparison of each wine to itself, hence perfect similarity! --- # Categorizing your distance matrix - K-means clustering is the most common unsupervised machine learning algorithm for partitioning a given data set into a set of k groups (i.e. k clusters) -- - k represents the number of groups pre-specified by the analyst. -- - It classifies objects in multiple groups (i.e., clusters), such that objects within the same cluster are as similar as possible (i.e., high intra-class similarity), and objects from different clusters are as dissimilar as possible (i.e., low inter-class similarity).
-- - In k-means clustering, each cluster is represented by its `center` (i.e., centroid), which corresponds to the `mean` of the values assigned to the cluster. --- <img src="lec-img/20-intra-inter.jpg" width="711" style="display: block; margin: auto;" /> --- # Determining Cluster Variation - k-means clustering defines clusters so that the total `intra-cluster` variation (total within-cluster variation) is `minimized`. -- - There are several k-means algorithms available. -- - The standard algorithm is the Hartigan-Wong algorithm (1979), which defines the total within-cluster variation as the sum of squared Euclidean distances between items and the corresponding centroid: $$ W(C_k) = \sum_{x_i \in C_k}(x_i - \mu_k)^2$$ where: `\(x_i\)` is a data point belonging to the cluster `\(C_k\)` `\(\mu_k\)` is the mean value of the points assigned to the cluster `\(C_k\)` --- Each observation ( `\(x_i\)` ) is assigned to a given cluster ( `\(k\)` ) such that the sum of squares (SS) distance of the observation to its assigned cluster center ( `\(\mu_k\)` ) is minimized. -- The **total** within-cluster variation of the clustering is defined as: $$ total.withinss = \sum_{k=1}^k W(C_k)$$ -- - The _total_, -- _within-cluster_, -- _sum of squares_, measures the compactness of the clustering -- - We want it to be as **small** as possible. --- # Implementing the Algorithm The k-means algorithm can be summarized as follows: - Specify the number of clusters (k) to be created - User choice ..
say we pick 5 <img src="lec-img/20-kmeans-1.png" width="311" style="display: block; margin: auto;" /> --- - Randomly select k objects from the data set as the initial cluster `centers` or `means` <img src="lec-img/20-kmeans-2.png" width="311" style="display: block; margin: auto;" /> --- - Assign each observation to its closest centroid, based on the Euclidean distance between the observation and the centroid <img src="lec-img/20-kmeans-3.png" width="311" style="display: block; margin: auto;" /> --- - For each of the `k` clusters, update the cluster centroid by calculating the new mean of all the data points assigned to that cluster. <img src="lec-img/20-kmeans-4.png" width="311" style="display: block; margin: auto;" /> --- - Iteratively _minimize_ the total within-cluster sum of squares. That is, iterate the assignment and centroid-updating steps until the cluster assignments stop changing - or - the maximum number of iterations is reached. - Repeat until no observations change cluster <img src="lec-img/20-kmeans-5.png" width="311" style="display: block; margin: auto;" /> --- # Making 2 clusters! ```r set.seed(09032020) # random process becomes repeatable k2 <- kmeans(data, centers = 2) str(k2) ``` ``` List of 9 $ cluster : Named int [1:178] 1 2 1 1 1 1 1 1 1 1 ... ..- attr(*, "names")= chr [1:178] "1" "2" "3" "4" ... $ centers : num [1:2, 1:3] 0.576 -0.849 0.363 -0.535 0.441 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:2] "1" "2" .. ..$ : chr [1:3] "Alcohol" "Malic" "Ash" $ totss : num 531 $ withinss : num [1:2] 238 120 $ tot.withinss: num 358 $ betweenss : num 173 $ size : int [1:2] 106 72 $ iter : int 1 $ ifault : int 0 - attr(*, "class")= chr "kmeans" ``` --- ```r names(k2) ``` ``` [1] "cluster" "centers" "totss" [4] "withinss" "tot.withinss" "betweenss" [7] "size" "iter" "ifault" ``` -- The output of kmeans is a _list_ with several elements.
The most important being: -- **cluster**: A vector of integers (from 1:k) indicating the cluster to which each point is allocated. -- **centers**: A matrix of cluster centers. -- **totss**: The total sum of squares. -- **withinss**: Vector of within-cluster sum of squares, one component per cluster. -- **tot.withinss**: Total within-cluster sum of squares, i.e. sum(withinss). -- **betweenss**: The between-cluster sum of squares, i.e. `\(totss-tot.withinss\)`. -- **size**: The number of points in each cluster. --- # Visualize `factoextra` provides a nice wrapper for visualizing clustered data: If there are more than two dimensions (variables), `fviz_cluster` performs a principal component analysis (PCA) and plots the data points according to the two principal components that explain the majority of the variance. ```r fviz_cluster(k2, data = data) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-30-1.png" width="432" style="display: block; margin: auto;" /> --- # Testing other K values ```r k3 <- kmeans(data, centers = 3, nstart = 25) k4 <- kmeans(data, centers = 4, nstart = 25) k5 <- kmeans(data, centers = 5, nstart = 25) # plots to compare p1 <- fviz_cluster(k2, geom = "point", data = data) + ggtitle("k = 2") p2 <- fviz_cluster(k3, geom = "point", data = data) + ggtitle("k = 3") p3 <- fviz_cluster(k4, geom = "point", data = data) + ggtitle("k = 4") p4 <- fviz_cluster(k5, geom = "point", data = data) + ggtitle("k = 5") ``` --- ```r library(patchwork) p1+p2+p3+p4 ``` <img src="lecture-20_files/figure-html/unnamed-chunk-32-1.png" width="432" style="display: block; margin: auto;" /> --- # Application to Raster Data!
- each variable is a layer of a RasterStack - each layer is a vector of values ```r library(climateR) params = c("prcp", "tmax", "tmin", "srad", "q") AOI = USAboundaries::us_states() %>% filter(name == "Colorado") co = climateR::getTerraClim(AOI, params, startDate = "2018-10-01") %>% stack() %>% setNames(params) ``` ``` Spherical geometry (s2) switched off Spherical geometry (s2) switched on ``` --- # Colorado October 2018 climate ```r plot(co) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-34-1.png" width="432" style="display: block; margin: auto;" /> --- # Raster layers are vectors! ```r values = values(co) head(values) ``` ``` prcp tmax tmin srad q [1,] 3.8 0.2 220.9327 22.8 3.6 [2,] 4.1 0.2 219.4443 21.5 3.2 [3,] 4.0 0.2 219.6443 22.0 3.3 [4,] 3.7 0.2 220.3325 23.3 3.7 [5,] 3.7 0.2 220.8286 23.5 3.8 [6,] 3.5 0.2 221.2287 24.2 4.1 ``` --- # Data Prep - Identify complete (non-`NA`) cells for later reference - Remove NA values - Scale ```r idx = which(complete.cases(values)) # cell (row) indices with no NA in any layer v = na.omit(values) vs = scale(v) ``` --- ```r E = kmeans(vs, 5, iter.max = 100) E ``` ``` K-means clustering with 5 clusters of sizes 4078, 6252, 1043, 3439, 1750 Cluster means: prcp tmax tmin srad q 1 -0.4551978 -0.4524243 -0.25038134 -1.2604347 -1.2391167 2 -0.7072649 -0.7076268 0.61797764 0.5073093 0.2689811 3 2.3223271 2.3210312 -0.01228113 -1.3270937 -1.1748720 4 0.7730360 0.7680146 -1.26771914 0.7374313 1.1140070 5 0.6842624 0.6897322 0.87425957 0.4665642 0.4375826 Clustering vector: [1] 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [28] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 [55] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [82] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 [109] 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [136] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [163] 4 4 4 4 4 4 4 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [190] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 [217] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 [244] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 [271] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 [298] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [325] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 1 2 2 2 2 2 2 2 2 [352] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [379] 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [406] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 [433] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [460] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [487] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 [514] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [541] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [568] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [595] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [622] 2 2 2 2 2 2 2 2 2 2 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [649] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [676] 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [703] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [730] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [757] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [784] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 [811] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [838] 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [865] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 [892] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [919] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 [946] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [973] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [1000] 4 [ reached getOption("max.print") -- omitted 15562 entries ] Within cluster sum of squares by cluster: [1] 4323.862 8554.143 2794.234 5181.992 2593.711 (between_SS / total_SS = 71.7 %) Available components: [1] "cluster" "centers" "totss" [4] "withinss" "tot.withinss" "betweenss" [7] "size" "iter" 
"ifault" ``` --- ### Copying a raster structure ```r clus_raster = co$tmax values(clus_raster) = NA ``` -- ### Assign values ```r clus_raster[idx] <- E$cluster plot(clus_raster, col = RColorBrewer::brewer.pal(5, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-39-1.png" width="432" style="display: block; margin: auto;" /> --- # Merging across data sources ```r # Get elevation data elev = elevatr::get_elev_raster(AOI, z = 5) %>% crop(AOI) # Align raster extents and resolutions elev = projectRaster(elev, co$prcp) # Extract values values = stack(co$prcp, elev) %>% values() # Prep data idx = which(complete.cases(values)) v = na.omit(values) vs = scale(v) # Cluster E = kmeans(vs, 5, iter.max = 100) clus_raster = elev[[1]] values(clus_raster) = NA clus_raster[idx] <- E$cluster ``` --- .pull-left[ ```r par(mfrow = c(2,1)) plot(elev) plot(co$prcp) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-41-1.png" width="432" style="display: block; margin: auto;" /> ] .pull-right[ ```r plot(clus_raster, col = RColorBrewer::brewer.pal(5, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-42-1.png" width="432" style="display: block; margin: auto;" /> ] --- # Clustering at larger scales ```r counties = filter(USAboundaries::us_counties(), !state_abbr %in% c("AK", "HI", "PR")) params = c("tmax", "tmin", "prcp", "srad") s = getTerraClim(counties, params, startDate = "2018-06-01") %>% stack() %>% setNames(params) ``` <img src="lec-img/20-raster-save.png" width="342" style="display: block; margin: auto;" /> --- ```r dat = exactextractr::exact_extract(s, counties, "mean", progress = FALSE) ``` ```r dat = scale(dat) counties$clust8 = kmeans(dat, 8)$cluster plot(st_transform(counties["clust8"], 5070), border = NA) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-47-1.png" width="432" style="display: block; margin: auto;" /> --- # Daily Assignment In a new Rmd...
- Install `climateR` (`remotes::install_github("mikejohnson51/climateR")`) - Pick 2-4 climate variables for a state of your choice for one month of your choice (winter, summer, spring, etc.) - Pick your climate variables based on the state and month you choose - You can find the available climate variables in `param_meta$terraclim` (use the `call` column) ```r param_meta$terraclim ``` ``` common.name call 1 aet aet 2 water_deficit def 3 palmer PDSI 4 pet pet 5 prcp ppt 6 q q 7 soilm soil 8 srad srad 9 swe swe 10 tmax tmax 11 tmin tmin 12 vp vap 13 vpd vpd 14 wind ws description units 1 Actual Evapotranspiration mm 2 Climatic Water Deficit mm 3 Palmer Drought Severity Index 4 Reference Evapotranspiration mm 5 Accumulated Precipitation mm 6 Runoff mm 7 Soil Moisture at End of Month mm 8 Downward Shortwave Radiation Flux at the Surface W/m2 9 Snow Water Equivalent at End of Month mm 10 Maximum 2-m Temperature C 11 Minimum 2-m Temperature C 12 2-m Vapor Pressure kPa 13 Mean Vapor Pressure Deficit kPa 14 Wind Speed at 10-m m/s ``` --- - Implement a k-means classification for your state and your variables. - Plot your final image using a nice color palette --- # Submission Submit your knit document to Gauchospace