class: center, middle, inverse, title-slide # Geography 13 ## Lecture 20: K-means Clustering ### Mike Johnson --- <style type="text/css"> .remark-code{line-height: 2; font-size: 80%} </style> # Raster Data Model <img src="lec-img/18-raster.png" width="474" style="display: block; margin: auto;" /> --- # Raster Data (Values) ```r (v = c(1:8)) ``` ``` [1] 1 2 3 4 5 6 7 8 ``` -- ```r dim(v) = c(2,4) v ``` ``` [,1] [,2] [,3] [,4] [1,] 1 3 5 7 [2,] 2 4 6 8 ``` --- ```r dim(v) = c(2,2,2) v ``` ``` , , 1 [,1] [,2] [1,] 1 3 [2,] 2 4 , , 2 [,1] [,2] [1,] 5 7 [2,] 6 8 ``` --- # Map Algebra: 2 maps ```r AOI = USAboundaries::us_states() %>% filter(name == "California") ca = getTerraClim(AOI, c("tmax", "srad"), startDate = "2018-08-01") ``` ``` Spherical geometry (s2) switched off Spherical geometry (s2) switched on ``` ```r s = stack(ca) %>% setNames(c("tmax", "srad")) plot(s) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" /> --- # Local ```r tmax = s$tmax / 10 plot(tmax) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" /> --- # Focal ```r tmax_smooth = focal(tmax, w = matrix(1, 11, 11), mean) plot(tmax_smooth) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-9-1.png" width="432" style="display: block; margin: auto;" /> --- # Zonal ```r tmax_agg = aggregate(tmax, 10) plot(tmax_agg) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-10-1.png" width="432" style="display: block; margin: auto;" /> --- # Global ```r cellStats(tmax, fivenum) ``` ``` [1] 24.60276 28.95835 29.91061 31.33901 32.46972 ``` --- # Reclassification: Local operation via global perspective ```r # global f = cellStats(s$srad, fivenum) # matrix rcl = data.frame(f[1:4], f[2:5], 1:4) # local out = reclassify(s$srad, rcl, include.lowest = TRUE) plot(out, col = RColorBrewer::brewer.pal(4, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-12-1.png"
width="432" style="display: block; margin: auto;" /> --- ### Apply map algebra over the layers of a stack ```r reclassQuarts = function(x){ f = cellStats(x, fivenum) rcl = data.frame(f[1:4], f[2:5], 1:4) reclassify(x, rcl, include.lowest = TRUE) } out = stack(lapply(ca, reclassQuarts)) plot(stack(out), col = viridis::viridis(4)) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-13-1.png" width="432" style="display: block; margin: auto;" /> --- ### Naive Interaction between Classifications: Multiplicative - Relationship between tmax and solar radiation ```r out2 = out$terraclim_srad * out$terraclim_tmax plot(out2, col = viridis::viridis(16)) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-14-1.png" width="432" style="display: block; margin: auto;" /> --- ### Naive Interaction between Classifications: Bivariate - Geography 183 type cartography ```r plot(out$terraclim_srad, col = viridis::viridis(4, alpha = .5), legend = FALSE) plot(out$terraclim_tmax, col = viridis::inferno(4, alpha = .25), add = TRUE, legend = FALSE) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-15-1.png" width="432" style="display: block; margin: auto;" /> --- class: center, middle # Moving beyond "additive" associations --- # Clustering - Clustering is a broad set of techniques for finding subgroups of observations within a data set. - K-means - Hierarchical - Density-based (e.g., DBSCAN) - Model-based (e.g., Gaussian mixtures) -- - When we cluster observations, we want to place similar observations in the same category, and dissimilar observations in others. -- - The amount of variance allowed within each group depends on how many categories you select -- - Clustering is an **unsupervised** method, which means it seeks to find relationships among "n" observations without being trained by a response variable or a "truth" --- # K-means K-means clustering is the simplest and most commonly used clustering method for splitting a dataset into "k" groups.
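As a minimal sketch of the idea (toy one-dimensional data, not the wine set used later), base R's `kmeans` cleanly recovers two well-separated groups:

```r
# Toy sketch: two well-separated 1-D groups; kmeans() should split them 10/10
set.seed(1)
x  <- c(rnorm(10, mean = 0), rnorm(10, mean = 10))
km <- kmeans(matrix(x, ncol = 1), centers = 2)
table(km$cluster)  # two clusters of 10 points each
```

(`table(km$cluster)` tallies how many observations landed in each cluster; the cluster labels 1/2 are arbitrary.)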
--- # K-means - Valuable, particularly for geodata -- - The basis of much clustering and a lot of "machine learning" -- - Can be abused... -- <img src="lec-img/20-ml-really.png" width="432" style="display: block; margin: auto;" /> ^This study is morally questionable and methodologically questionable, as it supports a positive feedback loop. It is also based on the idea that arrests are a good proxy for crime... --- # Data Preparation To perform a cluster analysis your data should be prepared as follows: - **Rows** are observations and **columns** are variables (tidy data) -- - *All* missing values (`NA`, `NULL`) must be removed or estimated. -- - The data should be standardized (i.e., scaled) to make variables comparable. -- - Standardization consists of transforming the variables such that they have mean zero and standard deviation one. -- - In R, base `scale` does the following: `$$\frac{x - mean(x)}{sd(x)}$$` -- Scaling makes sense when you have multiple variables at different magnitudes (e.g., temperature and precipitation). --- # Toy Data We will use the wine dataset from the UCI Machine Learning Repository. The wine dataset contains the results of a chemical analysis of wines grown in a specific area of Italy. Three types of wine are represented in the 178 samples, with the results of 13 chemical analyses recorded for each sample.
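Before loading the data, a quick sanity check (a sketch with values chosen for easy arithmetic) that base `scale` implements the standardization formula shown above:

```r
# scale() should match (x - mean(x)) / sd(x) element-by-element
x      <- c(2, 4, 6, 8)
manual <- (x - mean(x)) / sd(x)
all.equal(as.numeric(scale(x)), manual)  # TRUE
```

(`scale` returns a matrix carrying `scaled:center` and `scaled:scale` attributes; `as.numeric` drops them for the comparison.)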
```r library(rattle.data) data = dplyr::select(wine, "Alcohol", "Malic", "Ash") dim(data) ``` ``` [1] 178 3 ``` ```r head(data) ``` ``` Alcohol Malic Ash 1 14.23 1.71 2.43 2 13.20 1.78 2.14 3 13.16 2.36 2.67 4 14.37 1.95 2.50 5 13.24 2.59 2.87 6 14.20 1.76 2.45 ``` --- # Preparing Data - We should always ensure there is no missing data (`NA` values are "infectious") ```r data <- na.omit(data) ``` -- - The data should be standardized so that the clustering algorithm is not influenced by an arbitrary variable unit -- - In our data we see Alcohol is a magnitude larger than Malic and Ash -- ```r (data <- scale(data)) ``` ``` Alcohol Malic Ash 1 1.51434077 -0.56066822 0.23139979 2 0.24559683 -0.49800856 -0.82566722 3 0.19632522 0.02117152 1.10621386 4 1.68679140 -0.34583508 0.48655389 5 0.29486844 0.22705328 1.83522559 6 1.47738706 -0.51591132 0.30430096 7 1.71142720 -0.41744613 0.30430096 8 1.30493643 -0.16680747 0.88751034 9 2.25341491 -0.62332789 -0.71631546 10 1.05857838 -0.88291793 -0.35180959 11 1.35420804 -0.15785609 -0.24245783 12 1.37884384 -0.76654998 -0.16955666 13 0.92308146 -0.54276546 0.15849862 14 2.15487169 -0.54276546 0.08559744 15 1.69910930 -0.41744613 0.04914686 16 0.77526663 -0.47115441 1.21556562 17 1.60056608 -0.37268923 1.28846679 18 1.02162467 -0.68598755 0.92396093 19 1.46506916 -0.66808479 0.41365272 20 0.78758453 0.68357369 0.70525741 21 1.30493643 -0.63227927 -0.31535901 22 -0.08698653 1.31017034 1.03331269 23 0.87380985 -0.42639751 -0.02375431 24 -0.18552975 -0.65913341 0.55945507 25 0.61513390 -0.47115441 0.88751034 26 0.06082829 -0.25632128 3.11099611 27 0.47963697 -0.50695994 0.92396093 28 0.36877585 -0.55171684 -0.82566722 29 1.07089628 -0.39059199 1.58007149 30 1.25566482 -0.58752236 -0.57051311 31 0.89844565 -0.74864721 1.21556562 32 0.71367712 -0.60542512 -0.02375431 33 0.83685614 -0.45325165 -0.02375431 34 0.93539936 -0.72179307 1.21556562 35 0.62745180 -0.48010579 1.03331269 36 0.59049809 -0.47115441 0.15849862 37 0.34414005
-0.62332789 1.72587383 38 0.06082829 -0.61437650 0.66880683 39 0.08546410 -0.74864721 -0.97146956 40 1.50202286 1.48024658 0.52300448 41 0.68904131 -0.56066822 -0.20600725 42 0.50427278 1.34597587 -0.89856839 43 1.08321419 -0.39954337 0.81460917 44 0.29486844 1.47129519 -0.27890842 45 0.06082829 -0.50695994 -0.97146956 46 1.48970496 1.52500348 0.26785038 47 1.69910930 1.12219135 -0.31535901 48 1.10784999 -0.58752236 -0.89856839 49 1.35420804 -0.28317542 0.12204803 50 1.15712160 -0.54276546 -0.35180959 51 0.06082829 -0.54276546 -1.19017308 52 1.02162467 -0.61437650 0.85105976 53 1.00930677 -0.52486270 0.19494920 54 0.94771726 -0.39059199 1.14266445 55 0.91076355 -0.59647374 -0.42471076 56 0.68904131 -0.54276546 0.34075155 57 1.50202286 -0.56961960 -0.24245783 58 0.35645795 -0.32793232 1.14266445 59 0.88612775 -0.81130688 0.48655389 60 -0.77678907 -1.24992453 -3.66881295 61 -0.82606067 -1.10670244 -0.31535901 62 -0.44420570 -0.87396654 -1.26307425 63 0.82453824 -0.97243173 -1.62758012 64 -0.77678907 -1.07984830 -0.75276604 65 -1.02314711 -0.79340412 0.59590565 66 -0.77678907 -1.00823725 0.70525741 67 0.13473571 -1.18726487 -2.42949302 68 -0.77678907 -1.04404278 -1.62758012 69 0.41804746 -1.24992453 -0.02375431 70 -0.97387550 -1.02614002 -2.24724008 71 -0.87533228 -0.65018203 -0.57051311 72 1.05857838 -0.73969583 1.10621386 73 0.60281600 -0.60542512 -0.46116135 74 -0.01307912 -0.59647374 0.85105976 75 -1.28182306 -1.11565382 -0.24245783 76 -1.65136013 -0.40849475 -1.62758012 77 0.03619249 -1.28573006 -2.39304243 78 -1.42963789 0.49559470 -0.49761194 79 -0.82606067 -1.20516763 -1.51822836 80 -0.37029829 1.37283001 0.12204803 81 -1.23255145 -1.26782729 -1.33597542 82 -0.34566248 -0.47115441 -0.60696370 83 -1.13400823 -1.07984830 0.52300448 84 0.06082829 1.36387863 -0.16955666 85 -1.42963789 -1.29468144 0.77815859 86 -0.40725200 -1.21411901 -0.46116135 87 -1.03546501 -0.65018203 -0.20600725 88 -1.66367803 -0.59647374 0.92396093 89 -1.67599593 -0.24736990 0.34075155 90 
-1.13400823 -0.90082069 -0.24245783 91 -1.13400823 -0.45325165 -0.16955666 92 -1.23255145 -0.73969583 0.19494920 93 -0.38261619 -0.72179307 -0.38826018 94 -0.87533228 0.44188642 -0.53406252 95 -1.70063174 -0.31002956 -0.31535901 96 -0.65361004 -0.73074445 -0.60696370 97 -1.46659160 -0.19366161 1.36136797 98 -0.87533228 -0.82920964 -1.40887660 99 -0.77678907 -1.13355658 -0.97146956 100 -0.87533228 0.74623336 -0.57051311 101 -1.13400823 -0.22946714 -2.42949302 102 -0.49347731 -0.89186931 -1.70048129 103 -0.81374277 0.10173395 0.34075155 104 -1.45427369 -0.55171684 -1.77338246 105 -0.60433843 -0.54276546 -1.40887660 106 -0.71519955 0.19124776 -0.35180959 107 -0.92460389 -0.54276546 -0.89856839 108 -0.34566248 -0.52486270 -0.31535901 109 -0.96155760 -0.93662621 -1.55467894 110 -1.71294964 -0.88291793 1.21556562 111 -1.89771818 1.25646206 -1.99208598 112 -0.59202053 0.08383119 -0.71631546 113 -1.52818111 0.30761571 2.01747852 114 -1.95930769 -1.42895215 0.48655389 115 -1.13400823 -0.84711240 0.48655389 116 -2.42738798 -0.73969583 -0.60696370 117 -1.45427369 -0.77550136 -1.37242601 118 -0.71519955 -0.65018203 -0.64341428 119 -0.28407297 0.97896926 -1.40887660 120 -1.23255145 0.97896926 -1.33597542 121 -1.91003608 0.05697705 0.19494920 122 -1.77453915 -0.25632128 3.14744670 123 -0.71519955 1.87410733 1.32491738 124 0.06082829 3.10044648 -0.86211780 125 -1.39268418 1.76669076 0.08559744 126 -1.14632613 -0.15785609 -0.71631546 127 -0.70288165 -0.72179307 -0.27890842 128 -1.49122740 -0.18471023 1.50717031 129 -0.77678907 -0.63227927 -0.24245783 130 -1.18327984 1.75773938 0.04914686 131 -0.17321185 -0.88291793 -0.16955666 132 -0.14857605 0.58510851 0.12204803 133 -0.23480136 -0.02358538 0.12204803 134 -0.37029829 1.08638583 -0.02375431 135 -0.60433843 -0.98138311 -0.42471076 136 -0.49347731 0.11068533 -0.60696370 137 -0.92460389 2.13369737 0.63235624 138 -0.57970263 2.84085644 0.99686210 139 0.60281600 1.12219135 -0.64341428 140 -0.19784766 0.55825437 0.88751034 141 
-0.08698653 0.42398365 1.21556562 142 0.44268327 0.20019914 -0.06020490 143 0.63976970 0.74623336 1.28846679 144 0.76294873 2.33957912 -0.06020490 145 -0.92460389 1.38178139 -0.60696370 146 0.19632522 1.10428859 -0.78921663 147 1.08321419 2.42014155 -0.49761194 148 -0.16089395 2.03523218 0.41365272 149 0.39341166 0.80889302 0.04914686 150 0.09778200 1.39968415 -0.02375431 151 0.61513390 0.70147646 0.92396093 152 -0.25943717 0.29866433 0.41365272 153 0.13473571 -0.39059199 1.39781855 154 0.28255053 0.86260131 -0.31535901 155 -0.51811312 -0.93662621 -0.97146956 156 0.20864312 2.55441226 -0.16955666 157 1.03394258 1.59661452 0.04914686 158 -0.67824585 0.62091403 0.99686210 159 1.64983769 -0.58752236 1.21556562 160 0.59049809 -0.59647374 0.99686210 161 -0.78910697 1.33702448 0.04914686 162 0.84917404 0.82679579 0.63235624 163 -0.18552975 0.83574717 0.77815859 164 -0.05003283 0.99687202 -0.06020490 165 0.96003516 0.37922675 -0.24245783 166 0.89844565 1.81144766 -0.38826018 167 0.55354439 1.22065654 0.85105976 168 -0.22248346 0.92526097 -0.24245783 169 0.71367712 0.21810190 1.17911504 170 0.49195487 2.02628080 1.79877500 171 -0.98619340 0.62091403 -0.16955666 172 -0.28407297 0.04802567 -0.31535901 173 1.42811545 0.15544223 0.41365272 174 0.87380985 2.96617577 0.30430096 175 0.49195487 1.40863553 0.41365272 176 0.33182214 1.73983662 -0.38826018 177 0.20864312 0.22705328 0.01269627 178 1.39116174 1.57871176 1.36136797 attr(,"scaled:center") Alcohol Malic Ash 13.000618 2.336348 2.366517 attr(,"scaled:scale") Alcohol Malic Ash 0.8118265 1.1171461 0.2743440 ``` --- # Distance Measures - The classification of "n" observations requires some method for computing the distance between each pair of observations. -- - The result of this distance calculation is known as a `distance` matrix. -- - There are **many** methods to calculate distance and the choice is a critical step in clustering. 
--- The classical methods for distance measures are Euclidean and Manhattan distances, defined as follows: $$ euclid = \sqrt{\sum(x_i - y_i)^2} $$ -- $$ man = \sum |x_i - y_i| $$ where `\(x\)` and `\(y\)` are two vectors of length `\(n\)` --- - For most common clustering software, the default distance measure is the `Euclidean` method -- - Depending on the type of data and the research questions, other dissimilarity measures might be preferred, and you should be aware of the options. -- - The `dist` function in the `stats` package computes the distance matrix -- - The default method is 'euclidean', with support for "*euclidean*", "*maximum*", "*manhattan*", "*canberra*", "*binary*" or "*minkowski*" calculations -- - Within R it is simple to compute and visualize the distance matrix using the functions `get_dist` and `fviz_dist` from the `factoextra` package. ```r dist <- dist(data) length(dist) ``` ``` [1] 15753 ``` --- - Visualization tools, like those found in `factoextra`, can help visualize a distance matrix -- ```r library(factoextra) fviz_dist(dist) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-21-1.png" width="432" style="display: block; margin: auto;" /> -- - This visualization begins to illustrate which wines have large dissimilarities (blue) and those that are fairly similar (red). - The red diagonal across the middle is the comparison of each wine to itself, hence perfect similarity! --- # Categorizing your distance matrix - K-means clustering is the most common unsupervised machine learning algorithm for partitioning a given data set into a set of k groups (i.e. k clusters) -- - k represents the number of groups pre-specified by the analyst. -- - It classifies objects in multiple groups (i.e., clusters), such that objects within the same cluster are as similar as possible (i.e., high intra-class similarity), and objects from different clusters are as dissimilar as possible (i.e., low inter-class similarity).
-- - In k-means clustering, each cluster is represented by its `center` (i.e., centroid), which corresponds to the `mean` of the values assigned to the cluster. --- <img src="lec-img/20-intra-inter.jpg" width="711" style="display: block; margin: auto;" /> --- # Determining Cluster Variation - k-means clustering defines clusters so that the total `intra-cluster` variation (total within-cluster variation) is `minimized`. -- - There are several k-means algorithms available. -- - The standard algorithm is the Hartigan-Wong algorithm (1979), which defines the total within-cluster variation as the sum of squared Euclidean distances between items and the corresponding centroid: $$ W(C_k) = \sum_{x_i \in C_k}(x_i - \mu_k)^2$$ where: `\(x_i\)` is a data point belonging to the cluster `\(C_k\)` `\(\mu_k\)` is the mean value of the points assigned to the cluster `\(C_k\)` --- Each observation ( `\(x_i\)` ) is assigned to a given cluster ( `\(k\)` ) such that the sum of squares (SS) distance of the observation to its assigned cluster center ( `\(\mu_k\)` ) is minimized. -- The **total** within-cluster variation of the clustering is defined as: $$ total.withinss = \sum_{k=1}^k W(C_k)$$ -- - The _total_, -- _within-cluster_, -- _sum of squares_, measures the compactness of the clustering -- - We want it to be as **small** as possible. --- # Implementing the Algorithm The k-means algorithm can be summarized as follows: - Specify the number of clusters (k) to be created - User choice ..
say we pick 5 <img src="lec-img/20-kmeans-1.png" width="311" style="display: block; margin: auto;" /> --- - Randomly select k objects from the data set as the initial cluster `centers` or `means` <img src="lec-img/20-kmeans-2.png" width="311" style="display: block; margin: auto;" /> --- - Assign each observation to its closest centroid, based on the Euclidean distance between the observation and the centroid <img src="lec-img/20-kmeans-3.png" width="311" style="display: block; margin: auto;" /> --- - For each of the `k` clusters, update the cluster centroid by calculating the new mean of all the data points assigned to that cluster. <img src="lec-img/20-kmeans-4.png" width="311" style="display: block; margin: auto;" /> --- - Iteratively _minimize_ the total within-cluster sum of squares. That is, iterate the assignment and centroid-updating steps until the cluster assignments stop changing - or - the maximum number of iterations is reached. - Repeat until no observations change cluster <img src="lec-img/20-kmeans-5.png" width="311" style="display: block; margin: auto;" /> --- # Making 2 clusters! ```r set.seed(09032020) # random process becomes repeatable k2 <- kmeans(data, centers = 2) str(k2) ``` ``` List of 9 $ cluster : Named int [1:178] 1 2 1 1 1 1 1 1 1 1 ... ..- attr(*, "names")= chr [1:178] "1" "2" "3" "4" ... $ centers : num [1:2, 1:3] 0.576 -0.849 0.363 -0.535 0.441 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:2] "1" "2" .. ..$ : chr [1:3] "Alcohol" "Malic" "Ash" $ totss : num 531 $ withinss : num [1:2] 238 120 $ tot.withinss: num 358 $ betweenss : num 173 $ size : int [1:2] 106 72 $ iter : int 1 $ ifault : int 0 - attr(*, "class")= chr "kmeans" ``` --- ```r names(k2) ``` ``` [1] "cluster" "centers" "totss" [4] "withinss" "tot.withinss" "betweenss" [7] "size" "iter" "ifault" ``` -- The output of kmeans is a _list_ with several elements.
The most important being: -- **cluster**: A vector of integers (from 1:k) indicating the cluster to which each point is allocated. -- **centers**: A matrix of cluster centers. -- **totss**: The total sum of squares. -- **withinss**: Vector of within-cluster sum of squares, one component per cluster. -- **tot.withinss**: Total within-cluster sum of squares, i.e. sum(withinss). -- **betweenss**: The between-cluster sum of squares, i.e. `\(totss-tot.withinss\)`. -- **size**: The number of points in each cluster. --- # Visualize `factoextra` provides a nice wrapper for visualizing clustered data: If there are more than two dimensions (variables), `fviz_cluster` performs a principal component analysis (PCA) and plots the data points according to the two principal components that explain the majority of the variance. ```r fviz_cluster(k2, data = data) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-30-1.png" width="432" style="display: block; margin: auto;" /> --- # Testing other K values ```r k3 <- kmeans(data, centers = 3, nstart = 25) k4 <- kmeans(data, centers = 4, nstart = 25) k5 <- kmeans(data, centers = 5, nstart = 25) # plots to compare p1 <- fviz_cluster(k2, geom = "point", data = data) + ggtitle("k = 2") p2 <- fviz_cluster(k3, geom = "point", data = data) + ggtitle("k = 3") p3 <- fviz_cluster(k4, geom = "point", data = data) + ggtitle("k = 4") p4 <- fviz_cluster(k5, geom = "point", data = data) + ggtitle("k = 5") ``` --- ```r library(patchwork) p1+p2+p3+p4 ``` <img src="lecture-20_files/figure-html/unnamed-chunk-32-1.png" width="432" style="display: block; margin: auto;" /> --- # Application to Raster Data!
- each variable is a layer of a RasterStack - each layer is a vector of values ```r library(climateR) params = c("prcp", "tmax", "tmin", "srad", "q") AOI = USAboundaries::us_states() %>% filter(name == "Colorado") co = climateR::getTerraClim(AOI, params, startDate = "2018-10-01") %>% stack() %>% setNames(params) ``` ``` Spherical geometry (s2) switched off Spherical geometry (s2) switched on ``` --- # Colorado October 2018 climate ```r plot(co) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-34-1.png" width="432" style="display: block; margin: auto;" /> --- # Raster layers are vectors! ```r values = values(co) head(values) ``` ``` prcp tmax tmin srad q [1,] 3.8 0.2 220.9327 22.8 3.6 [2,] 4.1 0.2 219.4443 21.5 3.2 [3,] 4.0 0.2 219.6443 22.0 3.3 [4,] 3.7 0.2 220.3325 23.3 3.7 [5,] 3.7 0.2 220.8286 23.5 3.8 [6,] 3.5 0.2 221.2287 24.2 4.1 ``` --- # Data Prep - Identify complete (non-`NA`) cells for later reference - Remove NA values - Scale ```r idx = which(complete.cases(values)) # cell (row) indices with no NA in any layer v = na.omit(values) vs = scale(v) ``` --- ```r E = kmeans(vs, 5, iter.max = 100) E ``` ``` K-means clustering with 5 clusters of sizes 4078, 6252, 1043, 3439, 1750 Cluster means: prcp tmax tmin srad q 1 -0.4551978 -0.4524243 -0.25038134 -1.2604347 -1.2391167 2 -0.7072649 -0.7076268 0.61797764 0.5073093 0.2689811 3 2.3223271 2.3210312 -0.01228113 -1.3270937 -1.1748720 4 0.7730360 0.7680146 -1.26771914 0.7374313 1.1140070 5 0.6842624 0.6897322 0.87425957 0.4665642 0.4375826 Clustering vector: [1] 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [28] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 [55] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [82] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 [109] 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [136] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [163] 4 4 4 4 4 4 4 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [190] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 [217] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 [244] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 [271] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 [298] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [325] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 1 2 2 2 2 2 2 2 2 [352] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [379] 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [406] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 [433] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [460] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [487] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 [514] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [541] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [568] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [595] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [622] 2 2 2 2 2 2 2 2 2 2 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [649] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [676] 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [703] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [730] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [757] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [784] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 [811] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [838] 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [865] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 [892] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [919] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 [946] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [973] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 [1000] 4 [ reached getOption("max.print") -- omitted 15562 entries ] Within cluster sum of squares by cluster: [1] 4323.862 8554.143 2794.234 5181.992 2593.711 (between_SS / total_SS = 71.7 %) Available components: [1] "cluster" "centers" "totss" [4] "withinss" "tot.withinss" "betweenss" [7] "size" "iter" 
"ifault" ``` --- ### Copying a raster structure ```r clus_raster = co$tmax values(clus_raster) = NA ``` -- ### Assign values ```r clus_raster[idx] <- E$cluster plot(clus_raster, col = RColorBrewer::brewer.pal(5, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-39-1.png" width="432" style="display: block; margin: auto;" /> --- # Merging across data sources ```r # Get elevation data elev = elevatr::get_elev_raster(AOI, z = 5) %>% crop(AOI) # Align raster extents and resolutions elev = projectRaster(elev, co$prcp) # Extract values values = stack(co$prcp, elev) %>% values() # Prep data idx = which(complete.cases(values)) v = na.omit(values) vs = scale(v) # Cluster E = kmeans(vs, 5, iter.max = 100) clus_raster = elev[[1]] values(clus_raster) = NA clus_raster[idx] <- E$cluster ``` --- .pull-left[ ```r par(mfrow = c(2,1)) plot(elev) plot(co$prcp) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-41-1.png" width="432" style="display: block; margin: auto;" /> ] .pull-right[ ```r plot(clus_raster, col = RColorBrewer::brewer.pal(5, "Spectral")) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-42-1.png" width="432" style="display: block; margin: auto;" /> ] --- # Clustering at larger scales ```r counties = filter(USAboundaries::us_counties(), !state_abbr %in% c("AK", "HI", "PR")) params = c("tmax", "tmin", "prcp", "srad") s = getTerraClim(counties, params, startDate = "2018-06-01") %>% stack() %>% setNames(params) ``` <img src="lec-img/20-raster-save.png" width="342" style="display: block; margin: auto;" /> --- ```r dat = exactextractr::exact_extract(s, counties, "mean", progress = FALSE) ``` ```r dat = scale(dat) counties$clust8 = kmeans(dat, 8)$cluster plot(st_transform(counties["clust8"], 5070), border = NA) ``` <img src="lecture-20_files/figure-html/unnamed-chunk-47-1.png" width="432" style="display: block; margin: auto;" /> --- # Daily Assignment In a new Rmd...
- Install `climateR` (`remotes::install_github("mikejohnson51/climateR")`) - Pick 2-4 climate variables for a state of your choice for one month of your choice (winter, summer, spring, etc.) - Pick your climate variables based on the state and month you choose - You can find the available climate variables in `param_meta$terraclim` (use the `call` column) ```r param_meta$terraclim ``` ``` common.name call 1 aet aet 2 water_deficit def 3 palmer PDSI 4 pet pet 5 prcp ppt 6 q q 7 soilm soil 8 srad srad 9 swe swe 10 tmax tmax 11 tmin tmin 12 vp vap 13 vpd vpd 14 wind ws description units 1 Actual Evapotranspiration mm 2 Climatic Water Deficit mm 3 Palmer Drought Severity Index 4 Reference Evapotranspiration mm 5 Accumulated Precipitation mm 6 Runoff mm 7 Soil Moisture at End of Month mm 8 Downward Shortwave Radiation Flux at the Surface W/m2 9 Snow Water Equivalent at End of Month mm 10 Maximum 2-m Temperature C 11 Minimum 2-m Temperature C 12 2-m Vapor Pressure kPa 13 Mean Vapor Pressure Deficit kPa 14 Wind Speed at 10-m m/s ``` --- - Implement a k-means classification for your state and your variables. - Plot your final image using a nice color palette --- # Submission Submit your knit document to Gauchospace