Geography 13

# Geography 13
## Lecture 07: Relations & Format
### Mike Johnson

---

# Picking back up!

We’ve covered many topics on how to manipulate and reshape a single `data.frame`:

**Last Week**: Data type and data structures

**Tuesday**: `data.frame` manipulation

**Wednesday**: `data.frame` visualization

**Today**: When one table is not enough (or when it's not right)

- _Joins_: When 2 is better than 1

- _Pivots_: When the format is not ideal for the task
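
---

# A quick preview

As a rough preview of the two ideas above, here is a minimal sketch on made-up toy tables. The tables `counts` and `pop` and all of their values are hypothetical (they are not from the course data); `left_join()` and `pivot_wider()` are the `dplyr` and `tidyr` functions loaded by `tidyverse`.

```r
library(tidyverse)

# Hypothetical toy tables, invented for illustration only
counts = tibble(state = c("California", "Texas", "Texas"),
                year  = c(2020, 2020, 2021),
                cases = c(10, 5, 8))

pop = tibble(state      = c("California", "Texas"),
             population = c(100, 50))

# Join: pull the population column into the counts table, matching on state
joined = left_join(counts, pop, by = "state")

# Pivot: spread the years across columns (long -> wide);
# combinations with no row (California, 2021) become NA
wide = pivot_wider(counts, names_from = year, values_from = cases)
```

The join keeps every row of `counts` and adds a column; the pivot keeps the same values but changes the shape of the table.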

---

# Yesterday's Assignment

1. Make a _faceted_ line plot (`geom_line`) of the **6** states with the **most** cases. Your x axis should be the _date_ and the y axis the _cases_.
2. Make a column plot (`geom_col`) of daily total cases in the **USA**. Your x axis should be the _date_ and the y axis the _cases_.

```r
library(tidyverse)
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
covid = read_csv(url)
head(covid)
```

```
# A tibble: 6 x 6
  date       county    state      fips  cases deaths
  <date>     <chr>     <chr>      <chr> <dbl>  <dbl>
1 2020-01-21 Snohomish Washington 53061     1      0
2 2020-01-22 Snohomish Washington 53061     1      0
3 2020-01-23 Snohomish Washington 53061     1      0
4 2020-01-24 Cook      Illinois   17031     1      0
5 2020-01-24 Snohomish Washington 53061     1      0
6 2020-01-25 Orange    California 06059     1      0
```

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
*covid
```
]
 
.panel2-q11-auto[

```
# A tibble: 1,472,337 x 6
   date       county      state      fips  cases deaths
   <date>     <chr>       <chr>      <chr> <dbl>  <dbl>
 1 2020-01-21 Snohomish   Washington 53061     1      0
 2 2020-01-22 Snohomish   Washington 53061     1      0
 3 2020-01-23 Snohomish   Washington 53061     1      0
 4 2020-01-24 Cook        Illinois   17031     1      0
 5 2020-01-24 Snohomish   Washington 53061     1      0
 6 2020-01-25 Orange      California 06059     1      0
 7 2020-01-25 Cook        Illinois   17031     1      0
 8 2020-01-25 Snohomish   Washington 53061     1      0
 9 2020-01-26 Maricopa    Arizona    04013     1      0
10 2020-01-26 Los Angeles California 06037     1      0
# … with 1,472,327 more rows
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
* filter(date == max(date))
```
]
 
.panel2-q11-auto[

```
# A tibble: 3,246 x 6
   date       county   state   fips  cases deaths
   <date>     <chr>    <chr>   <chr> <dbl>  <dbl>
 1 2021-06-30 Autauga  Alabama 01001  7257    113
 2 2021-06-30 Baldwin  Alabama 01003 22027    315
 3 2021-06-30 Barbour  Alabama 01005  2346     60
 4 2021-06-30 Bibb     Alabama 01007  2693     64
 5 2021-06-30 Blount   Alabama 01009  6987    139
 6 2021-06-30 Bullock  Alabama 01011  1249     42
 7 2021-06-30 Butler   Alabama 01013  2262     71
 8 2021-06-30 Calhoun  Alabama 01015 14776    330
 9 2021-06-30 Chambers Alabama 01017  3736    123
10 2021-06-30 Cherokee Alabama 01019  1874     45
# … with 3,236 more rows
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
* group_by(state)
```
]
 
.panel2-q11-auto[

```
# A tibble: 3,246 x 6
# Groups:   state [55]
   date       county   state   fips  cases deaths
   <date>     <chr>    <chr>   <chr> <dbl>  <dbl>
 1 2021-06-30 Autauga  Alabama 01001  7257    113
 2 2021-06-30 Baldwin  Alabama 01003 22027    315
 3 2021-06-30 Barbour  Alabama 01005  2346     60
 4 2021-06-30 Bibb     Alabama 01007  2693     64
 5 2021-06-30 Blount   Alabama 01009  6987    139
 6 2021-06-30 Bullock  Alabama 01011  1249     42
 7 2021-06-30 Butler   Alabama 01013  2262     71
 8 2021-06-30 Calhoun  Alabama 01015 14776    330
 9 2021-06-30 Chambers Alabama 01017  3736    123
10 2021-06-30 Cherokee Alabama 01019  1874     45
# … with 3,236 more rows
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
* summarize(cases = sum(cases, na.rm = TRUE))
```
]
 
.panel2-q11-auto[

```
# A tibble: 55 x 2
   state                  cases
   <chr>                  <dbl>
 1 Alabama               550983
 2 Alaska                 70669
 3 Arizona               894875
 4 Arkansas              349385
 5 California           3816704
 6 Colorado              560927
 7 Connecticut           349352
 8 Delaware              109744
 9 District of Columbia   49335
10 Florida              2321929
# … with 45 more rows
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
* ungroup()
```
]
 
.panel2-q11-auto[
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
* slice_max(cases, n = 6)
```
]
 
.panel2-q11-auto[

```
# A tibble: 6 x 2
  state          cases
  <chr>          <dbl>
1 California   3816704
2 Texas        2993964
3 Florida      2321929
4 New York     2113147
5 Illinois     1395863
6 Pennsylvania 1216579
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
* pull(state)
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

*covid
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

covid %>%
* filter(state %in% c("California", "Florida", "Texas", "New York", "Georgia", "Illinois"))
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```

```
# A tibble: 320,407 x 6
   date       county      state      fips  cases deaths
   <date>     <chr>       <chr>      <chr> <dbl>  <dbl>
 1 2020-01-24 Cook        Illinois   17031     1      0
 2 2020-01-25 Orange      California 06059     1      0
 3 2020-01-25 Cook        Illinois   17031     1      0
 4 2020-01-26 Los Angeles California 06037     1      0
 5 2020-01-26 Orange      California 06059     1      0
 6 2020-01-26 Cook        Illinois   17031     1      0
 7 2020-01-27 Los Angeles California 06037     1      0
 8 2020-01-27 Orange      California 06059     1      0
 9 2020-01-27 Cook        Illinois   17031     1      0
10 2020-01-28 Los Angeles California 06037     1      0
# … with 320,397 more rows
```
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

covid %>%
  filter(state %in% c("California", "Florida", "Texas", "New York", "Georgia", "Illinois")) %>%
* ggplot(aes(x = date, y = cases))
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```

<img src="lecture-07_files/figure-html/q11_auto_10_output-1.png" width="432" />
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

covid %>%
  filter(state %in% c("California", "Florida", "Texas", "New York", "Georgia", "Illinois")) %>%
  ggplot(aes(x = date, y = cases)) +
* geom_line(aes(color = state))
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```

<img src="lecture-07_files/figure-html/q11_auto_11_output-1.png" width="432" />
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

covid %>%
  filter(state %in% c("California", "Florida", "Texas", "New York", "Georgia", "Illinois")) %>%
  ggplot(aes(x = date, y = cases)) +
  geom_line(aes(color = state)) +
* facet_wrap(~state)
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```

<img src="lecture-07_files/figure-html/q11_auto_12_output-1.png" width="432" />
]

---
count: false
 
# Question 1: Close...
.panel1-q11-auto[

```r
covid %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup() %>%
  slice_max(cases, n = 6) %>%
  pull(state)

covid %>%
  filter(state %in% c("California", "Florida", "Texas", "New York", "Georgia", "Illinois")) %>%
  ggplot(aes(x = date, y = cases)) +
  geom_line(aes(color = state)) +
  facet_wrap(~state) +
* theme_gray()
```
]
 
.panel2-q11-auto[

```
[1] "California"   "Texas"        "Florida"      "New York"     "Illinois"    
[6] "Pennsylvania"
```

<img src="lecture-07_files/figure-html/q11_auto_13_output-1.png" width="432" />
]
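
---

# Question 1: One possible tightening

Two things keep the attempt above at "close": the state names are typed by hand (the list includes Georgia, while the computed top 6 ends with Pennsylvania), and the rows are still per county, so every state has many values per date. A minimal sketch of one way to address both is below. It runs on a small hypothetical table `toy` that stands in for `covid` (the values are invented, and `n = 1` is used only because the toy data has two states); it is not necessarily the solution intended for the assignment.

```r
library(tidyverse)

# Hypothetical stand-in for `covid`: county rows with cumulative cases
toy = tibble(
  date   = as.Date(c("2021-06-29", "2021-06-29", "2021-06-29",
                     "2021-06-30", "2021-06-30", "2021-06-30")),
  county = c("A", "B", "C", "A", "B", "C"),
  state  = c("X", "X", "Y", "X", "X", "Y"),
  cases  = c(1, 2, 10, 3, 4, 12)
)

# Store the top states in a vector instead of retyping them by hand
top_states = toy %>%
  filter(date == max(date)) %>%
  group_by(state) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  slice_max(cases, n = 1) %>%   # would be n = 6 on the real data
  pull(state)

# Sum the counties so there is one value per state per day
state_daily = toy %>%
  filter(state %in% top_states) %>%
  group_by(state, date) %>%
  summarize(cases = sum(cases, na.rm = TRUE), .groups = "drop")

# Same plot layers as before, now with a single line per facet
p = ggplot(state_daily, aes(x = date, y = cases)) +
  geom_line(aes(color = state)) +
  facet_wrap(~state)
```

Because `top_states` is computed from the data, the filter stays in sync with the ranking when the data is refreshed.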