Lecture 05

Data Structures

Author

Mike Johnson

Published

February 22, 2025

Grading Check-in

  • So far we have been lenient with Daily Assignments. This was to give you all a chance to overcome any personal or computer challenges.

  • Daily Assignments not turned in by EOD today will not be counted

  • Remember, Daily Assignments are pass/fail based on effort, while Labs are graded on the quality of the product.

  • Your Lab 1 websites should look like something you would want a potential employer to see.

R Packages

  • In R, the fundamental unit of shareable code is the package.

  • Bundles together code, data, documentation, and tests, in a way that is easy to share.

CRAN

  • The “Comprehensive R Archive Network” (CRAN) is a collection of sites which carry identical material, consisting of the R distribution(s) and contributed packages

  • CRAN enforces a Repository Policy that checks that contributed code is safe and works (meaning it runs, not necessarily that it's good :))

  • This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.

You already know how to use packages:

  • You install them from CRAN with install.packages("XXX").

  • You install them from GitHub with remotes::install_github("USERNAME/REPO").

  • You use them in R with library("XXX").

  • You get help on them with package?XXX

Install vs. Attach

What is a function:

  • A function is a set of statements (directions) organized together to perform a specific task. R has a large number of built-in functions, and users can create their own.
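As a minimal sketch of writing your own function (the name `add_two` is hypothetical, made up for illustration), a function bundles a few statements behind a name and a set of inputs:

```r
# Hypothetical user-defined function: takes a number x and an
# optional increment y (default 2) and returns their sum
add_two <- function(x, y = 2) {
  x + y
}

add_two(5)          # uses the default y = 2
add_two(5, y = 10)  # overrides the default
```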

library(tidyverse)
lsf.str("package:dplyr")
#> %>% : function (lhs, rhs)  
#> across : function (.cols, .fns, ..., .names = NULL, .unpack = FALSE)  
#> add_count : function (x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())  
#> add_count_ : function (x, vars, wt = NULL, sort = FALSE)  
#> add_row : function (.data, ..., .before = NULL, .after = NULL)  
#> add_rownames : function (df, var = "rowname")  
#> add_tally : function (x, wt = NULL, sort = FALSE, name = NULL)  
#> add_tally_ : function (x, wt, sort = FALSE)  
#> all_equal : function (target, current, ignore_col_order = TRUE, ignore_row_order = TRUE, 
#>     convert = FALSE, ...)  
#> all_of : function (x)  
#> all_vars : function (expr)  
#> anti_join : function (x, y, by = NULL, copy = FALSE, ...)  
#> any_of : function (x, ..., vars = NULL)  
#> any_vars : function (expr)  
#> arrange : function (.data, ..., .by_group = FALSE)  
#> arrange_ : function (.data, ..., .dots = list())  
#> arrange_all : function (.tbl, .funs = list(), ..., .by_group = FALSE, .locale = NULL)  
#> arrange_at : function (.tbl, .vars, .funs = list(), ..., .by_group = FALSE, .locale = NULL)  
#> arrange_if : function (.tbl, .predicate, .funs = list(), ..., .by_group = FALSE, .locale = NULL)  
#> as_data_frame : function (x, ...)  
#> as_label : function (x)  
#> as_tibble : function (x, ..., .rows = NULL, .name_repair = c("check_unique", "unique", 
#>     "universal", "minimal"), rownames = pkgconfig::get_config("tibble::rownames", 
#>     NULL))  
#> as.tbl : function (x, ...)  
#> auto_copy : function (x, y, copy = FALSE, ...)  
#> bench_tbls : function (tbls, op, ..., times = 10)  
#> between : function (x, left, right)  
#> bind_cols : function (..., .name_repair = c("unique", "universal", "check_unique", 
#>     "minimal"))  
#> bind_rows : function (..., .id = NULL)  
#> c_across : function (cols)  
#> case_match : function (.x, ..., .default = NULL, .ptype = NULL)  
#> case_when : function (..., .default = NULL, .ptype = NULL, .size = NULL)  
#> changes : function (x, y)  
#> check_dbplyr : function ()  
#> coalesce : function (..., .ptype = NULL, .size = NULL)  
#> collapse : function (x, ...)  
#> collect : function (x, ...)  
#> combine : function (...)  
#> common_by : function (by = NULL, x, y)  
#> compare_tbls : function (tbls, op, ref = NULL, compare = equal_data_frame, ...)  
#> compare_tbls2 : function (tbls_x, tbls_y, op, ref = NULL, compare = equal_data_frame, ...)  
#> compute : function (x, ...)  
#> consecutive_id : function (...)  
#> contains : function (match, ignore.case = TRUE, vars = NULL)  
#> copy_to : function (dest, df, name = deparse(substitute(df)), overwrite = FALSE, 
#>     ...)  
#> count : function (x, ..., wt = NULL, sort = FALSE, name = NULL)  
#> count_ : function (x, vars, wt = NULL, sort = FALSE, .drop = group_by_drop_default(x))  
#> cross_join : function (x, y, ..., copy = FALSE, suffix = c(".x", ".y"))  
#> cumall : function (x)  
#> cumany : function (x)  
#> cume_dist : function (x)  
#> cummean : function (x)  
#> cur_column : function ()  
#> cur_data : function ()  
#> cur_data_all : function ()  
#> cur_group : function ()  
#> cur_group_id : function ()  
#> cur_group_rows : function ()  
#> current_vars : function (...)  
#> data_frame : function (...)  
#> db_analyze : function (con, table, ...)  
#> db_begin : function (con, ...)  
#> db_commit : function (con, ...)  
#> db_create_index : function (con, table, columns, name = NULL, unique = FALSE, ...)  
#> db_create_indexes : function (con, table, indexes = NULL, unique = FALSE, ...)  
#> db_create_table : function (con, table, types, temporary = FALSE, ...)  
#> db_data_type : function (con, fields)  
#> db_desc : function (x)  
#> db_drop_table : function (con, table, force = FALSE, ...)  
#> db_explain : function (con, sql, ...)  
#> db_has_table : function (con, table)  
#> db_insert_into : function (con, table, values, ...)  
#> db_list_tables : function (con)  
#> db_query_fields : function (con, sql, ...)  
#> db_query_rows : function (con, sql, ...)  
#> db_rollback : function (con, ...)  
#> db_save_query : function (con, sql, name, temporary = TRUE, ...)  
#> db_write_table : function (con, table, types, values, temporary = FALSE, ...)  
#> dense_rank : function (x)  
#> desc : function (x)  
#> dim_desc : function (x)  
#> distinct : function (.data, ..., .keep_all = FALSE)  
#> distinct_ : function (.data, ..., .dots, .keep_all = FALSE)  
#> distinct_all : function (.tbl, .funs = list(), ..., .keep_all = FALSE)  
#> distinct_at : function (.tbl, .vars, .funs = list(), ..., .keep_all = FALSE)  
#> distinct_if : function (.tbl, .predicate, .funs = list(), ..., .keep_all = FALSE)  
#> distinct_prepare : function (.data, vars, group_vars = character(), .keep_all = FALSE, caller_env = caller_env(2), 
#>     error_call = caller_env())  
#> do : function (.data, ...)  
#> do_ : function (.data, ..., .dots = list())  
#> dplyr_col_modify : function (data, cols)  
#> dplyr_reconstruct : function (data, template)  
#> dplyr_row_slice : function (data, i, ...)  
#> ends_with : function (match, ignore.case = TRUE, vars = NULL)  
#> enexpr : function (arg)  
#> enexprs : function (..., .named = FALSE, .ignore_empty = c("trailing", "none", "all"), 
#>     .ignore_null = c("none", "all"), .unquote_names = TRUE, .homonyms = c("keep", 
#>         "first", "last", "error"), .check_assign = FALSE)  
#> enquo : function (arg)  
#> enquos : function (..., .named = FALSE, .ignore_empty = c("trailing", "none", "all"), 
#>     .ignore_null = c("none", "all"), .unquote_names = TRUE, .homonyms = c("keep", 
#>         "first", "last", "error"), .check_assign = FALSE)  
#> ensym : function (arg)  
#> ensyms : function (..., .named = FALSE, .ignore_empty = c("trailing", "none", "all"), 
#>     .ignore_null = c("none", "all"), .unquote_names = TRUE, .homonyms = c("keep", 
#>         "first", "last", "error"), .check_assign = FALSE)  
#> eval_tbls : function (tbls, op)  
#> eval_tbls2 : function (tbls_x, tbls_y, op)  
#> everything : function (vars = NULL)  
#> explain : function (x, ...)  
#> expr : function (expr)  
#> failwith : function (default = NULL, f, quiet = FALSE)  
#> filter : function (.data, ..., .by = NULL, .preserve = FALSE)  
#> filter_ : function (.data, ..., .dots = list())  
#> filter_all : function (.tbl, .vars_predicate, .preserve = FALSE)  
#> filter_at : function (.tbl, .vars, .vars_predicate, .preserve = FALSE)  
#> filter_if : function (.tbl, .predicate, .vars_predicate, .preserve = FALSE)  
#> first : function (x, order_by = NULL, default = NULL, na_rm = FALSE)  
#> full_join : function (x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL)  
#> funs : function (..., .args = list())  
#> funs_ : function (dots, args = list(), env = base_env())  
#> glimpse : function (x, width = NULL, ...)  
#> group_by : function (.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))  
#> group_by_ : function (.data, ..., .dots = list(), add = FALSE)  
#> group_by_all : function (.tbl, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl))  
#> group_by_at : function (.tbl, .vars, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl))  
#> group_by_drop_default : function (.tbl)  
#> group_by_if : function (.tbl, .predicate, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl))  
#> group_by_prepare : function (.data, ..., .add = FALSE, .dots = deprecated(), add = deprecated(), 
#>     error_call = caller_env())  
#> group_cols : function (vars = NULL, data = NULL)  
#> group_data : function (.data)  
#> group_indices : function (.data, ...)  
#> group_indices_ : function (.data, ..., .dots = list())  
#> group_keys : function (.tbl, ...)  
#> group_map : function (.data, .f, ..., .keep = FALSE)  
#> group_modify : function (.data, .f, ..., .keep = FALSE)  
#> group_nest : function (.tbl, ..., .key = "data", keep = FALSE)  
#> group_rows : function (.data)  
#> group_size : function (x)  
#> group_split : function (.tbl, ..., .keep = TRUE)  
#> group_trim : function (.tbl, .drop = group_by_drop_default(.tbl))  
#> group_vars : function (x)  
#> group_walk : function (.data, .f, ..., .keep = FALSE)  
#> grouped_df : function (data, vars, drop = group_by_drop_default(data))  
#> groups : function (x)  
#> id : function (.variables, drop = FALSE)  
#> ident : function (...)  
#> if_all : function (.cols, .fns, ..., .names = NULL)  
#> if_any : function (.cols, .fns, ..., .names = NULL)  
#> if_else : function (condition, true, false, missing = NULL, ..., ptype = NULL, size = NULL)  
#> inner_join : function (x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL)  
#> intersect : function (x, y, ...)  
#> is_grouped_df : function (x)  
#> is.grouped_df : function (x)  
#> is.src : function (x)  
#> is.tbl : function (x)  
#> join_by : function (...)  
#> lag : function (x, n = 1L, default = NULL, order_by = NULL, ...)  
#> last : function (x, order_by = NULL, default = NULL, na_rm = FALSE)  
#> last_col : function (offset = 0L, vars = NULL)  
#> last_dplyr_warnings : function (n = 5)  
#> lead : function (x, n = 1L, default = NULL, order_by = NULL, ...)  
#> left_join : function (x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL)  
#> location : function (df)  
#> lst : function (...)  
#> make_tbl : function (subclass, ...)  
#> matches : function (match, ignore.case = TRUE, perl = FALSE, vars = NULL)  
#> min_rank : function (x)  
#> mutate : function (.data, ...)  
#> mutate_ : function (.data, ..., .dots = list())  
#> mutate_all : function (.tbl, .funs, ...)  
#> mutate_at : function (.tbl, .vars, .funs, ..., .cols = NULL)  
#> mutate_each : function (tbl, funs, ...)  
#> mutate_each_ : function (tbl, funs, vars)  
#> mutate_if : function (.tbl, .predicate, .funs, ...)  
#> n : function ()  
#> n_distinct : function (..., na.rm = FALSE)  
#> n_groups : function (x)  
#> na_if : function (x, y)  
#> near : function (x, y, tol = .Machine$double.eps^0.5)  
#> nest_by : function (.data, ..., .key = "data", .keep = FALSE)  
#> nest_join : function (x, y, by = NULL, copy = FALSE, keep = NULL, name = NULL, ...)  
#> new_grouped_df : function (x, groups, ..., class = character())  
#> new_rowwise_df : function (data, group_data = NULL, ..., class = character())  
#> nth : function (x, n, order_by = NULL, default = NULL, na_rm = FALSE)  
#> ntile : function (x = row_number(), n)  
#> num_range : function (prefix, range, suffix = "", width = NULL, vars = NULL)  
#> one_of : function (..., .vars = NULL)  
#> order_by : function (order_by, call)  
#> percent_rank : function (x)  
#> pick : function (...)  
#> progress_estimated : function (n, min_time = 0)  
#> pull : function (.data, var = -1, name = NULL, ...)  
#> quo : function (expr)  
#> quo_name : function (quo)  
#> quos : function (..., .named = FALSE, .ignore_empty = c("trailing", "none", "all"), 
#>     .unquote_names = TRUE)  
#> recode : function (.x, ..., .default = NULL, .missing = NULL)  
#> recode_factor : function (.x, ..., .default = NULL, .missing = NULL, .ordered = FALSE)  
#> reframe : function (.data, ..., .by = NULL)  
#> relocate : function (.data, ..., .before = NULL, .after = NULL)  
#> rename : function (.data, ...)  
#> rename_ : function (.data, ..., .dots = list())  
#> rename_all : function (.tbl, .funs = list(), ...)  
#> rename_at : function (.tbl, .vars, .funs = list(), ...)  
#> rename_if : function (.tbl, .predicate, .funs = list(), ...)  
#> rename_vars : function (vars = chr(), ..., strict = TRUE)  
#> rename_vars_ : function (vars, args)  
#> rename_with : function (.data, .fn, .cols = everything(), ...)  
#> right_join : function (x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL)  
#> row_number : function (x)  
#> rows_append : function (x, y, ..., copy = FALSE, in_place = FALSE)  
#> rows_delete : function (x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = FALSE, 
#>     in_place = FALSE)  
#> rows_insert : function (x, y, by = NULL, ..., conflict = c("error", "ignore"), copy = FALSE, 
#>     in_place = FALSE)  
#> rows_patch : function (x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = FALSE, 
#>     in_place = FALSE)  
#> rows_update : function (x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = FALSE, 
#>     in_place = FALSE)  
#> rows_upsert : function (x, y, by = NULL, ..., copy = FALSE, in_place = FALSE)  
#> rowwise : function (data, ...)  
#> same_src : function (x, y)  
#> sample_frac : function (tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)  
#> sample_n : function (tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)  
#> select : function (.data, ...)  
#> select_ : function (.data, ..., .dots = list())  
#> select_all : function (.tbl, .funs = list(), ...)  
#> select_at : function (.tbl, .vars, .funs = list(), ...)  
#> select_if : function (.tbl, .predicate, .funs = list(), ...)  
#> select_var : function (vars, var = -1)  
#> select_vars : function (vars = chr(), ..., include = chr(), exclude = chr())  
#> select_vars_ : function (vars, args, include = chr(), exclude = chr())  
#> semi_join : function (x, y, by = NULL, copy = FALSE, ...)  
#> setdiff : function (x, y, ...)  
#> setequal : function (x, y, ...)  
#> show_query : function (x, ...)  
#> slice : function (.data, ..., .by = NULL, .preserve = FALSE)  
#> slice_ : function (.data, ..., .dots = list())  
#> slice_head : function (.data, ..., n, prop, by = NULL)  
#> slice_max : function (.data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE)  
#> slice_min : function (.data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE)  
#> slice_sample : function (.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)  
#> slice_tail : function (.data, ..., n, prop, by = NULL)  
#> sql : function (...)  
#> sql_escape_ident : function (con, x)  
#> sql_escape_string : function (con, x)  
#> sql_join : function (con, x, y, vars, type = "inner", by = NULL, ...)  
#> sql_select : function (con, select, from, where = NULL, group_by = NULL, having = NULL, 
#>     order_by = NULL, limit = NULL, distinct = FALSE, ...)  
#> sql_semi_join : function (con, x, y, anti = FALSE, by = NULL, ...)  
#> sql_set_op : function (con, x, y, method)  
#> sql_subquery : function (con, from, name = random_table_name(), ...)  
#> sql_translate_env : function (con)  
#> src : function (subclass, ...)  
#> src_df : function (pkg = NULL, env = NULL)  
#> src_local : function (tbl, pkg = NULL, env = NULL)  
#> src_mysql : function (dbname, host = NULL, port = 0L, username = "root", password = "", 
#>     ...)  
#> src_postgres : function (dbname = NULL, host = NULL, port = NULL, user = NULL, password = NULL, 
#>     ...)  
#> src_sqlite : function (path, create = FALSE)  
#> src_tbls : function (x, ...)  
#> starts_with : function (match, ignore.case = TRUE, vars = NULL)  
#> summarise : function (.data, ..., .by = NULL, .groups = NULL)  
#> summarise_ : function (.data, ..., .dots = list())  
#> summarise_all : function (.tbl, .funs, ...)  
#> summarise_at : function (.tbl, .vars, .funs, ..., .cols = NULL)  
#> summarise_each : function (tbl, funs, ...)  
#> summarise_each_ : function (tbl, funs, vars)  
#> summarise_if : function (.tbl, .predicate, .funs, ...)  
#> summarize : function (.data, ..., .by = NULL, .groups = NULL)  
#> summarize_ : function (.data, ..., .dots = list())  
#> summarize_all : function (.tbl, .funs, ...)  
#> summarize_at : function (.tbl, .vars, .funs, ..., .cols = NULL)  
#> summarize_each : function (tbl, funs, ...)  
#> summarize_each_ : function (tbl, funs, vars)  
#> summarize_if : function (.tbl, .predicate, .funs, ...)  
#> sym : function (x)  
#> symdiff : function (x, y, ...)  
#> syms : function (x)  
#> tally : function (x, wt = NULL, sort = FALSE, name = NULL)  
#> tally_ : function (x, wt, sort = FALSE)  
#> tbl : function (src, ...)  
#> tbl_df : function (data)  
#> tbl_nongroup_vars : function (x)  
#> tbl_ptype : function (.data)  
#> tbl_vars : function (x)  
#> tibble : function (..., .rows = NULL, .name_repair = c("check_unique", "unique", 
#>     "universal", "minimal"))  
#> top_frac : function (x, n, wt)  
#> top_n : function (x, n, wt)  
#> transmute : function (.data, ...)  
#> transmute_ : function (.data, ..., .dots = list())  
#> transmute_all : function (.tbl, .funs, ...)  
#> transmute_at : function (.tbl, .vars, .funs, ..., .cols = NULL)  
#> transmute_if : function (.tbl, .predicate, .funs, ...)  
#> tribble : function (...)  
#> type_sum : function (x)  
#> ungroup : function (x, ...)  
#> union : function (x, y, ...)  
#> union_all : function (x, y, ...)  
#> validate_grouped_df : function (x, check_bounds = FALSE)  
#> validate_rowwise_df : function (x)  
#> vars : function (...)  
#> where : function (fn)  
#> with_groups : function (.data, .groups, .f, ...)  
#> with_order : function (order_by, fun, x, ...)  
#> wrap_dbplyr_obj : function (obj_name)

Signature

  • A function's signature is its name plus its inputs (arguments):

add_count_ : function (x, vars, wt = NULL, sort = FALSE)
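To inspect a signature from inside R, base R's `args()` prints a function's argument list and `formals()` returns it as a list. A small sketch, using a hypothetical function `scale_by` defined only for illustration:

```r
# Hypothetical function, defined only so we can inspect its signature
scale_by <- function(x, factor = 10) {
  x * factor
}

args(scale_by)            # prints the signature (followed by NULL)
names(formals(scale_by))  # argument names: "x" "factor"
formals(scale_by)$factor  # default value of `factor`: 10
```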

Help

  • We can get help about a function by placing a ? in front of the function name

?dplyr::select

Access

We can access the functions that come with a package in 2 ways:

  1. By attaching the package to the working session with library()

  2. By referencing the package directly with ::, e.g. dplyr::select()
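Both access routes are sketched below with `tools`, a package that ships with every R installation but is not attached by default (chosen here only so the example needs no downloads):

```r
# 2. Reference the package directly: no library() call needed
tools::toTitleCase("data structures")

# 1. Attach the package to the session, then call the function by name
library(tools)
toTitleCase("data structures")

# The attached package now appears on the search path
"package:tools" %in% search()
```

The `::` form is handy in scripts because it makes clear which package a function comes from, even when two attached packages export functions with the same name.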

Data Structures

Storing more than one value requires structure.

Vectors

  • Vectors come in two flavors: atomic vectors and lists
  • For atomic vectors, all elements must have the same type
  • For lists, elements can have different types
  • NULL serves as a generic zero-length vector
  • This diagram (taken from here) illustrates the basic relationships:
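A quick sketch of these relationships using only base R predicates (the object names are arbitrary):

```r
nums  <- c(1, 2, 3)           # atomic: every element is a double
mixed <- list(1, "a", TRUE)   # list: elements keep their own types

is.atomic(nums)    # TRUE
is.list(mixed)     # TRUE
typeof(NULL)       # "NULL"
length(NULL)       # 0: NULL acts like a zero-length vector

# Combining with NULL adds nothing
identical(c(nums, NULL), nums)   # TRUE
```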

Atomic Vectors: Homogeneous Data

  • A vector containing only one type of data is called an atomic vector (an "atom" for short)
    • Atoms can be created using the c() (combine) function.
    • Their length can be checked with length()

  • There are four primary types of atomic vectors: logical, integer, double, and character (which contains strings).

  • Collectively, integer and double vectors are known as numeric vectors.

  • Complex and raw atomic vectors are rare.
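A short sketch of how the numeric types show up in practice, including the two rare types:

```r
typeof(1)        # "double": plain numbers are doubles by default
typeof(1L)       # "integer": the L suffix makes an integer
is.numeric(1)    # TRUE
is.numeric(1L)   # TRUE: both count as numeric

typeof(1i)           # "complex" (rare)
typeof(as.raw(255))  # "raw" (rare)
```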

Atoms

# Numeric
(dbl_vec <- c(1.9,2,3.5))
#> [1] 1.9 2.0 3.5
typeof(dbl_vec)
#> [1] "double"
length(dbl_vec)
#> [1] 3

(int_vec <- c(1L, 17L, 3L))
#> [1]  1 17  3
typeof(int_vec)
#> [1] "integer"
length(int_vec)
#> [1] 3

# Logical
(lg_vec <- c(TRUE, FALSE, F, T))
#> [1]  TRUE FALSE FALSE  TRUE
typeof(lg_vec)
#> [1] "logical"
length(lg_vec)
#> [1] 4

# Character
(char_vec <- c("ESS", "is", "Great!"))
#> [1] "ESS"    "is"     "Great!"
typeof(char_vec)
#> [1] "character"
length(char_vec)
#> [1] 3

Missing Values!

  • Missing values need a placeholder
  • Missing values are denoted with NA (short for "not available").
  • Missing values are ‘infectious’: most computations involving a missing value will return another missing value.
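"Most" matters here: a few operations return a known answer even with NA, because the result would be the same whatever value the NA stands for. A short sketch:

```r
NA > 5        # NA: the comparison is unknown
10 * NA       # NA: arithmetic propagates NA
NA^0          # 1: anything to the power 0 is 1
NA | TRUE     # TRUE: true no matter what NA is
NA & FALSE    # FALSE: false no matter what NA is
```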

Missing Values

(vec <- c(5,6,7,8,NA))
#> [1]  5  6  7  8 NA
mean(vec)
#> [1] NA
mean(vec, na.rm = TRUE)
#> [1] 6.5

x <- c(NA, 50, NA, 9)
# Comparing with NA is itself NA, so == cannot find missing values
x == NA
#> [1] NA NA NA NA

# Use is.na() instead
is.na(x)
#> [1]  TRUE FALSE  TRUE FALSE

Atoms must be of the same type!

Coercion

  • Type is a property of the entire vector

  • When you combine different types, they are coerced in a fixed order:

character → double → integer → logical

  • Types further left take precedence: combining a character with a number yields a character, and combining a double with an integer yields a double.

  • Coercion often happens automatically.

  • You can deliberately coerce by using an as.*() function, like as.logical(), as.integer(), as.double(), or as.character().

  • Failed numeric coercion of strings generates a warning and a missing value (NA); as.logical() returns NA without a warning.
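A sketch of the coercion order in action, plus one common use of automatic coercion (logicals become 1/0 in arithmetic):

```r
typeof(c(1L, TRUE))   # "integer": logical -> integer
typeof(c(1L, 2.5))    # "double": integer -> double
typeof(c(2.5, "a"))   # "character": double -> character

# Logicals coerce to 1/0, so sum() counts the TRUEs
sum(c(TRUE, FALSE, TRUE))   # 2

# Deliberate coercion with as.*(): "x" cannot become an integer,
# so it turns into NA and R issues a warning
as.integer(c("7", "x"))
```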

Atoms

c("a", 1)

c("a", TRUE)

c(4.5, 1L)

c("1", 18, "GIS")

as.numeric(c("1", 18, "ESS"))

as.logical(c("1", 18, "ESS"))
#> [1] "a" "1"
#> [1] "a"    "TRUE"
#> [1] 4.5 1.0
#> [1] "1"   "18"  "GIS"
#> [1]  1 18 NA
#> [1] NA NA NA

Names

  • In addition to naming the object, you can name its elements, making them “referenceable” by name

  • Names should be unique and non-missing; R does not enforce this, but duplicated or missing names make lookups by name unreliable

(x <- c(a = 1, b = 2, c = 3))
#> a b c 
#> 1 2 3

# Using the attribute names()
names(x) <- c("d", "e", "f")
(x)
#> d e f 
#> 1 2 3

# With the function setNames():
(x <- setNames(1:3, c("g", "h", "i")))
#> g h i 
#> 1 2 3
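Once elements have names, you can look them up by name and inspect or strip the names. A small sketch:

```r
x <- setNames(1:3, c("g", "h", "i"))

x["h"]      # look up an element by its name (keeps the name)
names(x)    # "g" "h" "i"
unname(x)   # drop the names, leaving 1 2 3
```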

Dimensions

  • You probably noticed that atomic vectors do not include a number of important structures like matrices (2D) or arrays (3D), factors, or date-times.

  • These types extend atomic vectors by adding attributes.

  • Adding a dim attribute to a vector allows it to behave like a 2D matrix or a multi-dimensional array.
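A minimal sketch of that idea: the data stay the same, and only the dim attribute changes how R treats them.

```r
x <- 1:6             # a plain atomic vector
dim(x)               # NULL: no dim attribute yet

dim(x) <- c(2, 3)    # add a dim attribute: 2 rows, 3 columns
is.matrix(x)         # TRUE
x

dim(x) <- c(1, 3, 2) # 1 x 3 x 2: now a 3D array
is.array(x)          # TRUE
```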

Matrix

  • A matrix is a 2D atomic vector (rows, columns)
  • All elements share the same data type
  • All columns have the same length

Matrices

(a <- c(1:9))
#> [1] 1 2 3 4 5 6 7 8 9

# Use matrix(); values fill column by column by default
(mat <- matrix(a, nrow = 3))
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

# Use matrix() with byrow = TRUE to fill row by row
(mat2 <- matrix(a, nrow = 3, byrow = TRUE))
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
#> [3,]    7    8    9

## dim returns dimensions of an object
dim(mat2)
#> [1] 3 3

# set names using colnames
colnames(mat2) <- c("A", "B", "C")

mat2
#>      A B C
#> [1,] 1 2 3
#> [2,] 4 5 6
#> [3,] 7 8 9

Arrays

  • An array extends a matrix to more dimensions; a 3D array is indexed [row, column, slice]

Arrays

a <-  c(1:12)

array(a, dim = c(2,3,2))
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    7    9   11
#> [2,]    8   10   12
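The [row, column, slice] layout can be checked directly. A sketch that rebuilds the same array under a separate name (`arr`, chosen so the vector `a` is left untouched):

```r
arr <- array(1:12, dim = c(2, 3, 2))

dim(arr)       # 2 3 2: rows, columns, slices
arr[, , 2]     # the whole second slice, a 2 x 3 matrix
arr[1, 3, 2]   # row 1, column 3, slice 2: 11
```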

Note on NULL

  • A vector without a dim attribute set is often thought of as 1-dimensional, but actually has NULL dimensions.
dim(a)
#> NULL

Why?

  • Matrices with a single row or single column, and arrays with a single dimension, can look like 1D vectors.

  • They may print similarly, but they behave differently.

  • The differences aren’t too important, but it’s useful to know they exist in case you get strange output from a function (tapply() is a frequent offender).

  • As always, use str() to reveal the differences.

str(c(1:3))                  # 1d vector
#>  int [1:3] 1 2 3

str(matrix(1:3, ncol = 1))   # column vector
#>  int [1:3, 1] 1 2 3

str(matrix(1:3, nrow = 1))   # row vector
#>  int [1, 1:3] 1 2 3

str(array(1:3, 3))           # "array" vector
#>  int [1:3(1d)] 1 2 3

Lists

Lists: Heterogeneous Data

  • Lists extend atomic vectors and allow each list element to be any type.
(my_list <- list(
  matrix(1:4, nrow = 2), 
  "ESS is great!", 
  c(TRUE, FALSE, TRUE), 
  c(2.3, 5.9)
))
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
#> 
#> [[2]]
#> [1] "ESS is great!"
#> 
#> [[3]]
#> [1]  TRUE FALSE  TRUE
#> 
#> [[4]]
#> [1] 2.3 5.9

typeof(my_list)
#> [1] "list"

Lists can be recursive

# list of lists ...
(list_list <- list(list("hi")))
#> [[1]]
#> [[1]][[1]]
#> [1] "hi"

str(list_list)
#> List of 1
#>  $ :List of 1
#>   ..$ : chr "hi"

Data Frames

Structured Lists

  • A data.frame is a data structure built on top of lists: a named list of vectors.
class(data.frame())
#> [1] "data.frame"
typeof(data.frame())
#> [1] "list"

  • data.frames are one of the biggest and most important ideas in R

  • Unlike in a regular list, every vector in a data.frame must have the same length.

  • This gives data frames a rectangular structure and explains why they share the properties of both matrices and lists
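A sketch contrasting the two structures. One nuance worth knowing: `data.frame()` recycles a shorter column when its length divides the longer one evenly, and raises an error otherwise.

```r
# A plain list happily holds vectors of different lengths
lst <- list(a = 1:3, b = 1:2)
lengths(lst)   # a = 3, b = 2

# data.frame() refuses columns whose lengths cannot be reconciled
res <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
inherits(res, "error")   # TRUE
conditionMessage(res)

# A shorter column is recycled when it divides the longer one evenly
data.frame(a = 1:4, b = 1:2)
```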

A small data.frame

df1 <- data.frame(name = c("George", "Stan", "Carly"),
                  age  = c(75,15,31),
                  retired = c(T,F,F))

typeof(df1)
attributes(df1)
str(df1)
#> [1] "list"
#> $names
#> [1] "name"    "age"     "retired"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2 3
#> 'data.frame':    3 obs. of  3 variables:
#>  $ name   : chr  "George" "Stan" "Carly"
#>  $ age    : num  75 15 31
#>  $ retired: logi  TRUE FALSE FALSE

data.frame example

(num <- c(1,2,3,4))
#> [1] 1 2 3 4

(color <- c("red", "white", "green", NA))
#> [1] "red"   "white" "green" NA

(boolean <- c(TRUE,TRUE,TRUE,FALSE))
#> [1]  TRUE  TRUE  TRUE FALSE

(df <- data.frame(num, color, boolean))
#>   num color boolean
#> 1   1   red    TRUE
#> 2   2 white    TRUE
#> 3   3 green    TRUE
#> 4   4  <NA>   FALSE

Subsetting

  • R’s subsetting operators are fast and powerful.
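  • There are 3 subsetting operators:
    • [ –> selects one or more elements (vectors, matrices, lists, data.frames)
    • [[ –> extracts a single element (most often from a list)
    • $ –> extracts a single element by name (lists and data.frames)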
  • Subsetting can be combined with assignment.
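
When a subset appears on the left-hand side of `<-`, only the selected elements are modified. Here is a minimal sketch using the vector from the next slides, plus a small list `ll2` that is hypothetical and made up for illustration.

```r
x <- c(3.4, 7, 18, 9.6)
x[2] <- 100        # replace the 2nd element
x[x > 10] <- 0     # replace every element greater than 10
x
#> [1] 3.4 0.0 0.0 9.6

ll2 <- list(name = "George", age = 75)   # hypothetical example list
ll2[["retired"]] <- TRUE                 # add a new element by name
ll2$age <- NULL                          # assigning NULL removes an element
names(ll2)
#> [1] "name"    "retired"
```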

Subset Atomics

(x <- c(3.4, 7, 18, 9.6))
#> [1]  3.4  7.0 18.0  9.6

# By position
x[3]
#> [1] 18

x[c(3,4)]
#> [1] 18.0  9.6

# Negative positions drop elements
x[-3]
#> [1] 3.4 7.0 9.6

# A logical vector keeps the TRUE positions
x[c(T,T,F,F)]
#> [1] 3.4 7.0

# By name
x <- setNames(x, c('A', 'B','C','D'))

x["A"]
#>   A 
#> 3.4

x[c("A", "C")]
#>    A    C 
#>  3.4 18.0

# Indices may repeat
x[c("A", "A")]
#>   A   A 
#> 3.4 3.4
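
A few more atomic subsetting behaviors that often surprise people, shown as a small sketch on the same named vector.

```r
x <- c(A = 3.4, B = 7, C = 18, D = 9.6)

# A condition evaluated on x itself produces the logical index
x[x > 5]

# A position past the end returns NA rather than an error
x[5]

# A logical index shorter than x is recycled:
# c(TRUE, FALSE) keeps positions 1 and 3
x[c(TRUE, FALSE)]
```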

Subset Matrices

(x <- matrix(1:9, nrow = 3))
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

x[3,]        # row 3
#> [1] 3 6 9

x[,3]        # column 3
#> [1] 7 8 9

x[3,3]       # row 3, column 3
#> [1] 9

x[1:2,1:2]   # rows 1-2, columns 1-2
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5

x[-1,]       # every row except row 1
#>      [,1] [,2] [,3]
#> [1,]    2    5    8
#> [2,]    3    6    9
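
Two matrix details are worth knowing. Selecting a single row or column silently drops the result to a vector, and a single index treats the whole matrix as one long vector. A small sketch:

```r
x <- matrix(1:9, nrow = 3)

# Selecting one row drops the result to a plain vector...
is.matrix(x[3, ])
#> [1] FALSE

# ...unless drop = FALSE keeps it as a 1 x 3 matrix
dim(x[3, , drop = FALSE])
#> [1] 1 3

# A single index counts down the columns: the 5th value is row 2, column 2
x[5]
#> [1] 5
```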

Subset Arrays

(x = array(1:12, dim = c(2,2,3)))
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    5    7
#> [2,]    6    8
#> 
#> , , 3
#> 
#>      [,1] [,2]
#> [1,]    9   11
#> [2,]   10   12

x[1,,]   # row 1 of every layer
#>      [,1] [,2] [,3]
#> [1,]    1    5    9
#> [2,]    3    7   11

x[,1,]   # column 1 of every layer
#>      [,1] [,2] [,3]
#> [1,]    1    5    9
#> [2,]    2    6   10

x[,,1]   # layer 1
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
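
Arrays take one index per dimension, and `apply()` can run a function along any one of those dimensions. The sketch below sums each layer separately, which is a common pattern for layered data.

```r
x <- array(1:12, dim = c(2, 2, 3))

dim(x)              # rows, columns, layers
#> [1] 2 2 3

x[2, 2, 3]          # one cell: row 2, column 2, layer 3
#> [1] 12

apply(x, 3, sum)    # MARGIN = 3: sum each layer separately
#> [1] 10 26 42
```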

Subset Lists

(ll <- list(name = c("George", "Stan", "Carly"),
            age  = c(75,15,31),
            retired = c(T,F,F)))
#> $name
#> [1] "George" "Stan"   "Carly" 
#> 
#> $age
#> [1] 75 15 31
#> 
#> $retired
#> [1]  TRUE FALSE FALSE

# $ extracts an element by name
ll$name
#> [1] "George" "Stan"   "Carly"
ll$name[1]
#> [1] "George"

# [[ extracts an element by position...
ll[[1]]
#> [1] "George" "Stan"   "Carly"
ll[[1]][1]
#> [1] "George"

# ...or by name
ll[['name']][1]
#> [1] "George"
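
The difference between `[` and `[[` on a list matters. `[` returns a smaller list, while `[[` returns the element itself. A minimal sketch:

```r
ll <- list(name = c("George", "Stan", "Carly"),
           age  = c(75, 15, 31),
           retired = c(TRUE, FALSE, FALSE))

class(ll[1])      # [ keeps the container: a list of length 1
#> [1] "list"
class(ll[[1]])    # [[ pulls out the contents: a character vector
#> [1] "character"

# [ can select several elements at once and still returns a list
str(ll[c("name", "age")])
```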

Lists are not Matrices

# The name "Stan"
ll[1,2]
#> Error in ll[1, 2]: incorrect number of dimensions

# Stan's information
ll[2,]
#> Error in ll[2, ]: incorrect number of dimensions
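
With a list, Stan's values have to be pulled out of each vector separately. Here is a minimal sketch of one way to do that. The anonymous function passed to `lapply()` is one option among several, and `stan` is a hypothetical name.

```r
ll <- list(name = c("George", "Stan", "Carly"),
           age  = c(75, 15, 31),
           retired = c(TRUE, FALSE, FALSE))

# The name "Stan": element 2 of the name vector
ll$name[2]
#> [1] "Stan"

# Stan's information: element 2 of every vector in the list
stan <- lapply(ll, function(v) v[2])
str(stan)
```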

Enter data.frames

(df <- data.frame(name = c("George", "Stan", "Carly"),
                  age  = c(75,15,31),
                  retired = c(T,F,F)))
#>     name age retired
#> 1 George  75    TRUE
#> 2   Stan  15   FALSE
#> 3  Carly  31   FALSE

# Like a Matrix!
df[1,2]
#> [1] 75
df[2,]
#>   name age retired
#> 2 Stan  15   FALSE

# Like a list!
df[[1]][1]
#> [1] "George"
df$name[1]
#> [1] "George"
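
Combining the two views gives the most common data.frame idiom. A logical condition on one column selects rows, and column names select columns. A small sketch:

```r
df <- data.frame(name = c("George", "Stan", "Carly"),
                 age  = c(75, 15, 31),
                 retired = c(TRUE, FALSE, FALSE))

df[df$age > 20, ]           # rows where age > 20, all columns
df[df$age > 20, "name"]     # same rows, one column (returned as a vector)
df[, c("name", "age")]      # all rows, two columns chosen by name
```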

Real examples

Storm Dataset

  • Many packages ship with built-in datasets that are available once the package is attached.
Tip
  • dplyr::storms contains the NOAA Atlantic hurricane database best track data. The data includes the positions and attributes of storms from 1975-2022. Storms from 1979 onward are measured every six hours during the lifetime of the storm. Storms in earlier years have some missing data.

Storms

# Preview the first 3 rows
head(storms,3)
#> # A tibble: 3 × 13
#>   name   year month   day  hour   lat  long status       category  wind pressure
#>   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct>           <dbl> <int>    <int>
#> 1 Amy    1975     6    27     0  27.5   -79 tropical de…       NA    25     1013
#> 2 Amy    1975     6    27     6  28.5   -79 tropical de…       NA    25     1013
#> 3 Amy    1975     6    27    12  29.5   -79 tropical de…       NA    25     1013
#> # ℹ 2 more variables: tropicalstorm_force_diameter <int>,
#> #   hurricane_force_diameter <int>

# Get data dimensions (rows, columns)
dim(storms)
#> [1] 19537    13

# Inspect the type of each column
str(storms)
#> tibble [19,537 × 13] (S3: tbl_df/tbl/data.frame)
#>  $ name                        : chr [1:19537] "Amy" "Amy" "Amy" "Amy" ...
#>  $ year                        : num [1:19537] 1975 1975 1975 1975 1975 ...
#>  $ month                       : num [1:19537] 6 6 6 6 6 6 6 6 6 6 ...
#>  $ day                         : int [1:19537] 27 27 27 27 28 28 28 28 29 29 ...
#>  $ hour                        : num [1:19537] 0 6 12 18 0 6 12 18 0 6 ...
#>  $ lat                         : num [1:19537] 27.5 28.5 29.5 30.5 31.5 32.4 33.3 34 34.4 34 ...
#>  $ long                        : num [1:19537] -79 -79 -79 -79 -78.8 -78.7 -78 -77 -75.8 -74.8 ...
#>  $ status                      : Factor w/ 9 levels "disturbance",..: 7 7 7 7 7 7 7 7 8 8 ...
#>  $ category                    : num [1:19537] NA NA NA NA NA NA NA NA NA NA ...
#>  $ wind                        : int [1:19537] 25 25 25 25 25 25 25 30 35 40 ...
#>  $ pressure                    : int [1:19537] 1013 1013 1013 1013 1012 1012 1011 1006 1004 1002 ...
#>  $ tropicalstorm_force_diameter: int [1:19537] NA NA NA NA NA NA NA NA NA NA ...
#>  $ hurricane_force_diameter    : int [1:19537] NA NA NA NA NA NA NA NA NA NA ...

Storms Subset

# The first 5 storm names, using $
storms$name[1:5]
#> [1] "Amy" "Amy" "Amy" "Amy" "Amy"

# The same values, using [[ and the column position
storms[[1]][1:5]
#> [1] "Amy" "Amy" "Amy" "Amy" "Amy"

# The first 5 rows and first 5 columns, matrix-style
storms[1:5,1:5]
#> # A tibble: 5 × 5
#>   name   year month   day  hour
#>   <chr> <dbl> <dbl> <int> <dbl>
#> 1 Amy    1975     6    27     0
#> 2 Amy    1975     6    27     6
#> 3 Amy    1975     6    27    12
#> 4 Amy    1975     6    27    18
#> 5 Amy    1975     6    28     0

Hurricane Ana

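# Keep every row whose storm name is "Ana"
ana <- storms[storms$name == "Ana",]

dim(ana)
#> [1] 189  13

# Names are reused across years, so "Ana" matches several different storms
unique(ana$year)
#> [1] 1979 1985 1991 1997 2003 2009 2015 2021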

Hurricane Ana

ana_2009 <- ana[ana$year == 2009,]

{plot(ana_2009$long, ana_2009$lat,
     col = ana_2009$day, pch = 16, cex = 2)
lines(ana_2009$long, ana_2009$lat)}
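
The two-step filter (first by name, then by year) can also be written in one step by combining conditions with `&`. This is a sketch that assumes dplyr is installed, since it supplies `storms`. `ana_2009_b` is a hypothetical name used only for the comparison.

```r
# Assumes dplyr is installed; it provides the storms dataset
storms <- dplyr::storms

# Two steps: filter by name, then by year
ana      <- storms[storms$name == "Ana", ]
ana_2009 <- ana[ana$year == 2009, ]

# One step: & keeps only rows where BOTH conditions are TRUE
ana_2009_b <- storms[storms$name == "Ana" & storms$year == 2009, ]

# Both approaches select the same rows
identical(ana_2009, ana_2009_b)
```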

Raster Data

m <- matrix(1:100, nrow = 10)
(mr <- terra::rast(m))
#> class       : SpatRaster 
#> dimensions  : 10, 10, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 10, 0, 10  (xmin, xmax, ymin, ymax)
#> coord. ref. :  
#> source(s)   : memory
#> name        : lyr.1 
#> min value   :     1 
#> max value   :   100
terra::plot(mr)

🔮 Looking forward 🔮

  • Raster objects are built on multi-dimensional arrays, where each layer corresponds to a separate array slice.

  • Base R matrices store values in column-major order: values fill the first column, then the second, and so on. terra rasters number cells row by row instead. Cell 1 is the upper-left cell, and numbering runs left to right, then top to bottom. Keep this difference in mind when moving values between matrices and rasters.
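
Column-major order is easiest to see with a small base R matrix. `matrix()` fills values down each column by default, and `byrow = TRUE` fills across rows instead. Either way, the values are read back out column by column.

```r
m <- matrix(1:6, nrow = 2)                     # default: fill column by column
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row instead
m_row
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

as.vector(m)        # values come back out in column-major order
#> [1] 1 2 3 4 5 6
as.vector(m_row)    # same storage rule, so this reads down the columns
#> [1] 1 4 2 5 3 6
```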

Real Gridded Data:

library(climateR); library(AOI)
(x = getGridMET(AOI = aoi_get(state = "CO"), 
                varname = "tmmx", 
                startDate = "2025-01-01"))
#> $daily_maximum_temperature
#> class       : SpatRaster 
#> dimensions  : 99, 169, 1  (nrow, ncol, nlyr)
#> resolution  : 0.04166669, 0.04166669  (x, y)
#> extent      : -109.0792, -102.0375, 36.9625, 41.0875  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
#> source(s)   : memory
#> name        : tmmx_2025-01-01 
#> min value   :           261.9 
#> max value   :           281.6 
#> unit        :               K 
#> time        : 2025-01-01 UTC

plot(x[[1]])

Daily Assignment:

Copy this into a Qmd file, answer the questions posed, and submit the rendered HTML file to Canvas:

# Attach the `palmerpenguins` package

# 1. Examine the dataset using the ?help page

# 2. What is the class of the penguins dataset?

# 3. What is the structure of the penguins dataset?

# 4. What are the dimensions of the penguins dataset?

# 5. What are the column names of the penguins dataset?

# 6. What type of data are `flipper_length_mm` and `island`?

# 7. What is the mean flipper length of the penguins?

# 8. What is the standard deviation of flipper length in the penguins?

# 9. What is the median body mass of the penguins?

# 10. What is the island of the 100th penguin?