library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
The scope of names
x <- c(1:10)
y <- c(90:100)
x
[1] 1 2 3 4 5 6 7 8 9 10
y
[1] 90 91 92 93 94 95 96 97 98 99 100
mean()
## Error in mean.default() : argument "x" is missing, with no default
mean(x) # argument names are internal to functions
[1] 5.5
mean(x = x)
[1] 5.5
mean(x = y)
[1] 95
x
[1] 1 2 3 4 5 6 7 8 9 10
y
[1] 90 91 92 93 94 95 96 97 98 99 100
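To see that an argument name lives only inside its function, here is a minimal sketch with a hypothetical function, `add_one()`, which is not part of the original example. Its argument is also called `x`. Calling it with `x = y` binds the argument to the values of `y`, and the global `x` is left untouched.

```r
# Global objects, as in the example above
x <- c(1:10)
y <- c(90:100)

# add_one() is a hypothetical helper; its argument happens to be named `x`
add_one <- function(x) {
  x + 1 # this `x` is the function's argument, not the global `x`
}

add_one(x = y) # inside the call, `x` refers to the values of `y`
x              # the global `x` still holds 1 to 10
```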
Types and Classes
The object inspector in RStudio is your friend.
You can ask an object what it is at the console, too:
## We made this before
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
my_numbers
[1] 1 1 2 4 1 3 1 5
class(my_numbers)
[1] "numeric"
typeof(my_numbers)
[1] "double"
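`class()` and `typeof()` answer different questions. `class()` reports the (possibly implicit) class that generic functions such as `print()` and `summary()` use to decide how to handle an object. `typeof()` reports how R stores the object internally. The objects below, apart from `my_numbers`, are arbitrary illustrations.

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)

# Numbers: class "numeric", stored as double-precision values
class(my_numbers)
typeof(my_numbers)

# Whole numbers written with `:` are stored as integers
class(1:3)
typeof(1:3)

# Functions are objects too: class "function", stored as a "closure"
class(mean)
typeof(mean)

# Character and logical vectors
class("lamb")
class(TRUE)
```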
Objects can have more than one (nested) class:
summary(my_numbers)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 1.00 1.50 2.25 3.25 5.00
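my_smry <- summary(my_numbers) # remember, outputs can be assigned to a name, creating an object
class(summary(my_numbers)) # functions can be nested, and are evaluated from the inside out
[1] "summaryDefault" "table"
class(my_smry) # equivalent to the previous line
[1] "summaryDefault" "table"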
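This is a small sketch of what "more than one class" means in practice. It assumes R 3.4.0 or later, where `summary()` of a numeric vector returns an object of class `summaryDefault` that also inherits from `table`. `inherits()` tests membership in any of an object's classes. `unclass()` strips the class attribute and reveals the plain named numeric vector underneath.

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
my_smry <- summary(my_numbers)

class(my_smry)              # the most specific class comes first
inherits(my_smry, "table")  # TRUE: it is also treated as a table
typeof(my_smry)             # underneath, it is a vector of doubles

# Removing the class leaves the plain named numbers
plain <- unclass(my_smry)
plain
names(plain)
plain[["Mean"]]
```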
Lists, data frames, tibbles
A table of data is a kind of list
gapminder # tibbles and data frames can contain vectors of different types
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
class(gapminder)
[1] "tbl_df" "tbl" "data.frame"
typeof(gapminder) # hmm
[1] "list"
- Lists can be heterogeneous, in the sense of containing vectors of different types. Underneath, most complex R objects are some kind of list with different components.
- A data frame is a list of vectors of the same length, where the vectors can be of different types (e.g., numeric, character, logical).
- A tibble is an enhanced data frame.
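The difference between a general list and a data frame shows up with two small objects of our own. The names `my_list` and `my_df` are only illustrative. A list's elements may differ in both type and length. A data frame's columns may differ in type, but they must all have the same length.

```r
# A list: elements of different types AND different lengths
my_list <- list(nums = c(1, 2, 3), words = c("a", "b"), flag = TRUE)
typeof(my_list)
lengths(my_list)   # each element keeps its own length

# A data frame: a list of equal-length vectors (its columns)
my_df <- data.frame(nums = c(1, 2, 3), words = c("a", "b", "c"))
typeof(my_df)      # still a list underneath
lengths(my_df)     # every column has the same length

# Columns of incompatible lengths are rejected with an error
tryCatch(
  data.frame(nums = c(1, 2, 3), words = c("a", "b")),
  error = function(e) conditionMessage(e)
)
```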
Some classes are versions of others
Base R’s trusty data.frame
library(socviz)
titanic
fate sex n percent
1 perished male 1364 62.0
2 perished female 126 5.7
3 survived male 367 16.7
4 survived female 344 15.6
class(titanic)
[1] "data.frame"
## The `$` idiom picks out a named column here;
## more generally, the named element of a list
titanic$percent
[1] 62.0 5.7 16.7 15.6
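Because a data frame is a list, the same extraction tools work on both. The sketch below uses `titanic_toy`, a small stand-in for socviz's `titanic` typed in from the printout above, so it runs without socviz installed. In this stand-in, `fate` and `sex` are plain character columns rather than factors. `$` and `[[` return the column itself, while single brackets return a smaller data frame.

```r
# Stand-in for socviz's `titanic`, built from the values printed above
titanic_toy <- data.frame(
  fate = c("perished", "perished", "survived", "survived"),
  sex = c("male", "female", "male", "female"),
  n = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

titanic_toy$percent        # the column, as a numeric vector
titanic_toy[["percent"]]   # the same, with the name given as a string
titanic_toy["percent"]     # single brackets: a one-column data frame

# The same idioms on an ordinary list
my_list <- list(a = 1:3, b = "text")
my_list$b
my_list[["a"]]

# [[ ]] also accepts a name stored in a variable; $ uses the literal name
col <- "n"
titanic_toy[[col]]
```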
The Tidyverse’s enhanced tibble
## tibbles are built on data frames
titanic_tb <- as_tibble(titanic)
titanic_tb
# A tibble: 4 × 4
fate sex n percent
<fct> <fct> <dbl> <dbl>
1 perished male 1364 62
2 perished female 126 5.7
3 survived male 367 16.7
4 survived female 344 15.6
class(titanic_tb)
[1] "tbl_df" "tbl" "data.frame"
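This sketch shows two everyday differences between a tibble and the data frame it is built on. It uses the same hand-typed stand-in for `titanic`, so it runs without socviz. It assumes tibble 3.x. First, selecting a single column with `[` keeps a tibble instead of dropping to a vector. Second, `$` on a tibble does not partially match column names.

```r
library(tibble) # provides as_tibble()

# Stand-in for socviz's `titanic`, built from the values printed above
titanic_toy <- data.frame(
  fate = c("perished", "perished", "survived", "survived"),
  sex = c("male", "female", "male", "female"),
  n = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)
titanic_toy_tb <- as_tibble(titanic_toy)

# A tibble is still a data frame (and so still a list)
inherits(titanic_toy_tb, "data.frame")

# Selecting one column with [ , ]
titanic_toy[, "percent"]     # base data frame: drops to a numeric vector
titanic_toy_tb[, "percent"]  # tibble: stays a 4 x 1 tibble

# Abbreviated column names with $
titanic_toy$perc             # base data frame: silently partial-matches `percent`
titanic_toy_tb$perc          # tibble: NULL, plus a warning about an unknown column
```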
Recycling rules for vectors again
Arithmetic on vectors
In R, all numbers are vectors of different sorts. Even single numbers (“scalars”) are conceptually vectors of length 1.
Arithmetic on vectors (and arrays generally) follows a series of recycling rules that favor ease of expression of vectorized, “elementwise” operations.
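Here is a quick check of the claim that a single number is just a vector of length 1. R has no separate scalar type, so `5` and `c(5)` are the same object. Equal-length vectors then combine element by element.

```r
length(5)           # 1
is.vector(5)        # TRUE
identical(5, c(5))  # TRUE: wrapping in c() changes nothing

# Two vectors of the same length combine position by position
c(1, 2, 3) * c(10, 100, 1000)
```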
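See if you can predict what the following operations do:
my_numbers
[1] 1 1 2 4 1 3 1 5
result1 <- my_numbers + 1
result1
[1] 2 2 3 5 2 4 2 6
result2 <- my_numbers + my_numbers
result2
[1] 2 2 4 8 2 6 2 10
two_nums <- c(5, 10)
result3 <- my_numbers + two_nums
result3
[1] 6 11 7 14 6 13 6 15
three_nums <- c(1, 5, 10)
result4 <- my_numbers + three_nums
Warning in my_numbers + three_nums: longer object length is not a multiple of
shorter object length
result4
[1] 2 6 12 5 6 13 2 10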
Note that you get a warning here. It’ll still do it, though! Don’t ignore warnings until you understand what they mean.
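To see what R did behind the scenes, the sketch below spells the recycling out with `rep()`, which repeats a vector up to a given `length.out`. It then wraps addition in a hypothetical helper, `safe_add()`, which turns the uneven-recycling case into an error instead of a warning. This is only one possible defensive style; base R does not do it for you.

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
two_nums <- c(5, 10)
three_nums <- c(1, 5, 10)

# Repeat each shorter vector up to the length of my_numbers (8)
two_recycled <- rep(two_nums, length.out = length(my_numbers))
three_recycled <- rep(three_nums, length.out = length(my_numbers))
two_recycled    # 5 10 5 10 5 10 5 10
three_recycled  # 1 5 10 1 5 10 1 5: the last cycle is cut short

my_numbers + two_recycled    # same result as my_numbers + two_nums
my_numbers + three_recycled  # same values as result4, but without a warning

# safe_add() is a hypothetical helper: it stops when lengths don't recycle evenly
safe_add <- function(a, b) {
  n_long <- max(length(a), length(b))
  n_short <- min(length(a), length(b))
  if (n_short > 0 && n_long %% n_short != 0) {
    stop("lengths ", length(a), " and ", length(b), " do not recycle evenly")
  }
  a + b
}

safe_add(my_numbers, two_nums)
tryCatch(safe_add(my_numbers, three_nums), error = function(e) conditionMessage(e))
```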
Source Code
---
title: "Example 02: Some more introductory R"
---

```{r}
library(tidyverse)
library(gapminder)
```

## The scope of names

```{r }
#| label: "02-about-r-53"
x <- c(1:10)
y <- c(90:100)
x
y
```

```{.r}
mean()
## Error in mean.default() : argument "x" is missing, with no default
```

```{r }
#| label: "02-about-r-54"
mean(x) # argument names are internal to functions
mean(x = x)
mean(x = y)
x
y
```

## Types and Classes

The object inspector in RStudio is your friend.

You can ask an object what it is at the console, too:

```{r }
#| label: "02-about-r-55"
## We made this before
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
my_numbers
class(my_numbers)
typeof(my_numbers)
```

Objects can have more than one (nested) class:

```{r }
#| label: "02-about-r-56"
summary(my_numbers)
my_smry <- summary(my_numbers) # remember, outputs can be assigned to a name, creating an object
class(summary(my_numbers)) # functions can be nested, and are evaluated from the inside out
class(my_smry) # equivalent to the previous line
```

```{r }
#| label: "02-about-r-57"
typeof(my_smry)
attributes(my_smry)
## In this case, the functions extract the corresponding attribute
class(my_smry)
names(my_smry)
```

Kinds of vector, and coercion

```{r }
#| label: "02-about-r-58"
my_int <- c(1, 3, 5, 6, 10)
is.integer(my_int)
is.double(my_int)
my_int <- as.integer(my_int)
is.integer(my_int)
my_chr <- c("Mary", "had", "a", "little", "lamb")
is.character(my_chr)
my_lgl <- c(TRUE, FALSE, TRUE)
is.logical(my_lgl)
```

## Factors

```{r }
#| label: "02-about-r-59"
## Factors are for storing unordered or ordered categorical variables
x <- factor(c("Yes", "No", "No", "Maybe", "Yes", "Yes", "Yes", "No"))
x
summary(x) # Alphabetical order by default
typeof(x) # Underneath, a factor is a type of integer ...
attributes(x) # ... with labels for its numbers, or "levels"
levels(x)
is.ordered(x)
```

Vector types can't be heterogeneous. Objects can be manually or automatically coerced from one class to another.
Take care.

```{r }
#| label: "02-about-r-60"
class(my_numbers)
my_new_vector <- c(my_numbers, "Apple")
my_new_vector # vectors are homogeneous/atomic
class(my_new_vector)
```

```{r }
#| label: "02-about-r-61"
my_dbl <- c(2.1, 4.77, 30.111, 3.14519)
is.double(my_dbl)
my_dbl <- as.integer(my_dbl)
my_dbl
```

## Lists, data frames, tibbles

A table of data is a kind of list

```{r }
#| label: "02-about-r-62"
gapminder # tibbles and data frames can contain vectors of different types
class(gapminder)
typeof(gapminder) # hmm
```

- Lists _can_ be heterogeneous, in the sense of containing vectors of different types. Underneath, most complex R objects are some kind of list with different components.
- A _data frame_ is a list of vectors of the same length, where the vectors can be of different types (e.g., numeric, character, logical).
- A _tibble_ is an enhanced data frame.

Some classes are versions of others

- Base R's trusty `data.frame`

```{r }
#| label: "02-about-r-63a"
library(socviz)
titanic
class(titanic)
```

```{r }
#| label: "02-about-r-64a"
## The `$` idiom picks out a named column here;
## more generally, the named element of a list
titanic$percent
```

The Tidyverse's enhanced `tibble`

::::{.smallcode}

```{r }
#| label: "02-about-r-65"
## tibbles are built on data frames
titanic_tb <- as_tibble(titanic)
titanic_tb
class(titanic_tb)
```

::::

## Recycling rules for vectors again

Arithmetic on vectors

In R, all numbers are vectors of different sorts.
Even single numbers ("scalars") are conceptually vectors of length 1.

Arithmetic on vectors (and arrays generally) follows a series of _recycling rules_ that favor ease of expression of vectorized, "elementwise" operations.

See if you can predict what the following operations do:

```{r }
#| label: "02-about-r-67"
my_numbers
result1 <- my_numbers + 1
```

```{r }
#| label: "02-about-r-68"
result1
```

```{r }
#| label: "02-about-r-69"
result2 <- my_numbers + my_numbers
```

```{r }
#| label: "02-about-r-70"
result2
```

```{r}
#| label: "02-about-r-71a"
#| warning: TRUE
two_nums <- c(5, 10)
result3 <- my_numbers + two_nums
```

```{r }
#| label: "02-about-r-72"
result3
```

```{r}
#| label: "02-about-r-73a"
#| warning: TRUE
three_nums <- c(1, 5, 10)
result4 <- my_numbers + three_nums
```

```{r }
#| label: "02-about-r-74"
result4
```

Note that you get a _warning_ here. It'll still do it, though! Don't ignore warnings until you understand what they mean.