Example 02: Some more introductory R

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)

The scope of names

x <- c(1:10)
y <- c(90:100)

x
 [1]  1  2  3  4  5  6  7  8  9 10
y
 [1]  90  91  92  93  94  95  96  97  98  99 100
mean()

## Error in mean.default() : argument "x" is missing, with no default
mean(x) # argument names are internal to functions
[1] 5.5
mean(x = x)
[1] 5.5
mean(x = y)
[1] 95
x
 [1]  1  2  3  4  5  6  7  8  9 10
y
 [1]  90  91  92  93  94  95  96  97  98  99 100
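
The calls above show that the `x` in `mean(x = y)` names an argument inside the function, not the object `x` in your workspace. As a minimal sketch of the same idea with a user-written function (`double_it` is a hypothetical name introduced only for this illustration):

```r
x <- 1:10
y <- 90:100

# Hypothetical helper whose argument happens to be called x
double_it <- function(x) {
  x <- x * 2   # changes only the function's local x
  x
}

doubled <- double_it(x = y)  # inside the function, x holds the values of y
doubled
x   # the x in the workspace is unchanged
```

Assigning to `x` inside the function body does not touch the workspace object of the same name.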

Types and Classes

The object inspector in RStudio is your friend.

You can ask an object what it is at the console, too:

## We made this before
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5) 

my_numbers
[1] 1 1 2 4 1 3 1 5
class(my_numbers)
[1] "numeric"
typeof(my_numbers)
[1] "double"

Objects can have more than one (nested) class:

summary(my_numbers)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    1.50    2.25    3.25    5.00 
my_smry <- summary(my_numbers) # remember, outputs can be assigned to a name, creating an object

class(summary(my_numbers)) # functions can be nested, and are evaluated from the inside out
[1] "summaryDefault" "table"         
class(my_smry) # equivalent to the previous line
[1] "summaryDefault" "table"         
typeof(my_smry)
[1] "double"
attributes(my_smry)
$names
[1] "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."   

$class
[1] "summaryDefault" "table"         
## In this case, the functions extract the corresponding attribute
class(my_smry)
[1] "summaryDefault" "table"         
names(my_smry)
[1] "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."   
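
If you want to check for one particular class, or to see what is left once the class is removed, base R's `inherits()` and `unclass()` are one way to do it. A small sketch, recreating `my_smry` so the block runs on its own (`is_table` and `plain` are names made up for this example):

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
my_smry <- summary(my_numbers)

# Is "table" one of the object's classes?
is_table <- inherits(my_smry, "table")
is_table

# Strip the class attribute to see the underlying vector
plain <- unclass(my_smry)
typeof(plain)
names(plain)   # the names attribute survives
```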

Kinds of vector; and coercion

my_int <- c(1, 3, 5, 6, 10)
is.integer(my_int)
[1] FALSE
is.double(my_int)
[1] TRUE
my_int <- as.integer(my_int)
is.integer(my_int)
[1] TRUE
my_chr <- c("Mary", "had", "a", "little", "lamb")
is.character(my_chr)
[1] TRUE
my_lgl <- c(TRUE, FALSE, TRUE)
is.logical(my_lgl)
[1] TRUE

Factors:

## Factors are for storing unordered or ordered categorical variables
x <- factor(c("Yes", "No", "No", "Maybe", "Yes", "Yes", "Yes", "No"))
x
[1] Yes   No    No    Maybe Yes   Yes   Yes   No   
Levels: Maybe No Yes
summary(x) # Alphabetical order by default
Maybe    No   Yes 
    1     3     4 
typeof(x)       # Underneath, a factor is a type of integer ...
[1] "integer"
attributes(x)   # ... with labels for its numbers, or "levels" 
$levels
[1] "Maybe" "No"    "Yes"  

$class
[1] "factor"
levels(x)
[1] "Maybe" "No"    "Yes"  
is.ordered(x)
[1] FALSE
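
The factor above is unordered and its levels are alphabetical. As a sketch of the ordered case, you can set the level order yourself with the `levels` and `ordered` arguments to `factor()` (the names `answers` and `x_ord`, and the ordering No < Maybe < Yes, are assumptions made for this example):

```r
answers <- c("Yes", "No", "No", "Maybe", "Yes", "Yes", "Yes", "No")

# Set the level order explicitly instead of relying on alphabetical order
x_ord <- factor(answers, levels = c("No", "Maybe", "Yes"), ordered = TRUE)

levels(x_ord)
is.ordered(x_ord)
as.integer(x_ord)   # the underlying integer codes follow the level order
x_ord < "Yes"       # comparisons are meaningful for ordered factors
```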

Atomic vectors can’t be heterogeneous: every element has the same type. Objects can be manually or automatically coerced from one class to another. Take care.

class(my_numbers)
[1] "numeric"
my_new_vector <- c(my_numbers, "Apple")

my_new_vector # vectors are homogeneous/atomic
[1] "1"     "1"     "2"     "4"     "1"     "3"     "1"     "5"     "Apple"
class(my_new_vector)
[1] "character"
my_dbl <- c(2.1, 4.77, 30.111, 3.14519)
is.double(my_dbl)
[1] TRUE
my_dbl <- as.integer(my_dbl) # as.integer() drops the fractional part; it does not round

my_dbl
[1]  2  4 30  3
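
To make the coercion behavior concrete, here is a short sketch of both directions. The values and the names `truncated`, `rounded`, and `not_a_number` are invented for this example:

```r
# Automatic coercion: c() promotes everything to the most flexible type present
typeof(c(TRUE, 2L))        # logical + integer
typeof(c(2L, 3.5))         # integer + double
typeof(c(3.5, "Apple"))    # double + character

# Manual coercion can lose information
truncated <- as.integer(c(2.9, -2.9))   # drops the fractional part
rounded   <- round(c(2.9, -2.9))        # round first if that is what you want

# Values that cannot be converted become NA (normally with a warning)
not_a_number <- suppressWarnings(as.numeric("Apple"))
```

The promotion order is logical, then integer, then double, then character, which is why adding `"Apple"` turned every number into a string.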

Lists, data frames, tibbles

A table of data is a kind of list

gapminder # tibbles and data frames can contain vectors of different types
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
class(gapminder)
[1] "tbl_df"     "tbl"        "data.frame"
typeof(gapminder) # hmm
[1] "list"
  • Lists can be heterogeneous, in the sense of containing vectors of different types. Underneath, most complex R objects are some kind of list with different components.
  • A data frame is a list of vectors of the same length, where the vectors can be of different types (e.g. numeric, character, logical, etc)
  • A tibble is an enhanced data frame
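
The points above can be sketched with a small made-up example in base R (`my_list` and `df`, and their contents, are hypothetical):

```r
# A list can hold vectors of different types and lengths
my_list <- list(name = "Ada", scores = c(90, 85, 77), passed = TRUE)
lengths(my_list)

# A data frame is a list whose elements (columns) all have the same length
df <- data.frame(
  fruit = c("apple", "pear", "plum"),
  count = c(3L, 5L, 2L),
  ripe  = c(TRUE, FALSE, TRUE)
)

typeof(df)
is.list(df)
length(df)       # number of columns, i.e. list elements
df[["count"]]    # list-style extraction of one column
```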

Some classes are versions of others

Base R’s trusty data.frame

library(socviz)
titanic
      fate    sex    n percent
1 perished   male 1364    62.0
2 perished female  126     5.7
3 survived   male  367    16.7
4 survived female  344    15.6
class(titanic)
[1] "data.frame"
## The `$` idiom picks out a named column here; 
## more generally, the named element of a list
titanic$percent  
[1] 62.0  5.7 16.7 15.6

The Tidyverse’s enhanced tibble

## tibbles are built on data frames 
titanic_tb <- as_tibble(titanic) 
titanic_tb
# A tibble: 4 × 4
  fate     sex        n percent
  <fct>    <fct>  <dbl>   <dbl>
1 perished male    1364    62  
2 perished female   126     5.7
3 survived male     367    16.7
4 survived female   344    15.6
class(titanic_tb)
[1] "tbl_df"     "tbl"        "data.frame"

Recycling rules for vectors again

Arithmetic on vectors

In R, all numbers are vectors of different sorts. Even single numbers (“scalars”) are conceptually vectors of length 1.

Arithmetic on vectors (and arrays generally) follows a series of recycling rules that favor ease of expression of vectorized, “elementwise” operations.

See if you can predict what the following operations do:

my_numbers
[1] 1 1 2 4 1 3 1 5
result1 <- my_numbers + 1
result1
[1] 2 2 3 5 2 4 2 6
result2 <- my_numbers + my_numbers
result2
[1]  2  2  4  8  2  6  2 10
two_nums <- c(5, 10)

result3 <- my_numbers + two_nums
result3
[1]  6 11  7 14  6 13  6 15
three_nums <- c(1, 5, 10)

result4 <- my_numbers + three_nums
Warning in my_numbers + three_nums: longer object length is not a multiple of
shorter object length
result4
[1]  2  6 12  5  6 13  2 10

Note that you get a warning here. R still carries out the addition, recycling three_nums as far as it will go even though it does not fit evenly. Don’t ignore warnings until you understand what they mean.
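
One way to see what recycling does is to stretch the shorter vector by hand with base R's `rep_len()` and then add. This is a sketch for illustration, not how R implements recycling internally; the `*_stretched` and `explicit*` names are invented here:

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)
two_nums   <- c(5, 10)
three_nums <- c(1, 5, 10)

# rep_len() repeats a vector until it reaches the requested length
two_stretched   <- rep_len(two_nums, length(my_numbers))
three_stretched <- rep_len(three_nums, length(my_numbers))

two_stretched
three_stretched   # the last cycle is cut short: 8 is not a multiple of 3

# Adding the stretched vectors gives the same values as result3 and result4,
# this time without a warning, because the lengths now match
explicit3 <- my_numbers + two_stretched
explicit4 <- my_numbers + three_stretched
explicit3
explicit4
```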