We will be working with the most recent stable versions of R and RStudio, as well as with a number of additional packages. You will need to install R, RStudio, and the necessary packages on your own computer.
1. Install R on your computer
Begin by installing R (http://cloud.r-project.org). Choose the version appropriate for your computing platform:
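If you use macOS with an Apple Silicon processor (i.e. an M1, M2, or M3 chip), install R for macOS’s Apple Silicon build. This version does not work on older, Intel-based Macs.
If you use macOS with an Intel processor, install R for macOS’s Intel build.
If you use Microsoft Windows, install R for Windows.
If you use Linux, choose a distribution (https://cloud.r-project.org/bin/linux/) and install it.
2. Install RStudio on your computer
If you use macOS (whether Apple Silicon or Intel), install the macOS version of RStudio.
If you use Windows, install the Windows version of RStudio.
If you use Linux, choose your distribution from the download page (https://posit.co/download/rstudio-desktop/).
3. Installing some additional packages
Once R and RStudio are installed, launch RStudio. Either carefully type in or copy-and-paste the following lines of code at R’s command prompt, located in the RStudio pane named “Console”, and then press Return. To copy a chunk of code, mouse over it and click the clipboard icon that appears in the top right corner of the chunk.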
Packages are loaded into your working environment using the library() function:
```r
## A package containing a dataset rather than functions
library(gapminder)
gapminder
```
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
You need only install a package once (and occasionally update it):
```r
## Do at least once for each package. Once done, not needed each time.
install.packages("palmerpenguins", repos = "http://cran.rstudio.com")

## Needed sometimes, especially after an R major version upgrade.
update.packages(repos = "http://cran.rstudio.com")
```
But you must load the package in each R session before you can access its contents:
```r
## To load a package, usually at the start of your RMarkdown document or script file
library(palmerpenguins)
penguins
```
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
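The install-once, load-every-session distinction can be checked from the console. The sketch below uses two small helpers, `is_installed()` and `is_attached()`; these are hypothetical names defined here for illustration (not part of any package), built only on base R’s `system.file()` and `search()`. The base package tools stands in as the example because it ships with every R installation but is not attached by default.

```r
## Hypothetical helpers for illustration; not part of base R or any package.
## A package counts as "installed" if R can find its folder in a library:
## system.file(package = ...) returns "" when it cannot.
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

## A package counts as "attached" (loaded with library()) if it appears
## on the search path as "package:<name>".
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_installed("tools")  # TRUE: tools ships with R
is_attached("tools")   # FALSE in a fresh session: installed but not loaded

library(tools)
is_attached("tools")   # TRUE: library() put it on the search path
```

The same two checks work for palmerpenguins or any other package: installing makes `is_installed()` true permanently, while `is_attached()` is only true after `library()` in the current session.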
You can “reach in” to a package that is installed but not loaded and grab a function directly, using <package>::<function>:
```r
## A little glimpse of what we'll do soon
library(dplyr) # provides select()

penguins |>
  select(species, body_mass_g, sex) |>
  gtsummary::tbl_summary(by = species)
```
| Characteristic | Adelie, N = 152 | Chinstrap, N = 68 | Gentoo, N = 124 |
|---|---|---|---|
| body_mass_g, Median (IQR) | 3,700 (3,350 – 4,000) | 3,700 (3,488 – 3,950) | 5,000 (4,700 – 5,500) |
| Unknown | 1 | 0 | 1 |
| sex, n (%) | | | |
| female | 73 (50) | 34 (50) | 58 (49) |
| male | 73 (50) | 34 (50) | 61 (51) |
| Unknown | 6 | 0 | 5 |
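A small sketch of the `<package>::<function>` form using base R’s tools package, chosen only because it is always installed: `tools::file_ext()` returns a file name’s extension. The call loads the package’s namespace so the function can run, but the package is not attached, so its other functions are still not available by bare name.

```r
## Call a function from an installed package without library()
ext <- tools::file_ext("penguins.csv")
ext                               # "csv"

## The namespace is now loaded behind the scenes...
"tools" %in% loadedNamespaces()   # TRUE

## ...but the package is not attached (unless library(tools) ran earlier)
"package:tools" %in% search()
```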
The scope of names
```r
x <- c(1:10)
y <- c(90:100)
x
```
[1] 1 2 3 4 5 6 7 8 9 10
```r
y
```
[1] 90 91 92 93 94 95 96 97 98 99 100
```r
mean()
## Error in mean.default() : argument "x" is missing, with no default
```
```r
mean(x) # our object x is matched to mean()'s own argument, also named x; argument names are internal to functions
```
[1] 5.5
```r
mean(x = x)
```
[1] 5.5
```r
mean(x = y)
```
[1] 95
```r
x
```
[1] 1 2 3 4 5 6 7 8 9 10
```r
y
```
[1] 90 91 92 93 94 95 96 97 98 99 100
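Why doesn’t `mean(x = y)` change our `x`? The `x` on the left of `=` names the function’s argument, which exists only inside the function while it runs. The sketch below defines a hypothetical function, `add_one()`, whose argument is also called `x`; reassigning `x` inside it leaves the `x` in your workspace untouched.

```r
x <- c(1:10)

## Hypothetical example function; its argument happens to be named x
add_one <- function(x) {
  x <- x + 1   # changes the function's local copy only
  x
}

add_one(x = 100)  # 101
x                 # still 1 2 3 4 5 6 7 8 9 10
```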
Types and Classes
The object inspector in RStudio is your friend.
You can ask an object what it is at the console, too:
```r
class(my_numbers)
```
[1] "numeric"
```r
typeof(my_numbers)
```
[1] "double"
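`class()` reports how R treats an object, while `typeof()` reports how it is stored. The sketch below recreates `my_numbers` with the values it was last assigned in the page source, `c(1, 2, 3, 1, 3, 5, 25)`, and compares the two functions on a few basic kinds of vector.

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

class(my_numbers)    # "numeric"  (how R treats it)
typeof(my_numbers)   # "double"   (how R stores it)

## Integers, characters, and logicals for comparison
class(5L);     typeof(5L)       # "integer"   "integer"
class("lamb"); typeof("lamb")   # "character" "character"
class(TRUE);   typeof(TRUE)     # "logical"   "logical"
```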
Objects can have more than one (nested) class:
```r
summary(my_numbers)
```
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.500 3.000 5.714 4.000 25.000
```r
my_smry <- summary(my_numbers) # remember, outputs can be assigned to a name, creating an object
class(summary(my_numbers))     # functions can be nested, and are evaluated from the inside out
```
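To see the nested class of a summary directly, and to test membership without comparing the whole class vector, base R provides `inherits()`. A short sketch, again using the `my_numbers` values from the page source:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_smry <- summary(my_numbers)

class(my_smry)              # "summaryDefault" "table"
inherits(my_smry, "table")  # TRUE: a summaryDefault is also a table
names(my_smry)              # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```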
Lists, data frames, tibbles
A table of data is a kind of list.
```r
gapminder # tibbles and data frames can contain vectors of different types
```
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
```r
class(gapminder)
```
[1] "tbl_df" "tbl" "data.frame"
```r
typeof(gapminder) # hmm
```
[1] "list"
Lists can be heterogeneous, in the sense of containing vectors of different types. Underneath, most complex R objects are some kind of list with different components.
A data frame is a list of vectors of the same length, where the vectors can be of different types (e.g. numeric, character, logical, etc.).
A tibble is an enhanced data frame.
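A minimal sketch of the “list of equal-length vectors” idea in base R. The two vectors reuse a few values from the titanic table that appears below; the object names are illustrative only.

```r
## Two vectors of the same length but different types
fate <- c("perished", "perished", "survived")
n    <- c(1364, 126, 367)

df <- data.frame(fate = fate, n = n)

typeof(df)      # "list": underneath, one list element per column
length(df)      # 2: the number of columns (list elements)
nrow(df)        # 3: the shared length of the vectors
df[["n"]]       # 1364 126 367: each column is an ordinary vector

## Columns whose lengths can't be reconciled (here 3 and 2) are rejected
bad <- try(data.frame(fate = fate, n = c(1, 2)), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```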
Some classes are versions of others
Base R’s trusty data.frame
```r
library(socviz)
```
Attaching package: 'socviz'
The following object is masked _by_ '.GlobalEnv':
%nin%
```r
titanic
```
fate sex n percent
1 perished male 1364 62.0
2 perished female 126 5.7
3 survived male 367 16.7
4 survived female 344 15.6
```r
class(titanic)
```
[1] "data.frame"
```r
## The `$` idiom picks out a named column here;
## more generally, the named element of a list
titanic$percent
```
[1] 62.0 5.7 16.7 15.6
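The `$` idiom has a close relative, `[[ ]]`, which takes the name as a string. The sketch below rebuilds the titanic table by hand from the values printed above (so it runs without socviz) and shows both idioms on a data frame and on a plain named list. The names `titanic_df`, `stats_list`, and `col` are illustrative.

```r
## A hand-built copy of the titanic table printed above
titanic_df <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

titanic_df$percent        # 62.0  5.7 16.7 15.6
titanic_df[["percent"]]   # the same vector, named with a string

## The same idioms work on any named list
stats_list <- list(total = 2201, groups = c("male", "female"))
stats_list$total          # 2201
stats_list[["groups"]]    # "male" "female"

## [[ ]] also accepts a name stored in a variable; $ does not
col <- "n"
titanic_df[[col]]         # 1364  126  367  344
```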
The Tidyverse’s enhanced tibble
```r
## tibbles are built on data frames
library(tibble) # provides as_tibble()

titanic_tb <- as_tibble(titanic)
titanic_tb
```
# A tibble: 4 × 4
fate sex n percent
<fct> <fct> <dbl> <dbl>
1 perished male 1364 62
2 perished female 126 5.7
3 survived male 367 16.7
4 survived female 344 15.6
```r
class(titanic_tb)
```
[1] "tbl_df" "tbl" "data.frame"
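The class vector `"tbl_df" "tbl" "data.frame"` is read left to right: R looks for a method for the first class, then falls back to the next. A minimal sketch of that layering with a made-up class, `my_tbl` (hypothetical, standing in for tbl_df); the counts are the perished and survived totals from the titanic table.

```r
df <- data.frame(fate = c("perished", "survived"), n = c(1490, 711))

## Prepend a hypothetical class, the way tbl_df is layered on data.frame
class(df) <- c("my_tbl", class(df))
class(df)                   # "my_tbl" "data.frame"
inherits(df, "data.frame")  # TRUE

## There is no my_tbl method for dim(), so nrow() uses the data.frame one
nrow(df)                    # 2

## Give the subclass its own print method; everything else is inherited
print.my_tbl <- function(x, ...) {
  cat("<my_tbl>", nrow(x), "rows\n")
  invisible(x)
}
df                          # <my_tbl> 2 rows
```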
Recycling rules for vectors
Arithmetic on vectors
In R, all numbers are vectors of different sorts. Even single numbers (“scalars”) are conceptually vectors of length 1.
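Arithmetic on vectors (and arrays generally) follows a series of recycling rules that favor ease of expression of vectorized, “elementwise” operations: when two vectors of different lengths are combined, R repeats (“recycles”) the shorter one until it matches the length of the longer one.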
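A short sketch of the recycling rule with a six-element vector, so that every shorter vector divides its length evenly and no warning appears. The vector `a` is an illustrative name.

```r
a <- c(1, 2, 3, 4, 5, 6)

## A length-1 vector ("scalar") is recycled across every element
a + 10                 # 11 12 13 14 15 16

## A length-2 vector is repeated three times to reach length 6
a + c(10, 100)         # 11 102  13 104  15 106

## ...which is the same as extending it by hand with rep_len()
a + rep_len(c(10, 100), length(a))
```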
See if you can predict what the following operations do:
Warning in my_numbers + three_nums: longer object length is not a multiple of
shorter object length
```r
result4
```
[1] 2 7 13 2 8 15 26
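Note that you get a warning here: my_numbers has seven elements and three_nums has three, and 7 is not a multiple of 3, so the last cycle of three_nums is cut short. R still returns a result, though! Don’t ignore warnings until you understand what they mean.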
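One way to avoid being caught out by partial recycling is to check the lengths yourself and stop with an error. The helper `strict_add()` below is a hypothetical function written for this example, not part of R; the last lines show how to catch R’s own warning and inspect its message instead.

```r
## Hypothetical helper: add two vectors, but refuse partial recycling
strict_add <- function(a, b) {
  long  <- max(length(a), length(b))
  short <- min(length(a), length(b))
  if (short > 0 && long %% short != 0) {
    stop("lengths ", length(a), " and ", length(b), " do not recycle evenly")
  }
  a + b
}

my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
ok <- strict_add(my_numbers, 1)   # fine: 2 3 4 2 4 6 26

res <- try(strict_add(my_numbers, c(1, 5, 10)), silent = TRUE)
inherits(res, "try-error")        # TRUE: 7 is not a multiple of 3

## Alternatively, catch R's own warning and look at its message
caught <- tryCatch(my_numbers + c(1, 5, 10),
                   warning = function(w) conditionMessage(w))
caught
```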