Make Some Graphs

Soc 690S: Week 03a

Kieran Healy

Duke University

March 2025

Load our libraries

library(here)      # manage file paths
library(socviz)    # data and some useful functions
library(tidyverse) # your friend and mine

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(gapminder) # some data

Nearly done with the scaffolding

✅ Thought about elements of visualization
✅ Gotten oriented to R and RStudio
✅ Knitted a document
✅ Written a bit of ggplot code
⬜ Get my data into R
⬜ Make a plot with it

Feed ggplot tidy data

What is tidy data?

Tidy data is in long format

Every column is a single variable

Grolemund & Wickham

Every row is a single observation

Every cell is a single value

Get your data into long format

Very, very often, the solution to some data-wrangling or data visualization problem in a Tidyverse-focused workflow is:

First, get the data into long format.

Then do the thing you want.

Untidy data exists for good reasons

Storing and printing data in long format entails a lot of repetition:

library(palmerpenguins)
penguins |> 
  group_by(species, island, year) |> 
  summarize(bill = round(mean(bill_length_mm, na.rm = TRUE),2)) |> 
  tinytable::tt()

species	island	year	bill
Adelie	Biscoe	2007	38.32
Adelie	Biscoe	2008	38.70
Adelie	Biscoe	2009	39.69
Adelie	Dream	2007	39.10
Adelie	Dream	2008	38.19
Adelie	Dream	2009	38.15
Adelie	Torgersen	2007	38.80
Adelie	Torgersen	2008	38.77
Adelie	Torgersen	2009	39.31
Chinstrap	Dream	2007	48.72
Chinstrap	Dream	2008	48.70
Chinstrap	Dream	2009	49.05
Gentoo	Biscoe	2007	47.01
Gentoo	Biscoe	2008	46.94
Gentoo	Biscoe	2009	48.50

Untidy data exists for good reasons

A wide format is easier and more efficient to read in print:

penguins |> 
  group_by(species, island, year) |> 
  summarize(bill = round(mean(bill_length_mm, na.rm = TRUE), 2)) |> 
  pivot_wider(names_from = year, values_from = bill) |> 
  tinytable::tt()

species	island	2007	2008	2009
Adelie	Biscoe	38.32	38.70	39.69
Adelie	Dream	39.10	38.19	38.15
Adelie	Torgersen	38.80	38.77	39.31
Chinstrap	Dream	48.72	48.70	49.05
Gentoo	Biscoe	47.01	46.94	48.50

But also for less good reasons

Spot the untidiness

😠 More than one header row
😡 Mixed data types in some columns
💀 Color and typography used to encode variables and their values

Fix it before you import it

Prevention is better than cure!

An excellent article by Karl Broman and Kara Woo:

Broman KW, Woo KH (2018) “Data Organization in Spreadsheets.” The American Statistician 72(1):2–10

Data organization in spreadsheets

The most common `tidyr` operation

Pivoting from wide to long:

edu

# A tibble: 366 × 11
   age   sex    year total elem4 elem8   hs3   hs4 coll3 coll4 median
   <chr> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 25-34 Male   2016 21845   116   468  1427  6386  6015  7432     NA
 2 25-34 Male   2015 21427   166   488  1584  6198  5920  7071     NA
 3 25-34 Male   2014 21217   151   512  1611  6323  5910  6710     NA
 4 25-34 Male   2013 20816   161   582  1747  6058  5749  6519     NA
 5 25-34 Male   2012 20464   161   579  1707  6127  5619  6270     NA
 6 25-34 Male   2011 20985   190   657  1791  6444  5750  6151     NA
 7 25-34 Male   2010 20689   186   641  1866  6458  5587  5951     NA
 8 25-34 Male   2009 20440   184   695  1806  6495  5508  5752     NA
 9 25-34 Male   2008 20210   172   714  1874  6356  5277  5816     NA
10 25-34 Male   2007 20024   246   757  1930  6361  5137  5593     NA
# ℹ 356 more rows

Here, a “Level of Schooling Attained” variable is spread across the columns, from elem4 to coll4. We need a key column called “education” with the various levels of schooling, and a corresponding value column containing the counts.

Wide to long with `pivot_longer()`

We’re going to put the columns elem4:coll4 into a new column, creating a new categorical measure named education. The numbers currently under each column will become a new value column corresponding to that level of education.

edu |> 
  pivot_longer(elem4:coll4, names_to = "education")

# A tibble: 2,196 × 7
   age   sex    year total median education value
   <chr> <chr> <int> <int>  <dbl> <chr>     <dbl>
 1 25-34 Male   2016 21845     NA elem4       116
 2 25-34 Male   2016 21845     NA elem8       468
 3 25-34 Male   2016 21845     NA hs3        1427
 4 25-34 Male   2016 21845     NA hs4        6386
 5 25-34 Male   2016 21845     NA coll3      6015
 6 25-34 Male   2016 21845     NA coll4      7432
 7 25-34 Male   2015 21427     NA elem4       166
 8 25-34 Male   2015 21427     NA elem8       488
 9 25-34 Male   2015 21427     NA hs3        1584
10 25-34 Male   2015 21427     NA hs4        6198
# ℹ 2,186 more rows

Wide to long with `pivot_longer()`

We can name the value column whatever we like. Here it’s a number of people, so we call it n.

edu |> 
  pivot_longer(elem4:coll4, 
               names_to = "education", 
               values_to = "n")

# A tibble: 2,196 × 7
   age   sex    year total median education     n
   <chr> <chr> <int> <int>  <dbl> <chr>     <dbl>
 1 25-34 Male   2016 21845     NA elem4       116
 2 25-34 Male   2016 21845     NA elem8       468
 3 25-34 Male   2016 21845     NA hs3        1427
 4 25-34 Male   2016 21845     NA hs4        6386
 5 25-34 Male   2016 21845     NA coll3      6015
 6 25-34 Male   2016 21845     NA coll4      7432
 7 25-34 Male   2015 21427     NA elem4       166
 8 25-34 Male   2015 21427     NA elem8       488
 9 25-34 Male   2015 21427     NA hs3        1584
10 25-34 Male   2015 21427     NA hs4        6198
# ℹ 2,186 more rows
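A minimal sketch of the same reshaping on a toy table, so the row arithmetic is easy to check by eye. The `edu_toy` data is made up for illustration; only the column names echo `edu`. Each original row becomes one row per pivoted column, which is why 366 rows and 6 schooling columns give 2,196 rows above.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: two rows, with one column per schooling level
edu_toy <- tribble(
  ~sex,     ~year, ~elem4, ~hs4, ~coll4,
  "Male",    2016,    116, 6386,   7432,
  "Female",  2016,    100, 5000,   8000
)

edu_toy_long <- edu_toy |>
  pivot_longer(elem4:coll4,
               names_to = "education",   # old column names go here
               values_to = "n")          # old cell values go here

# 2 rows x 3 pivoted columns = 6 rows; sex and year repeat on each one
edu_toy_long
```

The columns that are not pivoted (`sex`, `year`) are carried along and repeated, which is the repetition that long format entails.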

How to get your own data into R

Reading in CSV files

Base R has read.csv()
Corresponding tidyverse “underscored” version: read_csv().
It is pickier and more talkative than the Base R version. Use it instead.
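A small sketch of the difference, assuming a made-up three-line CSV written to a temporary file. Both functions read the same file; what differs is the kind of object you get back and how much the function tells you along the way.

```r
library(readr)

# Write a small, made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,year,donors",
             "Australia,1991,12.1",
             "Australia,1992,12.4"), tmp)

base_df <- read.csv(tmp)   # Base R: returns a plain data.frame, silently
tidy_df <- read_csv(tmp)   # readr: returns a tibble and reports the column types it guessed

class(base_df)
class(tidy_df)
```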

Where’s my data? Using `here()`

If we’re loading a file, it’s coming from somewhere.
If it’s a file on our hard drive somewhere, we will need to interact with the file system. We should try to do this in a way that avoids absolute file paths.

# This is not portable!
df <- read_csv("/Users/kjhealy/Documents/data/misc/project/data/mydata.csv")

We should also do it in a way that is platform independent.
This makes it easier to share your work, move it around, etc. Projects should be self-contained.

Where’s my data? Using `here()`

The here package, and its here() function, build paths relative to the top level of your R project.

here() # this path will be different for you

[1] "/Users/kjhealy/Documents/courses/socdata.co"

Where’s the data? Using `here()`

This seminar’s files all live in an RStudio project. It looks like this:

/Users/kjhealy/Documents/courses/socdata.co
├── R
├── README.qmd
├── _extensions
├── _freeze
├── _quarto.yml
├── _site
├── _targets
├── _targets.R
├── _variables.yml
├── about.qmd
├── assets
├── assignment
├── content
├── data
├── deploy.sh
├── example
├── files
├── fonts
├── images
├── index.qmd
├── projects
├── renv
├── renv.lock
├── schedule
├── slides
├── socdata.co.Rproj
├── staging
└── syllabus

I want to load files from the data folder, but I also want you to be able to load them. I’m writing this from somewhere deep in the slides folder, but you won’t be there. Also, I’m on a Mac, but you may not be.

Where’s the data? Using `here()`

So:

## Load the file relative to the path from the top of the project, without separators, etc
organs <- read_csv(file = here("files", "data", "organdonation.csv"))
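To see what `here()` is doing, it can help to build the path without reading anything. This sketch only constructs a string, so it runs even if the file does not exist on your machine; the root it reports will be whatever `here()` finds for your own project or working directory.

```r
library(here)

# Each argument is one path component; here() supplies the separators
csv_path <- here("files", "data", "organdonation.csv")
csv_path

# The result always begins with the project root that here() reports
startsWith(csv_path, here())
basename(csv_path)
```

Because the components are passed as separate strings, the same line works whether the operating system separates folders with a forward slash or a backslash.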

Where’s the data? Using `here()`

organs

# A tibble: 238 × 21
   country  year donors   pop pop.dens   gdp gdp.lag health health.lag pubhealth
   <chr>   <dbl>  <dbl> <dbl>    <dbl> <dbl>   <dbl>  <dbl>      <dbl>     <dbl>
 1 Austra…    NA  NA    17065    0.220 16774   16591   1300       1224       4.8
 2 Austra…  1991  12.1  17284    0.223 17171   16774   1379       1300       5.4
 3 Austra…  1992  12.4  17495    0.226 17914   17171   1455       1379       5.4
 4 Austra…  1993  12.5  17667    0.228 18883   17914   1540       1455       5.4
 5 Austra…  1994  10.2  17855    0.231 19849   18883   1626       1540       5.4
 6 Austra…  1995  10.2  18072    0.233 21079   19849   1737       1626       5.5
 7 Austra…  1996  10.6  18311    0.237 21923   21079   1846       1737       5.6
 8 Austra…  1997  10.3  18518    0.239 22961   21923   1948       1846       5.7
 9 Austra…  1998  10.5  18711    0.242 24148   22961   2077       1948       5.9
10 Austra…  1999   8.67 18926    0.244 25445   24148   2231       2077       6.1
# ℹ 228 more rows
# ℹ 11 more variables: roads <dbl>, cerebvas <dbl>, assault <dbl>,
#   external <dbl>, txp.pop <dbl>, world <chr>, opt <chr>, consent.law <chr>,
#   consent.practice <chr>, consistent <chr>, ccode <chr>

And there it is.

`read_csv()` has variants

read_csv() Field separator is a comma: ,

organs <- read_csv(file = here("files", "data", "organdonation.csv"))

read_csv2() Field separator is a semicolon: ;

# Example only
my_data <- read_csv2(file = here("data", "my_euro_file.csv"))

Both are special cases of read_delim()
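A sketch of that relationship, using made-up semicolon-separated text with decimal commas. Wrapping the string in `I()` tells readr to treat it as literal data rather than a file path. The explicit `read_delim()` call spells out what `read_csv2()` assumes for you.

```r
library(readr)

# Made-up data in the European style: ";" between fields, "," as the decimal mark
euro_text <- I("country;donors\nSpain;33,7\nItaly;20,1")

via_csv2 <- read_csv2(euro_text)

via_delim <- read_delim(euro_text,
                        delim = ";",
                        locale = locale(decimal_mark = ","))

# Both calls should parse the donors column to the same numbers
identical(via_csv2$donors, via_delim$donors)
```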

Other species are also catered to

read_tsv() Tab separated.
read_fwf() Fixed-width files.
read_log() Log files (i.e. computer log files).
read_lines() Just read in lines, without trying to parse them.

Also often useful …

read_table()

For data that’s separated by one (or more) columns of space.

And for foreign file formats …

The haven package provides

read_dta() Stata
read_spss() SPSS
read_sas() SAS
read_xpt() SAS Transport

Make these functions available with library(haven)

The readxl package provides

read_xlsx() Modern Excel files
read_xls() Older Excel files
Plus a suite of functions for dealing with e.g. tabbed spreadsheets

You can read files remotely, too

You can give these functions local files, or they can also be pointed at URLs.
Compressed files (.gz, .bz2, .xz, .zip) will be automatically uncompressed.
(Be careful what you download from remote locations!)

organ_remote <- read_csv("https://kjhealy.co/organdonation.csv")
organ_remote

# A tibble: 238 × 21
   country  year donors   pop pop.dens   gdp gdp.lag health health.lag pubhealth
   <chr>   <dbl>  <dbl> <dbl>    <dbl> <dbl>   <dbl>  <dbl>      <dbl>     <dbl>
 1 Austra…    NA  NA    17065    0.220 16774   16591   1300       1224       4.8
 2 Austra…  1991  12.1  17284    0.223 17171   16774   1379       1300       5.4
 3 Austra…  1992  12.4  17495    0.226 17914   17171   1455       1379       5.4
 4 Austra…  1993  12.5  17667    0.228 18883   17914   1540       1455       5.4
 5 Austra…  1994  10.2  17855    0.231 19849   18883   1626       1540       5.4
 6 Austra…  1995  10.2  18072    0.233 21079   19849   1737       1626       5.5
 7 Austra…  1996  10.6  18311    0.237 21923   21079   1846       1737       5.6
 8 Austra…  1997  10.3  18518    0.239 22961   21923   1948       1846       5.7
 9 Austra…  1998  10.5  18711    0.242 24148   22961   2077       1948       5.9
10 Austra…  1999   8.67 18926    0.244 25445   24148   2231       2077       6.1
# ℹ 228 more rows
# ℹ 11 more variables: roads <dbl>, cerebvas <dbl>, assault <dbl>,
#   external <dbl>, txp.pop <dbl>, world <chr>, opt <chr>, consent.law <chr>,
#   consent.practice <chr>, consistent <chr>, ccode <chr>

You can read files remotely, too

Unfortunately readxl does not support getting data from remote URLs.
We can work around this. But it is annoying because …

Wide Remote Data: Topical Edition

https://kjhealy.co/sd/enduse_imports.xlsx

get_excel_file <- function(url) {
  httr::GET(url, httr::write_disk(tf <- tempfile(fileext = ".xlsx")))
  readxl::read_xlsx(tf) 
}

enduse <- get_excel_file("https://kjhealy.co/sd/enduse_imports.xlsx")
enduse

# A tibble: 22,042 × 14
   CTY_CODE CTY_DESC    END_USE COMM_DESC    value_14 value_15 value_16 value_17
   <chr>    <chr>       <chr>   <chr>           <dbl>    <dbl>    <dbl>    <dbl>
 1 0000     World Total 00000   Green coffee  5.23e 9  5.12e 9  4.79e 9  5.18e 9
 2 0000     World Total 00010   Cocoa beans   1.31e 9  1.43e 9  1.29e 9  1.19e 9
 3 0000     World Total 00020   Cane and be…  1.60e 9  1.74e 9  1.77e 9  1.66e 9
 4 0000     World Total 00100   Meat produc…  1.21e10  1.28e10  1.07e10  1.10e10
 5 0000     World Total 00110   Dairy produ…  1.95e 9  2.14e 9  2.02e 9  1.95e 9
 6 0000     World Total 00120   Fruits, fro…  1.46e10  1.58e10  1.71e10  1.83e10
 7 0000     World Total 00130   Vegetables    1.09e10  1.13e10  1.25e10  1.28e10
 8 0000     World Total 00140   Nuts          2.39e 9  2.80e 9  2.90e 9  3.33e 9
 9 0000     World Total 00150   Food oils, …  7.00e 9  6.05e 9  6.22e 9  6.85e 9
10 0000     World Total 00160   Bakery prod…  9.34e 9  9.65e 9  1.07e10  1.11e10
# ℹ 22,032 more rows
# ℹ 6 more variables: value_18 <dbl>, value_19 <dbl>, value_20 <dbl>,
#   value_21 <dbl>, value_22 <dbl>, value_23 <dbl>

Let’s transform it to long format.
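One way this could go, sketched on a made-up table that mimics the layout of `enduse` so that it runs without downloading anything. The `enduse_toy` rows and the choice to call the new column `year` are assumptions for illustration; the idea is that the `value_14`, `value_15`, ... columns are really one variable spread across the column names.

```r
library(tidyr)
library(tibble)

# Made-up rows mimicking the enduse layout: one value_YY column per year
enduse_toy <- tribble(
  ~CTY_DESC,     ~COMM_DESC,     ~value_14, ~value_15, ~value_16,
  "World Total", "Green coffee",    5.23e9,    5.12e9,    4.79e9,
  "World Total", "Cocoa beans",     1.31e9,    1.43e9,    1.29e9
)

enduse_toy_long <- enduse_toy |>
  pivot_longer(starts_with("value_"),
               names_to = "year",
               names_prefix = "value_",                    # drop the shared prefix
               names_transform = list(year = as.integer),  # "14" becomes 14L
               values_to = "value")

enduse_toy_long
```

The same call should carry over to the real `enduse` table, since it selects columns by their shared prefix rather than by position.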

A Plot’s Components

What we need our code to make

Data represented by visual elements;
like position, length, color, and size;
Each measured on some scale;
Each scale with a labeled guide;
With the plot itself also titled and labeled.

How does ggplot do this?

`ggplot`’s flow of action

Here’s the whole thing, start to finish. We’ll go through it step by step:

What we start with
Where we’re going
Core steps
Optional steps

`ggplot`’s flow of action: required

Tidy data
Aesthetic mappings
Geom

Let’s go piece by piece

Start with the data

gapminder

# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

dim(gapminder)

[1] 1704    6

Create a plot object

Data is the gapminder tibble.

p <- ggplot(data = gapminder)

Map variables to aesthetics

Tell ggplot the variables you want represented by visual elements on the plot

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))

Map variables to aesthetics

The mapping = aes(...) call links variables to things you will see on the plot.

x and y represent the quantities determining position on the x and y axes.

Other aesthetic mappings can include, e.g., color, shape, size, and fill.

Mappings do not directly specify the particular, e.g., colors, shapes, or line styles that will appear on the plot. Rather, they establish which variables in the data will be represented by which visible elements on the plot.

`p` has data and mappings but no geom

This empty plot has no geoms.
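You can check this by looking inside the plot object. This is a small sketch that inspects `p` rather than drawing it; `p_points` is a name introduced here for illustration.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))

p$mapping          # the x and y mappings are stored on the object
length(p$layers)   # no layers yet, so nothing is drawn inside the panel

p_points <- p + geom_point()
length(p_points$layers)   # adding a geom adds one layer
```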

Add a geom

p + geom_point()

A scatterplot of Life Expectancy vs GDP

Try a different geom

p + geom_smooth()

Life Expectancy vs GDP, using a smoother.

Build your plots layer by layer

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y=lifeExp))
p + geom_smooth()

Life Expectancy vs GDP, using a smoother.

This process is additive

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y=lifeExp))
p + geom_smooth() +
  geom_point()

Every `geom` is a function

Functions take arguments

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))
p + geom_point() + 
  geom_smooth(method = "lm")

Keep Layering

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))
p + geom_point() +
    geom_smooth(method = "lm") +
    scale_x_log10()

Fix the labels

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y=lifeExp))
p + geom_point() +
    geom_smooth(method = "lm") +
    scale_x_log10(labels = scales::label_dollar())

Add labels, title, and caption

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point() + 
  geom_smooth(method = "lm") +
    scale_x_log10(labels = scales::label_dollar()) +
    labs(x = "GDP Per Capita", 
         y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.")

Mapping vs Setting your plot’s aesthetics

“Can I change the color of the points?”

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp,
                          color = "purple"))

## Put in an object for convenience
p_out <- p + geom_point() +
    geom_smooth(method = "loess") +
    scale_x_log10()

What has gone wrong here?

p_out

Try again

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))

## Put in an object for convenience
p_out <- p + geom_point(color = "purple") +
    geom_smooth(method = "loess") +
    scale_x_log10()

Try again

p_out
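A sketch of why the two attempts differ, using `layer_data()` to look at the colors each point layer actually draws with. The object names here are introduced for illustration. Inside `aes()`, `"purple"` is treated as a variable with a single value and is run through the color scale like any other variable; outside `aes()`, it is passed to the geom as a literal color.

```r
library(ggplot2)
library(gapminder)

# Mapped: "purple" becomes a one-value variable that the color scale translates
p_mapped <- ggplot(gapminder,
                   aes(x = gdpPercap, y = lifeExp, color = "purple")) +
  geom_point()

# Set: the color goes straight to the geom, outside aes()
p_set <- ggplot(gapminder,
                aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "purple")

# layer_data() returns the computed values for a layer (the first, by default)
mapped_colors <- unique(layer_data(p_mapped)$colour)
set_colors    <- unique(layer_data(p_set)$colour)

mapped_colors  # whatever the default discrete palette assigns, not purple
set_colors
```

The mapped version also gets a legend, because every mapped variable gets a scale and a guide, even one that only ever takes the value "purple".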

Geoms can take many arguments

Here we set color, linewidth, and alpha. Meanwhile x and y are mapped.
We also give non-default values to some other arguments, such as se and method.

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp)) 
p_out <- p + geom_point(alpha = 0.3) +
    geom_smooth(color = "orange", 
                se = FALSE, 
                linewidth = 8, 
                method = "lm") +
    scale_x_log10()

Geoms can take many arguments

p_out

`alpha` for overplotting

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point(alpha = 0.3) + 
  geom_smooth(method = "lm") +
    scale_x_log10(labels = scales::label_dollar()) +
    labs(x = "GDP Per Capita", 
         y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.")

Map or Set values per geom

Geoms can take their own mappings

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp,
                          color = continent,
                          fill = continent))
p + geom_point() +
    geom_smooth(method = "loess") +
    scale_x_log10(labels = scales::label_dollar())

Geoms can take their own mappings

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp))
p + geom_point(mapping = aes(color = continent)) +
    geom_smooth(method = "loess") +
    scale_x_log10(labels = scales::label_dollar())

Pay attention to which scales and guides are drawn, and why

Guides and scales reflect `aes()` mappings

mapping = aes(color = continent, fill = continent)

mapping = aes(color = continent)

Remember: Every mapped variable has a scale

Saving your work

Use `ggsave()`

## Save the most recent plot
ggsave(filename = "figures/my_figure.png")

## Use here() for more robust file paths
ggsave(filename = here("figures", "my_figure.png"))

## A plot object
p_out <- p + geom_point(mapping = aes(color = log(pop))) +
    scale_x_log10()

ggsave(filename = here("figures", "lifexp_vs_gdp_gradient.pdf"), 
       plot = p_out)

ggsave(here("figures", "lifexp_vs_gdp_gradient.png"), 
       plot = p_out, 
       width = 8, 
       height = 5)
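A self-contained sketch of the same pattern that you can run anywhere. It writes to a temporary file instead of `here("figures", ...)`, on the assumption that your project may not have a `figures` folder yet; `out_file` is a name introduced here for illustration.

```r
library(ggplot2)
library(gapminder)

p_out <- ggplot(data = gapminder,
                mapping = aes(x = gdpPercap,
                              y = lifeExp)) +
  geom_point(mapping = aes(color = log(pop))) +
  scale_x_log10()

# A temporary file stands in for here("figures", "my_figure.png")
out_file <- tempfile(fileext = ".png")

# ggsave() picks the graphics device from the file extension;
# width and height are in inches by default
ggsave(filename = out_file, plot = p_out, width = 8, height = 5)

file.exists(out_file)
```

Passing `plot = p_out` explicitly is a little safer than relying on the most recent plot, since it does not depend on what was last displayed.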

In code chunks

Set options in any chunk:

RMarkdown Style


```{r, fig.height=8, fig.width=5, fig.show="hold", fig.cap="A caption"}
gapminder |> 
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()
```

Quarto Style


```{r}
#| fig-height: 8
#| fig-width: 5
#| fig-show: "hold"
#| fig-cap: "A caption"

gapminder |> 
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()

```

Or for the whole document:

knitr::opts_chunk$set(warning = TRUE,
                      message = TRUE,
                      fig.retina = 3,
                      fig.align = "center",
                      fig.asp = 0.7,
                      dev = c("png", "pdf"))

Getting Help

How to read an R Help page
