library(here) # manage file paths
library(socviz) # data and some useful functions
library(tidyverse) # your friend and mine
Soc 690S: Week 10b
Duke University
March 2025
Split
Apply
Combine
Sometimes we start out in a “split” state, as when we have multiple data files, and we need to read them in and then combine them.
Inside files/data/ is a folder named congress/:
/Users/kjhealy/Documents/courses/socdata.co/files/data/congress
├── 01_79_congress.csv
├── 02_80_congress.csv
├── 03_81_congress.csv
├── 04_82_congress.csv
├── 05_83_congress.csv
├── 06_84_congress.csv
├── 07_85_congress.csv
├── 08_86_congress.csv
├── 09_87_congress.csv
├── 10_88_congress.csv
├── 11_89_congress.csv
├── 12_90_congress.csv
├── 13_91_congress.csv
├── 14_92_congress.csv
├── 15_93_congress.csv
├── 16_94_congress.csv
├── 17_95_congress.csv
├── 18_96_congress.csv
├── 19_97_congress.csv
├── 20_98_congress.csv
├── 21_99_congress.csv
├── 22_100_congress.csv
├── 23_101_congress.csv
├── 24_102_congress.csv
├── 25_103_congress.csv
├── 26_104_congress.csv
├── 27_105_congress.csv
├── 28_106_congress.csv
├── 29_107_congress.csv
├── 30_108_congress.csv
├── 31_109_congress.csv
├── 32_110_congress.csv
├── 33_111_congress.csv
├── 34_112_congress.csv
├── 35_113_congress.csv
├── 36_114_congress.csv
├── 37_115_congress.csv
└── 38_116_congress.csv
Let’s look at one.
read_csv(here("files", "data", "congress", "17_95_congress.csv")) |>
janitor::clean_names() |>
head()
# A tibble: 6 × 25
last first middle suffix nickname born death sex position party state
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Abdnor James <NA> <NA> <NA> 02/1… 11/0… M U.S. Re… Repu… SD
2 Abourezk James George <NA> <NA> 02/2… <NA> M U.S. Se… Demo… SD
3 Adams Brockm… <NA> <NA> Brock 01/1… 09/1… M U.S. Re… Demo… WA
4 Addabbo Joseph Patri… <NA> <NA> 03/1… 04/1… M U.S. Re… Demo… NY
5 Aiken George David <NA> <NA> 08/2… 11/1… M U.S. Se… Repu… VT
6 Akaka Daniel Kahik… <NA> <NA> 09/1… 04/0… M U.S. Re… Demo… HI
# ℹ 14 more variables: district <chr>, start <chr>, end <chr>, religion <chr>,
# race <chr>, educational_attainment <chr>, job_type1 <chr>, job_type2 <chr>,
# job_type3 <chr>, job_type4 <chr>, job_type5 <lgl>, mil1 <chr>, mil2 <chr>,
# mil3 <chr>
We often find ourselves in this situation. We know each file has the same structure, and we would like to use them all at once.
How to read them all in?
One traditional way, which we could do in R, is to write an explicit loop that iterates over a vector of filenames, reads each file, and then joins the results together into a tall rectangle.
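A sketch of that traditional loop, just for illustration (the csv_files name is hypothetical, and we won't actually take this approach):

csv_files <- list.files(here("files", "data", "congress"),
                        pattern = "\\.csv$", full.names = TRUE)

df <- NULL
for (f in csv_files) {
  df <- rbind(df, read.csv(f))  # grow the result one file at a time
}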
Instead, we can use map() as a way of iterating without writing loops. You start with a vector of things. You feed it one thing at a time to some function. The function does whatever it does. You get back output that is the same length as your input, and of a specific type.
The purrr package provides a big family of mapping functions. One reason there are a lot of them is that purrr, like the rest of the tidyverse, is picky about data types. Besides map(), which always returns a list, we also have map_chr(), map_int(), map_dbl(), map_lgl(), and others. They always return the data type indicated by their suffix, or die trying.
The simplest cases are not that different from the vectorized arithmetic we’re already familiar with.
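For example (a minimal sketch; the particular values of a and b are just for illustration):

a <- c(1:10)
b <- 1
a + b
 [1]  2  3  4  5  6  7  8  9 10 11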
R’s vectorized rules add b to every element of a. In a sense, the + operation can be thought of as a function that takes each element of a and does something with it. In this case, “add b”.
We can make this explicit by writing a function:
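A sketch, with the helper name add_b chosen just for this example:

add_b <- function(x) {
  x + b  # b is found in the enclosing environment
}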
Now:
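Continuing the sketch:

add_b(x = a)
 [1]  2  3  4  5  6  7  8  9 10 11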
Again, R’s vectorized approach means it automatically adds b
to every element of the x we give it.
Some operations can’t be vectorized directly in this way, which is why we need to iterate manually, or will want to write loops.
library(gapminder)

gapminder |>
  summarize(country_n = n_distinct(country),
            continent_n = n_distinct(continent),
            year_n = n_distinct(year),
            lifeExp_n = n_distinct(lifeExp),
            pop_n = n_distinct(pop))
# A tibble: 1 × 5
  country_n continent_n year_n lifeExp_n pop_n
      <int>       <int>  <int>     <int> <int>
1       142           5     12      1626  1704
That’s tedious to write! Computers are supposed to allow us to avoid that sort of thing.
So how would we iterate this? What we want is to apply the n_distinct()
function to each column of gapminder
, but in a way that still allows us to use pipelines and so on.
library(gapminder)

gapminder |>
  summarize(n_distinct(country),
            n_distinct(continent),
            n_distinct(year),
            n_distinct(lifeExp),
            n_distinct(pop))
# A tibble: 1 × 5
  `n_distinct(country)` `n_distinct(continent)` `n_distinct(year)`
                  <int>                   <int>              <int>
1                   142                       5                 12
# ℹ 2 more variables: `n_distinct(lifeExp)` <int>,
#   `n_distinct(pop)` <int>
You’d use across(), like this:
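A sketch of that across() call:

gapminder |>
  summarize(across(everything(), n_distinct))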
But you could also do this …
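A sketch of one possibility, mapping n_distinct() over the data frame directly (a data frame is a list of columns, so map() iterates over the columns and returns a list with one named element per column):

map(gapminder, n_distinct)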
Or, in pipeline form:
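A sketch:

gapminder |>
  map(n_distinct)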
But we know n_distinct()
should always return an integer. So we use map_int()
instead of the generic map()
.
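A sketch of that call; its output is shown just below:

gapminder |>
  map_int(n_distinct)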
country continent year lifeExp pop gdpPercap
142 5 12 1626 1704 1704
The useful thing about the map() family is that its functions can deal with all kinds of input and output types.
filenames <- dir(path = here("files", "data", "congress"),
pattern = "*.csv",
full.names = TRUE)
filenames[1:15] # Just displaying the first 15, to save slide space
[1] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/01_79_congress.csv"
[2] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/02_80_congress.csv"
[3] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/03_81_congress.csv"
[4] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/04_82_congress.csv"
[5] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/05_83_congress.csv"
[6] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/06_84_congress.csv"
[7] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/07_85_congress.csv"
[8] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/08_86_congress.csv"
[9] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/09_87_congress.csv"
[10] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/10_88_congress.csv"
[11] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/11_89_congress.csv"
[12] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/12_90_congress.csv"
[13] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/13_91_congress.csv"
[14] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/14_92_congress.csv"
[15] "/Users/kjhealy/Documents/courses/socdata.co/files/data/congress/15_93_congress.csv"
Now we read them all in with read_csv(), using map(), and bind the resulting list of tibbles into one big tibble.
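A sketch of that step (the df name is an assumption):

df <- filenames |>
  map(read_csv) |>
  list_rbind(names_to = "congress") |>
  janitor::clean_names()
df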
# A tibble: 20,580 × 26
congress last first middle suffix nickname born death sex position party
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Abern… Thom… Gerst… <NA> <NA> 05/1… 01/2… M U.S. Re… Demo…
2 1 Adams Sher… <NA> <NA> <NA> 01/0… 10/2… M U.S. Re… Repu…
3 1 Aiken Geor… David <NA> <NA> 08/2… 11/1… M U.S. Se… Repu…
4 1 Allen Asa Leona… <NA> <NA> 01/0… 01/0… M U.S. Re… Demo…
5 1 Allen Leo Elwood <NA> <NA> 10/0… 01/1… M U.S. Re… Repu…
6 1 Almond J. Linds… Jr. <NA> 06/1… 04/1… M U.S. Re… Demo…
7 1 Ander… Herm… Carl <NA> <NA> 01/2… 07/2… M U.S. Re… Repu…
8 1 Ander… Clin… Presba <NA> <NA> 10/2… 11/1… M U.S. Re… Demo…
9 1 Ander… John Zuing… <NA> <NA> 03/2… 02/0… M U.S. Re… Repu…
10 1 Andre… Augu… Herman <NA> <NA> 10/1… 01/1… M U.S. Re… Repu…
# ℹ 20,570 more rows
# ℹ 15 more variables: state <chr>, district <chr>, start <chr>, end <chr>,
# religion <chr>, race <chr>, educational_attainment <chr>, job_type1 <chr>,
# job_type2 <chr>, job_type3 <chr>, job_type4 <chr>, job_type5 <chr>,
# mil1 <chr>, mil2 <chr>, mil3 <chr>
In fact, read_csv() can do this directly; map() is not required for this particular use:
tmp <- read_csv(filenames, id = "path",
name_repair = janitor::make_clean_names)
tmp |>
mutate(congress = stringr::str_extract(path, "_\\d{2,3}_congress"),
congress = stringr::str_extract(congress, "\\d{2,3}")) |>
relocate(congress)
# A tibble: 20,580 × 27
congress path last first middle suffix nickname born death sex position
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 79 /User… Aber… Thom… Gerst… <NA> <NA> 05/1… 01/2… M U.S. Re…
2 79 /User… Adams Sher… <NA> <NA> <NA> 01/0… 10/2… M U.S. Re…
3 79 /User… Aiken Geor… David <NA> <NA> 08/2… 11/1… M U.S. Se…
4 79 /User… Allen Asa Leona… <NA> <NA> 01/0… 01/0… M U.S. Re…
5 79 /User… Allen Leo Elwood <NA> <NA> 10/0… 01/1… M U.S. Re…
6 79 /User… Almo… J. Linds… Jr. <NA> 06/1… 04/1… M U.S. Re…
7 79 /User… Ande… Herm… Carl <NA> <NA> 01/2… 07/2… M U.S. Re…
8 79 /User… Ande… Clin… Presba <NA> <NA> 10/2… 11/1… M U.S. Re…
9 79 /User… Ande… John Zuing… <NA> <NA> 03/2… 02/0… M U.S. Re…
10 79 /User… Andr… Augu… Herman <NA> <NA> 10/1… 01/1… M U.S. Re…
# ℹ 20,570 more rows
# ℹ 16 more variables: party <chr>, state <chr>, district <chr>, start <chr>,
# end <chr>, religion <chr>, race <chr>, educational_attainment <chr>,
# job_type1 <chr>, job_type2 <chr>, job_type3 <chr>, job_type4 <chr>,
# job_type5 <chr>, mil1 <chr>, mil2 <chr>, mil3 <chr>
Write a function that does what you want for a single case, taking the things that vary as arguments, and then hand it to a map or walk function that can repeatedly apply the function with those variables.
Mapped iteration is very general, and not just for local files.
All counties in New York State for a specific year
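Here, for example, is a single call to tidycensus::get_acs(), the same function the pipeline below maps over years (a sketch; it assumes a Census API key is already configured, and that the year shown here is 2005):

library(tidycensus)

get_acs(geography = "county",
        variables = "B19013_001",  # median household income
        state = "NY",
        survey = "acs1",
        year = 2005)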
# A tibble: 38 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 36001 Albany County, New York B19013_001 50054 2030
2 36005 Bronx County, New York B19013_001 29228 853
3 36007 Broome County, New York B19013_001 36394 2340
4 36009 Cattaraugus County, New York B19013_001 37580 2282
5 36011 Cayuga County, New York B19013_001 42057 2406
6 36013 Chautauqua County, New York B19013_001 35495 2077
7 36015 Chemung County, New York B19013_001 37418 3143
8 36019 Clinton County, New York B19013_001 44757 3500
9 36027 Dutchess County, New York B19013_001 61889 2431
10 36029 Erie County, New York B19013_001 41967 1231
# ℹ 28 more rows
What if we want the results for every available year? First, a handy function: set_names()
By default, set_names()
will label a vector with that vector’s values:
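For example (a small illustration using the years we are about to query):

2005:2019 |>
  set_names() |>
  head()
2005 2006 2007 2008 2009 2010 
2005 2006 2007 2008 2009 2010 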
This works with map()
just fine:
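For instance, mapping get_acs() over a plain vector of years (a sketch of the kind of call that produces the output below):

2005:2019 |>
  map(\(x) get_acs(geography = "county",
                   variables = "B19013_001",
                   state = "NY",
                   survey = "acs1",
                   year = x)) |>
  list_rbind(names_to = "year")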
# A tibble: 580 × 6
year GEOID NAME variable estimate moe
<int> <chr> <chr> <chr> <dbl> <dbl>
1 1 36001 Albany County, New York B19013_001 50054 2030
2 1 36005 Bronx County, New York B19013_001 29228 853
3 1 36007 Broome County, New York B19013_001 36394 2340
4 1 36009 Cattaraugus County, New York B19013_001 37580 2282
5 1 36011 Cayuga County, New York B19013_001 42057 2406
6 1 36013 Chautauqua County, New York B19013_001 35495 2077
7 1 36015 Chemung County, New York B19013_001 37418 3143
8 1 36019 Clinton County, New York B19013_001 44757 3500
9 1 36027 Dutchess County, New York B19013_001 61889 2431
10 1 36029 Erie County, New York B19013_001 41967 1231
# ℹ 570 more rows
Our id column, year, tracks which element of the list each result came from, but by position (1, 2, 3, and so on) rather than by the actual year. So, we use set_names().
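A sketch of the revised pipeline, with the year column converted back to an integer at the end:

2005:2019 |>
  set_names() |>
  map(\(x) get_acs(geography = "county",
                   variables = "B19013_001",
                   state = "NY",
                   survey = "acs1",
                   year = x)) |>
  list_rbind(names_to = "year") |>
  mutate(year = as.integer(year))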
# A tibble: 580 × 6
year GEOID NAME variable estimate moe
<int> <chr> <chr> <chr> <dbl> <dbl>
1 2005 36001 Albany County, New York B19013_001 50054 2030
2 2005 36005 Bronx County, New York B19013_001 29228 853
3 2005 36007 Broome County, New York B19013_001 36394 2340
4 2005 36009 Cattaraugus County, New York B19013_001 37580 2282
5 2005 36011 Cayuga County, New York B19013_001 42057 2406
6 2005 36013 Chautauqua County, New York B19013_001 35495 2077
7 2005 36015 Chemung County, New York B19013_001 37418 3143
8 2005 36019 Clinton County, New York B19013_001 44757 3500
9 2005 36027 Dutchess County, New York B19013_001 61889 2431
10 2005 36029 Erie County, New York B19013_001 41967 1231
# ℹ 570 more rows
Now year is just the year. Because the list names are character strings, the year column is created as a character vector, so we convert it back to an integer at the end.
p_out <- 2005:2019 |>
set_names() |>
map(\(x) get_acs(geography = "county",
variables = "B19013_001",
state = "NY",
survey = "acs1",
year = x)) |>
list_rbind(names_to = "year") |>
mutate(year = as.integer(year)) |>
ggplot(mapping = aes(x = year, y = estimate, group = year)) +
geom_boxplot(fill = "lightblue", alpha = 0.5, outlier.alpha = 0) +
geom_jitter(position = position_jitter(width = 0.1), shape = 1) +
scale_y_continuous(labels = scales::label_dollar()) +
labs(x = "Year", y = "Dollars",
title = "Median Household Income by County in New York State, 2005-2019",
subtitle = "ACS 1-year estimates", caption = "Data: U.S. Census Bureau.")
Printing p_out makes the live calls to the Census API and draws the boxplot of median household income by county and year.
Model output is another case. The default printed summary of an lm object is fine to read, but we can’t do anything with it programmatically.
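For example, fitting an OLS model of life expectancy and printing its default summary (a sketch; the object name out is an assumption, but the formula matches the Call shown below):

out <- lm(formula = lifeExp ~ gdpPercap + pop + continent,
          data = gapminder)
summary(out)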
Call:
lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
Residuals:
Min 1Q Median 3Q Max
-49.161 -4.486 0.297 5.110 25.175
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.781e+01 3.395e-01 140.819 < 2e-16 ***
gdpPercap 4.495e-04 2.346e-05 19.158 < 2e-16 ***
pop 6.570e-09 1.975e-09 3.326 0.000901 ***
continentAmericas 1.348e+01 6.000e-01 22.458 < 2e-16 ***
continentAsia 8.193e+00 5.712e-01 14.342 < 2e-16 ***
continentEurope 1.747e+01 6.246e-01 27.973 < 2e-16 ***
continentOceania 1.808e+01 1.782e+00 10.146 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.365 on 1697 degrees of freedom
Multiple R-squared: 0.5821, Adjusted R-squared: 0.5806
F-statistic: 393.9 on 6 and 1697 DF, p-value: < 2.2e-16
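The tidy() function from the broom package turns the model object into a tibble (a sketch, continuing with the out object assumed above):

library(broom)

tidy(out)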
# A tibble: 7 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.78e+1 0.340 141. 0
2 gdpPercap 4.50e-4 0.0000235 19.2 3.24e- 74
3 pop 6.57e-9 0.00000000198 3.33 9.01e- 4
4 continentAmericas 1.35e+1 0.600 22.5 5.19e- 98
5 continentAsia 8.19e+0 0.571 14.3 4.06e- 44
6 continentEurope 1.75e+1 0.625 28.0 6.34e-142
7 continentOceania 1.81e+1 1.78 10.1 1.59e- 23
That’s a lot nicer. Now it’s just a tibble. We know those.
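We can ask tidy() for confidence intervals as well (a sketch; out_conf is the name the later code expects):

out_conf <- tidy(out, conf.int = TRUE)
out_conf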
# A tibble: 7 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.78e+1 3.40e-1 141. 0 4.71e+1 4.85e+1
2 gdpPercap 4.50e-4 2.35e-5 19.2 3.24e- 74 4.03e-4 4.96e-4
3 pop 6.57e-9 1.98e-9 3.33 9.01e- 4 2.70e-9 1.04e-8
4 continentAmericas 1.35e+1 6.00e-1 22.5 5.19e- 98 1.23e+1 1.47e+1
5 continentAsia 8.19e+0 5.71e-1 14.3 4.06e- 44 7.07e+0 9.31e+0
6 continentEurope 1.75e+1 6.25e-1 28.0 6.34e-142 1.62e+1 1.87e+1
7 continentOceania 1.81e+1 1.78e+0 10.1 1.59e- 23 1.46e+1 2.16e+1
out_conf |>
filter(term %nin% "(Intercept)") |>
mutate(nicelabs = prefix_strip(term, "continent")) |>
select(nicelabs, everything())
# A tibble: 6 × 8
nicelabs term estimate std.error statistic p.value conf.low conf.high
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 gdpPercap gdpPercap 4.50e-4 2.35e-5 19.2 3.24e- 74 4.03e-4 4.96e-4
2 Pop pop 6.57e-9 1.98e-9 3.33 9.01e- 4 2.70e-9 1.04e-8
3 Americas continent… 1.35e+1 6.00e-1 22.5 5.19e- 98 1.23e+1 1.47e+1
4 Asia continent… 8.19e+0 5.71e-1 14.3 4.06e- 44 7.07e+0 9.31e+0
5 Europe continent… 1.75e+1 6.25e-1 28.0 6.34e-142 1.62e+1 1.87e+1
6 Oceania continent… 1.81e+1 1.78e+0 10.1 1.59e- 23 1.46e+1 2.16e+1
Again: get the single case right first, then tidily generalize to many cases.
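A sketch of the grouping-and-nesting step (the out_le name is an assumption):

out_le <- gapminder |>
  group_by(continent, year) |>
  nest()
out_le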
# A tibble: 60 × 3
# Groups: continent, year [60]
continent year data
<fct> <int> <list>
1 Asia 1952 <tibble [33 × 4]>
2 Asia 1957 <tibble [33 × 4]>
3 Asia 1962 <tibble [33 × 4]>
4 Asia 1967 <tibble [33 × 4]>
5 Asia 1972 <tibble [33 × 4]>
6 Asia 1977 <tibble [33 × 4]>
7 Asia 1982 <tibble [33 × 4]>
8 Asia 1987 <tibble [33 × 4]>
9 Asia 1992 <tibble [33 × 4]>
10 Asia 1997 <tibble [33 × 4]>
# ℹ 50 more rows
Think of nesting as a kind of “super-grouping”. Look in the object inspector.
It’s still in there.
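We can pull a single group back out again, for example (a sketch, continuing with out_le):

out_le |>
  filter(continent == "Europe" & year == 1977) |>
  unnest(cols = c(data))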
# A tibble: 30 × 6
# Groups: continent, year [1]
continent year country lifeExp pop gdpPercap
<fct> <int> <fct> <dbl> <int> <dbl>
1 Europe 1977 Albania 68.9 2509048 3533.
2 Europe 1977 Austria 72.2 7568430 19749.
3 Europe 1977 Belgium 72.8 9821800 19118.
4 Europe 1977 Bosnia and Herzegovina 69.9 4086000 3528.
5 Europe 1977 Bulgaria 70.8 8797022 7612.
6 Europe 1977 Croatia 70.6 4318673 11305.
7 Europe 1977 Czech Republic 70.7 10161915 14800.
8 Europe 1977 Denmark 74.7 5088419 20423.
9 Europe 1977 Finland 72.5 4738902 15605.
10 Europe 1977 France 73.8 53165019 18293.
# ℹ 20 more rows
Here we map()
a custom function to every row in the data
column.
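A sketch of that step. The helper name fit_ols is an assumption, but the one-predictor model lifeExp ~ log(gdpPercap) matches the log(gdp…) term in the tidied output further down:

fit_ols <- function(df) {
  lm(lifeExp ~ log(gdpPercap), data = df)
}

out_le <- gapminder |>
  group_by(continent, year) |>
  nest() |>
  mutate(model = map(data, fit_ols))
out_le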
# A tibble: 60 × 4
# Groups: continent, year [60]
continent year data model
<fct> <int> <list> <list>
1 Asia 1952 <tibble [33 × 4]> <lm>
2 Asia 1957 <tibble [33 × 4]> <lm>
3 Asia 1962 <tibble [33 × 4]> <lm>
4 Asia 1967 <tibble [33 × 4]> <lm>
5 Asia 1972 <tibble [33 × 4]> <lm>
6 Asia 1977 <tibble [33 × 4]> <lm>
7 Asia 1982 <tibble [33 × 4]> <lm>
8 Asia 1987 <tibble [33 × 4]> <lm>
9 Asia 1992 <tibble [33 × 4]> <lm>
10 Asia 1997 <tibble [33 × 4]> <lm>
# ℹ 50 more rows
We can tidy the nested models, too.
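A sketch of that step. The filtering (dropping the intercept term, and apparently Oceania, which has only two countries per year) is inferred from the 48-row output below:

out_tidy <- out_le |>
  mutate(tidied = map(model, tidy)) |>
  unnest(cols = c(tidied)) |>
  filter(term %nin% "(Intercept)",
         continent %nin% "Oceania")
out_tidy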
# A tibble: 48 × 9
# Groups: continent, year [48]
continent year data model term estimate std.error statistic p.value
<fct> <int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 Asia 1952 <tibble> <lm> log(gdp… 4.16 1.25 3.33 2.28e-3
2 Asia 1957 <tibble> <lm> log(gdp… 4.17 1.28 3.26 2.71e-3
3 Asia 1962 <tibble> <lm> log(gdp… 4.59 1.24 3.72 7.94e-4
4 Asia 1967 <tibble> <lm> log(gdp… 4.50 1.15 3.90 4.77e-4
5 Asia 1972 <tibble> <lm> log(gdp… 4.44 1.01 4.41 1.16e-4
6 Asia 1977 <tibble> <lm> log(gdp… 4.87 1.03 4.75 4.42e-5
7 Asia 1982 <tibble> <lm> log(gdp… 4.78 0.852 5.61 3.77e-6
8 Asia 1987 <tibble> <lm> log(gdp… 5.17 0.727 7.12 5.31e-8
9 Asia 1992 <tibble> <lm> log(gdp… 5.09 0.649 7.84 7.60e-9
10 Asia 1997 <tibble> <lm> log(gdp… 5.11 0.628 8.15 3.35e-9
# ℹ 38 more rows
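For instance, a few rows at random (a sketch; the exact sampling call isn't shown here):

out_tidy |>
  ungroup() |>
  slice_sample(n = 5)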
# A tibble: 5 × 9
continent year data model term estimate std.error statistic p.value
<fct> <int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 Europe 1962 <tibble> <lm> log(gdpP… 5.91 0.853 6.93 1.56e-7
2 Americas 2007 <tibble> <lm> log(gdpP… 4.49 0.752 5.98 4.26e-6
3 Asia 1987 <tibble> <lm> log(gdpP… 5.17 0.727 7.12 5.31e-8
4 Africa 1992 <tibble> <lm> log(gdpP… 7.33 1.10 6.64 2.22e-8
5 Europe 1987 <tibble> <lm> log(gdpP… 4.14 0.752 5.51 6.93e-6
Let’s run a very slightly different model:
# New model
fit_ols2 <- function(df) {
lm(lifeExp ~ log(gdpPercap) + log(pop), data = df)
}
out_tidy <- gapminder |>
group_by(continent, year) |>
nest() |>
mutate(model = map(data, fit_ols2),
tidied = map(model, tidy))
out_tidy
# A tibble: 60 × 5
# Groups: continent, year [60]
continent year data model tidied
<fct> <int> <list> <list> <list>
1 Asia 1952 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
2 Asia 1957 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
3 Asia 1962 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
4 Asia 1967 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
5 Asia 1972 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
6 Asia 1977 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
7 Asia 1982 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
8 Asia 1987 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
9 Asia 1992 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
10 Asia 1997 <tibble [33 × 4]> <lm> <tibble [3 × 5]>
# ℹ 50 more rows
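The next pipeline also builds a plot for each group with mod_plot(), a small custom plotting function that isn't defined in these slides. A minimal sketch, assuming it takes a tidied coefficient table and a title and returns a ggplot:

mod_plot <- function(tidy_df, title) {
  tidy_df |>
    filter(term %nin% "(Intercept)") |>
    ggplot(mapping = aes(x = estimate,
                         xmin = estimate - 2 * std.error,
                         xmax = estimate + 2 * std.error,
                         y = term)) +
    geom_pointrange() +
    labs(title = title, x = "Estimate", y = NULL)
}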
map2() or pmap()
If we want to pass more than one argument to a mapped function, we can use map2(). The general case is pmap(), for passing any number of arguments in a list.
out_tidy <- gapminder |>
  group_by(continent, year) |>
  nest() |>
  mutate(title = paste(continent, year),
         model = map(data, fit_ols2),
         tidied = map(model, tidy),
         ggout = pmap(list(tidied, title),
                      mod_plot))
out_tidy
# A tibble: 60 × 7
# Groups: continent, year [60]
continent year data title model tidied ggout
<fct> <int> <list> <chr> <list> <list> <list>
1 Asia 1952 <tibble [33 × 4]> Asia 1952 <lm> <tibble [3 × 5]> <gg>
2 Asia 1957 <tibble [33 × 4]> Asia 1957 <lm> <tibble [3 × 5]> <gg>
3 Asia 1962 <tibble [33 × 4]> Asia 1962 <lm> <tibble [3 × 5]> <gg>
4 Asia 1967 <tibble [33 × 4]> Asia 1967 <lm> <tibble [3 × 5]> <gg>
5 Asia 1972 <tibble [33 × 4]> Asia 1972 <lm> <tibble [3 × 5]> <gg>
6 Asia 1977 <tibble [33 × 4]> Asia 1977 <lm> <tibble [3 × 5]> <gg>
7 Asia 1982 <tibble [33 × 4]> Asia 1982 <lm> <tibble [3 × 5]> <gg>
8 Asia 1987 <tibble [33 × 4]> Asia 1987 <lm> <tibble [3 × 5]> <gg>
9 Asia 1992 <tibble [33 × 4]> Asia 1992 <lm> <tibble [3 × 5]> <gg>
10 Asia 1997 <tibble [33 × 4]> Asia 1997 <lm> <tibble [3 × 5]> <gg>
# ℹ 50 more rows
Now, every element of ggout is a ggplot object.
We can, for example, walk() the plots out to disk.
walk() is map() for when you just want a “side effect” such as printed output. There are also walk2() and pwalk().
When should you use walk() instead of map()? Ask yourself: “Do I want anything like a vector, a list, or a tibble back from this thing I’m about to do, either at the console or in an object?” If the answer is no, use walk().
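A sketch of that step, assuming the out_tidy object above, a figures/ directory at the project root, and file names built from the title column (the width and height are arbitrary):

walk2(out_tidy$ggout, out_tidy$title,
      \(plot, title) ggsave(filename = here("figures", paste0(title, ".png")),
                            plot = plot, width = 6, height = 4))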
Here’s what’s in the figures/ folder now:
 [1] "Africa 1952.png" "Africa 1957.png" "Africa 1962.png"
[4] "Africa 1967.png" "Africa 1972.png" "Africa 1977.png"
[7] "Africa 1982.png" "Africa 1987.png" "Africa 1992.png"
[10] "Africa 1997.png" "Africa 2002.png" "Africa 2007.png"
[13] "Americas 1952.png" "Americas 1957.png" "Americas 1962.png"
[16] "Americas 1967.png" "Americas 1972.png" "Americas 1977.png"
[19] "Americas 1982.png" "Americas 1987.png" "Americas 1992.png"
[22] "Americas 1997.png" "Americas 2002.png" "Americas 2007.png"
[25] "Asia 1952.png" "Asia 1957.png" "Asia 1962.png"
[28] "Asia 1967.png" "Asia 1972.png" "Asia 1977.png"
[31] "Asia 1982.png" "Asia 1987.png" "Asia 1992.png"
[34] "Asia 1997.png" "Asia 2002.png" "Asia 2007.png"
[37] "Europe 1952.png" "Europe 1957.png" "Europe 1962.png"
[40] "Europe 1967.png" "Europe 1972.png" "Europe 1977.png"
[43] "Europe 1982.png" "Europe 1987.png" "Europe 1992.png"
[46] "Europe 1997.png" "Europe 2002.png" "Europe 2007.png"
[49] "Oceania 1952.png" "Oceania 1957.png" "Oceania 1962.png"
[52] "Oceania 1967.png" "Oceania 1972.png" "Oceania 1977.png"
[55] "Oceania 1982.png" "Oceania 1987.png" "Oceania 1992.png"
[58] "Oceania 1997.png" "Oceania 2002.png" "Oceania 2007.png"