Example 05: more dplyr

Setup

library(here)      # manage file paths
here() starts at /Users/kjhealy/Documents/courses/socdata.co
library(socviz)    # data and some useful functions
library(tidyverse) # your friend and mine
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

UK Election Data

# remotes::install_github("kjhealy/ukelection2019")
library(ukelection2019)
ukvote2019
# A tibble: 3,320 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 W07000… Aberavon          50747 Labour     Stephen … 17008               53.8
 2 W07000… Aberavon          50747 Conservat… Charlott…  6518               20.6
 3 W07000… Aberavon          50747 The Brexi… Glenda D…  3108                9.8
 4 W07000… Aberavon          50747 Plaid Cym… Nigel Hu…  2711                8.6
 5 W07000… Aberavon          50747 Liberal D… Sheila K…  1072                3.4
 6 W07000… Aberavon          50747 Independe… Captain …   731                2.3
 7 W07000… Aberavon          50747 Green      Giorgia …   450                1.4
 8 W07000… Aberconwy         44699 Conservat… Robin Mi… 14687               46.1
 9 W07000… Aberconwy         44699 Labour     Emily Ow… 12653               39.7
10 W07000… Aberconwy         44699 Plaid Cym… Lisa Goo…  2704                8.5
# ℹ 3,310 more rows
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>

Use sample_n() to draw n rows at random from your tibble.

library(ukelection2019)

ukvote2019 |> 
  sample_n(10)
# A tibble: 10 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 E14001… Walsall Nor…      67177 Conservat… Eddie Hu… 23334               63.8
 2 S14000… East Kilbri…      81224 Conservat… Gail Mac… 11961               21.2
 3 E14000… Liverpool W…      63458 Liberal P… Mick Coy…   501                1.2
 4 E14000… Blackburn         71228 Conservat… Claire G… 10736               24  
 5 E14000… Chelmsford        80481 Labour     Penny Ri… 10295               18  
 6 E14000… Dagenham & …      71043 Conservat… Damian W… 19175               43.8
 7 W07000… Caerphilly        63166 Plaid Cym… Lindsay …  6424               16  
 8 E14000… Broadland         78151 Liberal D… Ben Good…  9195               16.1
 9 E14000… Harwich & E…      74153 Independe… Richard …   411                0.8
10 W07000… Arfon             42215 Conservat… Gonul Da…  4428               15.2
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>
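In current versions of dplyr, sample_n() is superseded by slice_sample(). Here is a minimal sketch of the newer function, using a made-up toy tibble (`toy` is an illustration, not the election data) so that it runs without the ukelection2019 package. Setting a seed makes the random draw repeatable.

```r
library(dplyr)

# Made-up toy data for illustration only; not the election results
toy <- tibble(id = 1:20, value = id * 10)

set.seed(1234)                      # fix the random number stream
s1 <- toy |> slice_sample(n = 5)    # draw 5 rows at random

set.seed(1234)                      # same seed again ...
s2 <- toy |> slice_sample(n = 5)    # ... so the same 5 rows come back

identical(s1, s2)

# The equivalent call on the data used in these notes would be:
# ukvote2019 |> slice_sample(n = 10)
```

Without the set.seed() calls you get a different sample each time, which is why the ten rows above will not match what you see when you run the code yourself.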

A one-column tibble of the unique constituency names:

ukvote2019 |> 
  distinct(constituency)
# A tibble: 650 × 1
   constituency                   
   <chr>                          
 1 Aberavon                       
 2 Aberconwy                      
 3 Aberdeen North                 
 4 Aberdeen South                 
 5 Aberdeenshire West & Kincardine
 6 Airdrie & Shotts               
 7 Aldershot                      
 8 Aldridge-Brownhills            
 9 Altrincham & Sale West         
10 Alyn & Deeside                 
# ℹ 640 more rows

Tally them up:

ukvote2019 |> 
  distinct(constituency) |> 
  tally()
# A tibble: 1 × 1
      n
  <int>
1   650

# Base R / non-pipeline version

length(unique(ukvote2019$constituency))
[1] 650

Which parties fielded the most candidates?

ukvote2019 |> 
  count(party_name) |> 
  arrange(desc(n))
# A tibble: 69 × 2
   party_name                     n
   <chr>                      <int>
 1 Conservative                 636
 2 Labour                       631
 3 Liberal Democrat             611
 4 Green                        497
 5 The Brexit Party             275
 6 Independent                  224
 7 Scottish National Party       59
 8 UKIP                          44
 9 Plaid Cymru                   36
10 Christian Peoples Alliance    29
# ℹ 59 more rows
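The count-then-arrange pattern is common enough that count() has a `sort` argument that does both steps at once. A small sketch on a made-up toy tibble (`toy` and its party labels are hypothetical) comparing the two ways:

```r
library(dplyr)

# Made-up toy data for illustration only; not the election results
toy <- tibble(party = c("A", "B", "A", "C", "A", "B"))

# Two steps: count, then sort descending by n
long_way <- toy |>
  count(party) |>
  arrange(desc(n))

# One step: ask count() to sort by n, largest first
short_way <- toy |>
  count(party, sort = TRUE)

all.equal(long_way, short_way)

# The equivalent call on the data used in these notes would be:
# ukvote2019 |> count(party_name, sort = TRUE)
```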

Top 5:

ukvote2019 |> 
  count(party_name) |> 
  slice_max(order_by = n, n = 5)
# A tibble: 5 × 2
  party_name           n
  <chr>            <int>
1 Conservative       636
2 Labour             631
3 Liberal Democrat   611
4 Green              497
5 The Brexit Party   275

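Bottom 5. Note that slice_min() keeps ties by default, so asking for five rows returns all 25 parties that fielded a single candidate: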

ukvote2019 |> 
  count(party_name) |> 
  slice_min(order_by = n, n = 5)
# A tibble: 25 × 2
   party_name                              n
   <chr>                               <int>
 1 Ashfield Independents                   1
 2 Best for Luton                          1
 3 Birkenhead Social Justice Party         1
 4 British National Party                  1
 5 Burnley & Padiham Independent Party     1
 6 Church of the Militant Elvis Party      1
 7 Citizens Movement Party UK              1
 8 CumbriaFirst                            1
 9 Heavy Woollen District Independents     1
10 Independent Network                     1
# ℹ 15 more rows
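If you really want exactly five rows, slice_min() and slice_max() take a `with_ties` argument. A minimal sketch of the difference, on a made-up toy tibble (`toy` is hypothetical) with a three-way tie at the bottom:

```r
library(dplyr)

# Made-up toy data for illustration only; three parties are tied on n = 1
toy <- tibble(
  party = c("A", "B", "C", "D", "E"),
  n     = c(9L, 1L, 1L, 1L, 5L)
)

# Default (with_ties = TRUE): every row tied with the 2 smallest is kept
ties_kept <- toy |>
  slice_min(order_by = n, n = 2)

# with_ties = FALSE: exactly 2 rows, taking tied rows in their existing order
exactly_two <- toy |>
  slice_min(order_by = n, n = 2, with_ties = FALSE)

nrow(ties_kept)    # 3, because B, C, and D all have n = 1
nrow(exactly_two)  # 2
```

Dropping ties is an arbitrary cut when the tied rows are genuinely equivalent, so for a question like "which parties ran the fewest candidates" the default behavior is usually the more honest answer.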

How many constituencies are there again?

ukvote2019 |> 
  count(constituency) 
# A tibble: 650 × 2
   constituency                        n
   <chr>                           <int>
 1 Aberavon                            7
 2 Aberconwy                           4
 3 Aberdeen North                      6
 4 Aberdeen South                      4
 5 Aberdeenshire West & Kincardine     4
 6 Airdrie & Shotts                    5
 7 Aldershot                           4
 8 Aldridge-Brownhills                 5
 9 Altrincham & Sale West              6
10 Alyn & Deeside                      5
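# ℹ 640 more rows

That gives the number of candidates in each constituency, one row per constituency. To get a single total, count the distinct names:

ukvote2019 |> 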
  distinct(constituency) |> 
  count()
# A tibble: 1 × 1
      n
  <int>
1   650

# Base R style ...
length(unique(ukvote2019$constituency))
[1] 650
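Another way to get the same number inside a pipeline is n_distinct() within summarize(). A short sketch on a made-up toy tibble (`toy` and the column name `n_constituencies` are chosen here for illustration):

```r
library(dplyr)

# Made-up toy data for illustration only: 6 candidate rows, 3 constituencies
toy <- tibble(
  constituency = c("X", "X", "Y", "Z", "Z", "Z"),
  candidate    = c("a", "b", "c", "d", "e", "f")
)

# Count the unique constituency names in one step
n_const <- toy |>
  summarize(n_constituencies = n_distinct(constituency))

n_const

# The equivalent call on the data used in these notes would be:
# ukvote2019 |> summarize(n = n_distinct(constituency))
```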

Counting Twice Over

What does this mean?

ukvote2019 |> 
  count(constituency) |> 
  count(n)
Storing counts in `nn`, as `n` already present in input
ℹ Use `name = "new_name"` to pick a new name.
# A tibble: 8 × 2
      n    nn
  <int> <int>
1     3    21
2     4   194
3     5   226
4     6   139
5     7    49
6     8    18
7     9     2
8    12     1
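The first count() gives the number of candidates in each constituency. The second counts how often each of those numbers occurs. So the table reads: 21 constituencies had 3 candidates, 194 had 4, and so on up to a single constituency with 12. The message about `nn` goes away, and the result is easier to read, if you name the count columns yourself with the `name` argument. A sketch on a made-up toy tibble (`toy` and both column names are chosen here for illustration):

```r
library(dplyr)

# Made-up toy data for illustration only: one row per candidate
# X and Y have 3 candidates each; Z has 4
toy <- tibble(
  constituency = c("X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z", "Z")
)

result <- toy |>
  count(constituency, name = "n_candidates") |>    # candidates per constituency
  count(n_candidates, name = "n_constituencies")   # constituencies per candidate count

result
# Two rows: n_candidates = 3 occurs in 2 constituencies,
#           n_candidates = 4 occurs in 1 constituency

# The equivalent call on the data used in these notes would be:
# ukvote2019 |>
#   count(constituency, name = "n_candidates") |>
#   count(n_candidates, name = "n_constituencies")
```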