Data Wrangling and Data Visualization with R

This is the course website for the Spring 2024 edition of SOCIOL 690S, taught at Duke University by Kieran Healy.

The Preferred Computing Setup for this Course

Overview

This course will teach you the elements of data wrangling and data visualization, mostly in R.

For the data wrangling side, we will not focus on particular statistical methods or modeling techniques. Rather, we will learn how to accomplish the everyday tasks that have to happen before you get to that part. These include getting your own data into R, rearranging and recoding it, exploring its structure, munging and reshaping tables, and presenting summary tabulations and graphs of this work. We will also examine some more advanced versions of these topics, such as managing large datasets, parallelizing tasks, and some of the rudiments of writing functions and maintaining code that any social scientist working with quantitative data should know a bit about.

For the visualization side, we will emphasize the importance of being able to look at and learn from your data yourself, and also the best way to present it visually to others. Throughout the course we will emphasize how R and the tidyverse “think”. Every dataset is different, especially at the stage where it still needs further cleaning or arranging before it can be easily analyzed or effectively presented. This course will teach you the logic and implicit “flow of action” behind the tidyverse’s tools, giving you the ability to apply and extend this way of thinking when working with your own data and its particular challenges.
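
As a rough illustration of that “flow of action”, here is a minimal sketch of a tidyverse pipeline that reshapes a table and then summarizes it. The `scores` data and its column names are hypothetical, invented for this example and not taken from the course materials. The sketch assumes the dplyr and tidyr packages are installed and that R is version 4.1 or later (for the native `|>` pipe).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, made up purely for illustration
scores <- tibble(
  student = c("A", "B", "C", "D"),
  section = c("x", "x", "y", "y"),
  midterm = c(80, 90, 70, 60),
  final   = c(85, 95, 65, 75)
)

# Read the pipeline top to bottom: start with the table,
# reshape it to one row per student per exam,
# group it, and then summarize each group.
section_means <- scores |>
  pivot_longer(c(midterm, final), names_to = "exam", values_to = "score") |>
  group_by(section, exam) |>
  summarize(mean_score = mean(score), .groups = "drop")

print(section_means)
```

Each step takes a table in and hands a table on to the next step, which is the pattern most tidyverse tools are designed around.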

We will also make time to work through the issues and problems you encounter in the data analysis you are doing for your own research.

Syllabus & Schedule

  • Consult the syllabus for details about the class.

  • Consult the schedule for weekly topics and readings.