Processing IPUMS-USA data for my income by gender and age RStudio Table Contest entry
Thu, Sep 24, 2020
2-minute read
The following R code processes 2014-18 ACS microdata retrieved from IPUMS-USA into the summary tables used to build my RStudio 2020 Table Contest entry. The output tables can be retrieved at these links:
- https://johndjohnson.info/files/income_by_sex_and_age.csv
- https://johndjohnson.info/files/total_jobs_by_sex.csv
# NOTE: I cannot provide the source data for this table myself.
# You must download it from IPUMS-USA https://usa.ipums.org/usa/index.shtml
if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")
library(tidyverse)
library(pollster)
library(labelled)
# This data was downloaded from IPUMS USA. It is the 2014-18 ACS microdata.
# The dataset is national, but it is filtered for people who are employed
ddi <- read_ipums_ddi("data/usa_00050.xml")
data <- read_ipums_micro(ddi)
fulltime <- data %>%
filter(UHRSWORK >= 37) %>%
mutate(
age_category = case_when(
AGE < 35 ~ "Under 35",
AGE %in% 35:49 ~ "35 to 50",
AGE >= 50 ~ "50 plus"),
age_category = factor(age_category,
levels = c("Under 35", "35 to 50", "50 plus")))
total.jobs <- topline(fulltime, OCC2010, PERWT)
top.jobs <- total.jobs %>%
slice_max(Frequency, n = 20) %>%
mutate(Response = as.character(Response)) %>%
pull(Response)
# fulltime workers in the top 20 occupations
fulltime.top.jobs <- fulltime %>%
mutate(OCC2010 = to_character(OCC2010)) %>%
filter(OCC2010 %in% top.jobs)
income.by.sex.age <- fulltime.top.jobs %>%
mutate(SEX = to_factor(SEX)) %>%
group_by(OCC2010, SEX, age_category) %>%
summarise(median_wage = Hmisc::wtd.quantile(INCWAGE, weights = PERWT, probs = 0.5)) %>%
ungroup()
income.by.sex.age
jobs.by.sex <- crosstab(fulltime.top.jobs, OCC2010, SEX, PERWT,
unwt_n = T, format = "long") %>%
mutate(age_category = "total")
write_csv(income.by.sex.age, "data/income_by_sex_and_age.csv")
write_csv(jobs.by.sex, "data/total_jobs_by_sex.csv")