Some helpful code for Report 2.
Imagine if we were interested in how attitudes about civil unrest in the 1970s influenced political views. To investigate, we could use the American National Election Study Timeseries to model and plot how something like the relationship between the 1972 Unrest Scale relates to feelings toward the Republican presidential candidate.
# tidyverse packages
library(tidyverse)
library(janitor)
# file path management (requires use of R Projects)
library(here)
# data translation
library(foreign)
library(haven)
# plotting
library(ggridges)
library(sjPlot)
# with sjPlot, for changing labels
library(sjlabelled)
# tables
library(stargazer)
NOTE: The haven
package has a useful feature when importing the formats .dta
or .sav
which is that it keeps each columns label information that is stored with those data. For example we might look at the variable V201600
and we can see the @ label that tells us the question description. This is called metadata (data about our data). We can also see that the type of data is dbl+lbl.
This is short for double — typical of numeric data that’s not integers — and label.
sjlabelled::get_label(anes[37]) # sex
VCF0104
"Respondent - Gender"
sjlabelled::get_label(anes[313]) # thermometer GOP candidate
VCF0426
"Thermometer - Republican Presidential Candidate"
str(anes$VCF0104)
dbl+lbl [1:59944] 1, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1...
@ label : chr "Respondent - Gender"
@ format.stata: chr "%16.0g"
@ labels : Named num [1:4] 0 1 2 3
..- attr(*, "names")= chr [1:4] "0. NA; no Pre IW" "1. Male" "2. Female" "3. Other (2016)"
str(anes$VCF0426)
dbl+lbl [1:59944] NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
@ label : chr "Thermometer - Republican Presidential Candidate"
@ format.stata: chr "%51.0g"
@ labels : Named num [1:3] 97 98 99
..- attr(*, "names")= chr [1:3] "97. 97-100 Degrees" "98. DK (exc.1968); don't recognize (1980 and later)" "99. NA; no Post IW"
# also get_labels WITH S useful for seeing responses
sjlabelled::get_labels(anes$VCF0105a)
[1] "1. White non-Hispanic (1948-2012)"
[2] "2. Black non-Hispanic (1948-2012)"
[3] "3. Asian or Pacific Islander, non-Hispanic (1966-2012)"
[4] "4. American Indian or Alaska Native non-Hispanic (1966-2012)"
[5] "5. Hispanic (1966-2012)"
[6] "6. Other or multiple races, non-Hispanic (1968-2012)"
[7] "7. Non-white and non-black (1948-1964)"
[8] "9. Missing"
Many functions like lm()
work just fine with labeled data. sjPlot::plot_model()
however will break on labeled data.
# in general DO NOT use raw variables you haven't recoded as they'll include values like 998 for missing data
lm1 <- lm(VCF0426 ~ VCF0104, data = anes)
# this generates an error due to labeled data
sjPlot::plot_model(lm1, type = "pred", terms = "VCF0104")
# Error: Can't combine `..1` <character> and `..2` <double>.
# Run `rlang::last_error()` to see where the error occurred.
Consequently, it’s important to strip the labels before those variables get passed to lm()
. You can do that easily with commands like as.numeric()
or as.character()
or as.factor()
. See sample code below:
anes <- anes %>%
mutate(
feel_trump = VCF9057 %>% as.numeric(),
male_fct = case_when(
VCF0104 == 1 ~ "Male",
VCF0104 == 2 ~ "Female",
) %>% as.factor()
)
# Now, sjPlot::plot_model() works
lm1 <- lm(feel_trump ~ male_fct, data = anes)
sjPlot::plot_model(lm1, type = "pred", terms = "male_fct")
anes <- anes %>%
mutate(
# this is a time series version of ANES with multiple years
year = VCF0004 %>% as.factor(),
# create column for age
age = VCF0102 %>% as.numeric(),
# convert sex to binary
male_bin = case_when(
VCF0104 == 1 ~ 1,
TRUE ~ 0) %>% as.numeric(),
# convert sex to factor
male_fct = case_when(
VCF0104 == 1 ~ "Male",
VCF0104 == 2 ~ "Female"
) %>% as.factor(),
# create race numeric
race_num = VCF0105a %>% as.numeric(),
# create race factor
race_fct = case_when(
VCF0105a == 1 ~ "White",
VCF0105a == 2 ~ "Black",
VCF0105a == 3 ~ "AAPI",
VCF0105a == 4 ~ "Native American",
VCF0105a == 5 ~ "Hispanic",
VCF0105a == 6 ~ "Other",
VCF0105a == 7 ~ "Other", # combine two categories
VCF0105a == 9 ~ NA_character_
) %>% as.factor(),
# create smaller race factor
race5_fct = case_when(
VCF0105a == 1 ~ "White",
VCF0105a == 2 ~ "Black",
VCF0105a == 3 ~ "AAPI",
VCF0105a == 5 ~ "Hispanic",
TRUE ~ "Other"
) %>% as.factor(),
# example: variable with just Whites and Blacks
race_wb_fct = case_when(
race_num == 1 ~ "White",
race_num == 2 ~ "Black",
TRUE ~ NA_character_
) %>% as.factor(),
# example: create race dummy (0/1) variables
race_white_bin = (race_num == 1) %>% as.numeric(),
race_black_bin = (race_num == 2) %>% as.numeric(),
race_aapi_bin = (race_num == 3) %>% as.numeric(),
race_native_bin = (race_num == 4) %>% as.numeric(),
race_hisp_bin = (race_num == 5) %>% as.numeric(),
race_other_bin = (race_num == 6 & race_num == 7) %>% as.numeric(),
# main explanatory and outcome vars
unrest_scale = VCF0528 %>% as.numeric(),
therm_rep_pres = VCF0426 %>% as.numeric()
)
# subset this version of ANES for the year 1972
a72 <- anes %>% filter(year == 1972)
Typically we might control for something like 5-7 variables. In this example, we just use a handful to illustrate.
# Thermometers Republican Candidate vs Unrest Scale
reduced_model <- lm(formula = therm_rep_pres ~ unrest_scale,
data = a72)
summary(reduced_model)
Call:
lm(formula = therm_rep_pres ~ unrest_scale, data = a72)
Residuals:
Min 1Q Median 3Q Max
-68.20 -14.75 5.25 20.25 34.56
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 61.29 2.28 26.93 <2e-16 ***
unrest_scale 1.15 0.67 1.72 0.086 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.4 on 732 degrees of freedom
(1971 observations deleted due to missingness)
Multiple R-squared: 0.00402, Adjusted R-squared: 0.00266
F-statistic: 2.95 on 1 and 732 DF, p-value: 0.086
full_model <- lm(formula = therm_rep_pres ~ unrest_scale + male_fct + race_wb_fct + age,
data = a72)
summary(full_model)
Call:
lm(formula = therm_rep_pres ~ unrest_scale + male_fct + race_wb_fct +
age, data = a72)
Residuals:
Min 1Q Median 3Q Max
-73.06 -13.98 4.39 18.92 50.25
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.008 4.045 11.13 < 2e-16 ***
unrest_scale 1.079 0.662 1.63 0.10
male_fctMale -0.899 1.944 -0.46 0.64
race_wb_fctWhite 16.092 3.313 4.86 0.0000015 ***
age 0.784 0.586 1.34 0.18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.9 on 714 degrees of freedom
(1986 observations deleted due to missingness)
Multiple R-squared: 0.0405, Adjusted R-squared: 0.0352
F-statistic: 7.54 on 4 and 714 DF, p-value: 0.00000596
stargazer
# stargazer can take multiple models
stargazer(reduced_model, full_model, header = FALSE, type = "html")
Dependent variable: | ||
therm_rep_pres | ||
(1) | (2) | |
unrest_scale | 1.150^{*} | 1.080 |
(0.670) | (0.662) | |
male_fctMale | -0.899 | |
(1.940) | ||
race_wb_fctWhite | 16.100^{***} | |
(3.310) | ||
age | 0.784 | |
(0.586) | ||
Constant | 61.300^{***} | 45.000^{***} |
(2.280) | (4.040) | |
Observations | 734 | 719 |
R^{2} | 0.004 | 0.041 |
Adjusted R^{2} | 0.003 | 0.035 |
Residual Std. Error | 26.400 (df = 732) | 25.900 (df = 714) |
F Statistic | 2.960^{*} (df = 1; 732) | 7.540^{***} (df = 4; 714) |
Note: | ^{}p<0.1; ^{}p<0.05; ^{}p<0.01 |
plot_model
# plot_model from sjPlot package
plot_model(
model = full_model,
type = "pred", # plot predicted value
terms = c("unrest_scale") # plot unrest_scale as predictor (x-axis)
) +
theme_minimal() +
ggtitle("Thermometer GOP Candidate vs Unrest Scale") +
labs(y = "Predicted Therm-GOP Pres Cand")
We can also look at how race moderates the main relationship. The slope of the relationship between the GOP Candidate Feeling Thermometer and the Urban Unrest Scale is the same for both groups (white and black) but whites, on average, feel more warmly towards the GOP candidate and blacks, on average, feel more cooly toward towards the GOP candidate (an intercept shift).
plot_model(
model = full_model,
type = "pred", # plot predicted value
terms = c("unrest_scale", "race_wb_fct") # plot unrest_scale as predictor
# plot race_wb_fct as moderator
) +
theme_minimal() +
ggtitle("Thermometer GOP Candidate vs Unrest Scale") +
labs(y = "Predicted Therm-GOP Pres Cand",
color = 'Race') # change legend title
This supplement was put together by Omar Wasow.