Solutions to handout.
Governments across the developed and developing world often implement public policies seeking to improve the welfare of those citizens most in need. The WIC (Women, Infants and Children) program was designed to assist “low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and to infants and children up to age five who are found to be at nutritional risk.”^{1}
How effective are these policies at improving social welfare? This is difficult to answer: first we must decide what are the relevant social indicators that we will measure. Second, there is the problem of confounding. Mothers and children who receive WIC will systematically differ from those who don’t. In particular, they will be on average poorer. So, for example, there may be dietary deficiencies that will affect the outcome of interest. How can we address that?
One approach would be to run randomized controlled trials (RCTs). This raises ethical questions, however, about whether governments should deliberately withhold benefits from citizens in the control group. An alternative, given the absence of randomization, is matching. Through matching, we try to achieve with observational data what randomization aims to achieve by design: treatment and control groups that are identical to one another except for the treatment assignment itself (on average). To do so, we will use two packages: MatchIt
and cobalt
.
install.packages("MatchIt")
install.packages("cobalt")
Here is a short working example with the car database mtcars
. Suppose we are interested in the relationship between a “treatment,” vs
, denoting the type of engine (V-shaped or straight) and our outcome of interest, mpg
, miles-per-gallon. We are concerned, though, that there are systematic differences between the two vs
types. For example, if we compare the distribution of horsepower in the two types, we see substantial areas of non-overlap:
mtcars %>%
mutate(vs = factor(vs)) %>%
ggplot() +
aes(x = hp, color = vs) +
geom_density()
To compare “apples to apples,” we want to find observations that are similar on observed covariates. To do this, we match on other covariates that we know will affect mpg
: cyl
, hp
and wt
. We specify the treatment on the left hand side (LHS), and the covariates we want to balance on the right hand side (RHS): vs ~ cyl + hp + wt
in the formula
argument.
Note that we specify method = "nearest"
and caliper = 0.1
. These indicate that the algorithm will find the nearest neighbors to our treatment observation, within a caliper (tolerance parameter) of 0.1. Do not worry too much about it for now.^{2}
mtcars_m_out <- matchit(
formula = vs ~ cyl + hp + wt,
data = mtcars,
method = "nearest",
caliper = 0.1
)
We then evaluate the matching process with commands like summary(matchit_output)
. Looking at the output below, what do you notice about the mean values for “treated” and “control” units in all the data versus the matched data?
# overview of matching
mtcars_m_out
Call:
matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest",
caliper = 0.1)
Sample sizes:
Control Treated
All 18 14
Matched 2 2
Unmatched 16 12
Discarded 0 0
# more detailed summary
summary(mtcars_m_out)
Call:
matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest",
caliper = 0.1)
Summary of balance for all data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance 0.8666 0.1037 0.2387 0.7629 0.8762
cyl 4.5714 7.4444 1.1490 -2.8730 2.0000
hp 91.3571 189.7222 60.2815 -98.3651 83.5000
wt 2.6113 3.6886 0.9040 -1.0773 0.9375
eQQ Mean eQQ Max
distance 0.7666 0.9625
cyl 2.7143 4.0000
hp 94.7857 212.0000
wt 0.9936 1.9640
Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance 0.7243 0.6889 0.3639 0.0354 0.0354
cyl 4.0000 5.0000 1.4142 -1.0000 1.0000
hp 89.5000 100.5000 13.4350 -11.0000 14.0000
wt 1.8565 2.5075 0.5197 -0.6510 0.6510
eQQ Mean eQQ Max
distance 0.0354 0.0412
cyl 1.0000 2.0000
hp 14.0000 25.0000
wt 0.6510 0.6750
Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance 95.3624 95.9624 95.3847 95.7156
cyl 65.1934 50.0000 63.1579 50.0000
hp 88.8172 83.2335 85.2298 88.2075
wt 39.5695 30.5600 34.4835 65.6314
Sample sizes:
Control Treated
All 18 14
Matched 2 2
Unmatched 16 12
Discarded 0 0
To convert the matchit
object to a data.frame
, we use:
mtcars_match <- match.data(mtcars_m_out)
To check for covariate balance:
# abs = TRUE for absolute difference
cobalt::love.plot(mtcars_m_out, abs = TRUE)
In this exercise, we will analyze a subset of white families from the ECLS (Early Childhood Longitudinal Study). We will focus on a set of family characteristics and the treatment, whether or not the child’s mother was a beneficiary of the WIC program. The outcome of interest is the student’s performance in reading and math tests at kindergarten.
wic
: dummy variable, indicating whether or not child’s mother received WIC.age.k.fall
: age of child fall of kindergarten yearbooks
: number of books in homebooks.2
: number of books squaredbirth.weight.lbs
: child birth weight lbsbirth.weight.oz
: child birth weight ozbirth.weight
: total birth weight ozfemale
: child is femalemath.irt.k.spring
: student’s math grade in kindergarten.mother.age
: mother agemother.at.least.30
: binary variable for mother \(>=\) 30read.irt.k.spring
: student’s reading grade in kindergarten.ses.comp
: socioeconomic status compositeteenage.mother
: binary variable for mother age \(<\) 20
# IMPORTANT: change chunk header above to eval = TRUE for knitting
library(dplyr)
library(ggplot2)
library(MatchIt)
library(cobalt)
library(stargazer)
library(kableExtra)
library(janitor)
# load data
ecls <- read.csv(url("http://pol346.com/data/ecls_white_small.csv")) %>%
clean_names()
# treatment variable: convert to factor
ecls <- ecls %>%
mutate(
wic = as.factor(wic)
)
# change the style of ggplot output
theme_set(theme_minimal())
ecls %>%
count(wic) %>%
kable(
caption = "Breakdown of beneficiaries of WIC",
align = "c"
) %>%
kable_styling(position = "center")
wic | n |
---|---|
0 | 5513 |
1 | 2129 |
Out of 7642 students, approximately two thousand have received the WIC.
# socioeconomic status
ecls %>%
ggplot() +
geom_density(
aes(
x = ses_comp,
group = wic,
col = wic
)
)
# child weight
ecls %>%
ggplot() +
geom_density(
aes(
x = birth_weight,
group = wic,
col = wic
)
)
female
, age_k_fall
, ses_comp
, books
, books_2
(books squared), birth_weight
, mother_age
. Display these models with stargazer
, comparing the simple with the full model.
# baseline models
fit_read_base <- lm(read_irt_k_fall ~ wic, data = ecls)
# with controls
fit_read_control <- lm(
read_irt_k_fall ~ wic + female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data = ecls
)
stargazer(
fit_read_base, fit_read_control,
header = F,
type = 'html'
)
Dependent variable: | ||
read_irt_k_fall | ||
(1) | (2) | |
wic1 | -4.809^{***} | -1.110^{***} |
(0.219) | (0.241) | |
female | 1.531^{***} | |
(0.185) | ||
age_k_fall | 0.427^{***} | |
(0.021) | ||
ses_comp | 2.809^{***} | |
(0.156) | ||
books | 0.039^{***} | |
(0.007) | ||
books_2 | -0.0001^{***} | |
(0.00003) | ||
birth_weight | 0.018^{***} | |
(0.005) | ||
mother_age | 0.223^{***} | |
(0.021) | ||
Constant | 25.908^{***} | -16.006^{***} |
(0.116) | (1.748) | |
Observations | 7,642 | 7,642 |
R^{2} | 0.059 | 0.189 |
Adjusted R^{2} | 0.059 | 0.188 |
Residual Std. Error | 8.584 (df = 7640) | 7.974 (df = 7633) |
F Statistic | 482.177^{***} (df = 1; 7640) | 222.460^{***} (df = 8; 7633) |
Note: | ^{}p<0.1; ^{}p<0.05; ^{}p<0.01 |
Interpret the results above. What is the relationship between student test scores and WIC? Can we make a causal claim? Why or why not?
To address concerns with confounding, let’s perform matching across treatment and control groups (WIC and non-WIC). Include all predictors of the model with controls. Make sure that the caliper
parameter is adjusted to 0.1
.
ecls_m_out <- matchit(
formula = wic ~ female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data = ecls,
method = "nearest",
caliper = 0.1
)
ecls_match <- match.data(ecls_m_out)
summary()
on your matchit
output. Look at the table under “Summary of balance for all data” and glance at the Means Treated and Means Control columns. Which covariates exhibit more balance (that is are similar in both treated and control conditions)? Are there covariates that seem far apart? Are some covariates already quite similar? Now look at the “Summary of balance for matched data.” Has there been improvement in balance?
summary(ecls_m_out)
Call:
matchit(formula = wic ~ female + age_k_fall + ses_comp + books +
books_2 + birth_weight + mother_age, data = ecls, method = "nearest",
caliper = 0.1)
Summary of balance for all data:
Means Treated Means Control SD Control Mean Diff
distance 0.5228 0.1843 0.1974 0.3385
female 0.4880 0.4914 0.5000 -0.0034
age_k_fall 68.8792 68.8343 4.2887 0.0449
ses_comp -0.2793 0.4748 0.6824 -0.7541
books 77.6289 104.2006 58.7065 -26.5717
books_2 9099.5510 14303.5959 14391.5616 -5204.0449
birth_weight 118.0521 121.9615 19.9567 -3.9094
mother_age 21.4998 26.6762 4.8849 -5.1765
eQQ Med eQQ Mean eQQ Max
distance 0.3811 0.3385 4.658e-01
female 0.0000 0.0033 1.000e+00
age_k_fall 0.0700 0.0802 4.500e+00
ses_comp 0.7500 0.7531 2.590e+00
books 20.0000 26.5552 1.000e+02
books_2 1875.0000 5202.6914 3.000e+04
birth_weight 3.0000 3.8920 2.100e+01
mother_age 6.0000 5.1738 7.000e+00
Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance 0.4312 0.4161 0.2044 0.0150 0.0187
female 0.4875 0.4875 0.5000 0.0000 0.0000
age_k_fall 68.8645 68.8392 4.2520 0.0253 0.0700
ses_comp -0.1478 -0.1083 0.4617 -0.0395 0.0300
books 86.0288 86.1727 56.5233 -0.1440 0.0000
books_2 10636.4702 10618.5835 13037.2908 17.8868 0.0000
birth_weight 120.1004 120.2943 20.4757 -0.1939 0.0000
mother_age 22.5905 22.6660 3.9743 -0.0755 0.0000
eQQ Mean eQQ Max
distance 0.0151 2.60e-02
female 0.0000 0.00e+00
age_k_fall 0.0975 6.30e-01
ses_comp 0.0449 1.61e+00
books 1.7844 5.00e+01
books_2 289.7319 1.75e+04
birth_weight 0.6673 1.80e+01
mother_age 0.3788 5.00e+00
Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance 95.5597 95.1002 95.5437 94.4252
female 100.0000 0.0000 100.0000 100.0000
age_k_fall 43.7532 0.0000 -21.5376 86.0000
ses_comp 94.7635 96.0000 94.0362 37.8378
books 99.4582 100.0000 93.2805 50.0000
books_2 99.6563 100.0000 94.4311 41.6667
birth_weight 95.0412 100.0000 82.8543 14.2857
mother_age 98.5416 100.0000 92.6793 28.5714
Sample sizes:
Control Treated
All 5513 2129
Matched 1563 1563
Unmatched 3950 566
Discarded 0 0
cobalt::love.plot()
(see sample code above). Has the balance improved as a result of matching?
bal.tab(ecls_m_out)
Call
matchit(formula = wic ~ female + age_k_fall + ses_comp + books +
books_2 + birth_weight + mother_age, data = ecls, method = "nearest",
caliper = 0.1)
Balance Measures
Type Diff.Adj
distance Distance 0.0681
female Binary 0.0000
age_k_fall Contin. 0.0059
ses_comp Contin. -0.0655
books Contin. -0.0025
books_2 Contin. 0.0013
birth_weight Contin. -0.0094
mother_age Contin. -0.0164
Sample sizes
Control Treated
All 5513 2129
Matched 1563 1563
Unmatched 3950 566
love.plot(ecls_m_out)
In unmatched data, the magnitude of our coefficient in the naive model is substantially larger than the coefficient in the model with controls suggesting that our estimate is driven non-trivially by the model we pick.
Notice, in the matched data, that moving from the naive model to the full model does not result in substantial changes in the magnitude of the estimate suggesting that our estimate is not model dependent.
# baseline models
fit_read_base_m <- lm(read_irt_k_fall ~ wic, data = ecls_match)
# with controls
fit_read_control_m <- lm(
read_irt_k_fall ~ wic + female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data = ecls_match
)
stargazer(
fit_read_base_m, fit_read_control_m,
type = 'html',
header = F
)
Dependent variable: | ||
read_irt_k_fall | ||
(1) | (2) | |
wic1 | -1.299^{***} | -1.185^{***} |
(0.269) | (0.254) | |
female | 1.579^{***} | |
(0.257) | ||
age_k_fall | 0.383^{***} | |
(0.030) | ||
ses_comp | 2.385^{***} | |
(0.268) | ||
books | 0.059^{***} | |
(0.009) | ||
books_2 | -0.0002^{***} | |
(0.00004) | ||
birth_weight | 0.009 | |
(0.006) | ||
mother_age | 0.201^{***} | |
(0.031) | ||
Constant | 22.996^{***} | -12.078^{***} |
(0.190) | (2.425) | |
Observations | 3,126 | 3,126 |
R^{2} | 0.007 | 0.113 |
Adjusted R^{2} | 0.007 | 0.111 |
Residual Std. Error | 7.510 (df = 3124) | 7.106 (df = 3117) |
F Statistic | 23.394^{***} (df = 1; 3124) | 49.757^{***} (df = 8; 3117) |
Note: | ^{}p<0.1; ^{}p<0.05; ^{}p<0.01 |
As further evidence that our estimate with the matched data is less model dependent (e.g., results determined by how we specify the model with controls, interactions, quadratics, etc.), here is a contrast of t.test()
results with the unmatched and then the matched data. The t.test
with matched data produces similar estimates to both the naive and the full regression models with matched data.
t.test(read_irt_k_fall ~ wic, data = ecls) %>%
broom::tidy() %>%
select(contains("estimate"),statistic, p.value, contains("conf")) %>%
kable(format = "html", digits = 2) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
estimate | estimate1 | estimate2 | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
4.81 | 25.91 | 21.1 | 24.73 | 0 | 4.43 | 5.19 |
t.test(read_irt_k_fall ~ wic, data = ecls_match) %>%
broom::tidy() %>%
select(contains("estimate"),statistic, p.value, contains("conf")) %>%
kable(format = "html", digits = 2) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
estimate | estimate1 | estimate2 | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
1.3 | 23 | 21.7 | 4.84 | 0 | 0.77 | 1.83 |
Given this analysis, what would you recommend to a policy-maker? Should we continue the WIC program or should we think about redesigning it? There is no “right” answer, use your critical thinking and use the statistical evidence to inform your opinion.
What are some limitations of matching we should keep in mind?
t.test()
gives us a similar estimate to the full model, suggesting matching has reduced model dependence.For more information, see https://www.fns.usda.gov/wic/women-infants-and-children-wic ↩︎
But if you really do, you can read this excellent Wikipedia article on matching through propensity score (here)[https://en.wikipedia.org/wiki/Propensity_score_matching].↩︎