Solutions to handout.
Governments across the developed and developing world often implement public policies seeking to improve the welfare of those citizens most in need. The WIC (Women, Infants and Children) program was designed to assist “low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and to infants and children up to age five who are found to be at nutritional risk.”1
How effective are these policies at improving social welfare? This is difficult to answer: first we must decide what are the relevant social indicators that we will measure. Second, there is the problem of confounding. Mothers and children who receive WIC will systematically differ from those who don’t. In particular, they will be on average poorer. So, for example, there may be dietary deficiencies that will affect the outcome of interest. How can we address that?
One approach would be to run randomized controlled trials (RCTs). This raises ethical questions, however, about whether governments should deliberately withhold benefits from citizens in the control group. An alternative, given the absence of randomization, is matching. Through matching, we try to achieve with observational data what randomization aims to achieve by design: treatment and control groups that are identical to one another except for the treatment assignment itself (on average). To do so, we will use two packages:
Here is a short working example with the car database
mtcars. Suppose we are interested in the relationship between a “treatment,”
vs, denoting the type of engine (V-shaped or straight) and our outcome of interest,
mpg, miles-per-gallon. We are concerned, though, that there are systematic differences between the two
vs types. For example, if we compare the distribution of horsepower in the two types, we see substantial areas of non-overlap:
mtcars %>% mutate(vs = factor(vs)) %>% ggplot() + aes(x = hp, color = vs) + geom_density()
To compare “apples to apples,” we want to find observations that are similar on observed covariates. To do this, we match on other covariates that we know will affect
wt. We specify the treatment on the left hand side (LHS), and the covariates we want to balance on the right hand side (RHS):
vs ~ cyl + hp + wt in the
Note that we specify
method = "nearest" and
caliper = 0.1. These indicate that the algorithm will find the nearest neighbors to our treatment observation, within a caliper (tolerance parameter) of 0.1. Do not worry too much about it for now.2
mtcars_m_out <- matchit( formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest", caliper = 0.1 )
We then evaluate the matching process with commands like
summary(matchit_output). Looking at the output below, what do you notice about the mean values for “treated” and “control” units in all the data versus the matched data?
# overview of matching mtcars_m_out
Call: matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest", caliper = 0.1) Sample sizes: Control Treated All 18 14 Matched 2 2 Unmatched 16 12 Discarded 0 0
# more detailed summary summary(mtcars_m_out)
Call: matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest", caliper = 0.1) Summary of balance for all data: Means Treated Means Control SD Control Mean Diff eQQ Med distance 0.8666 0.1037 0.2387 0.7629 0.8762 cyl 4.5714 7.4444 1.1490 -2.8730 2.0000 hp 91.3571 189.7222 60.2815 -98.3651 83.5000 wt 2.6113 3.6886 0.9040 -1.0773 0.9375 eQQ Mean eQQ Max distance 0.7666 0.9625 cyl 2.7143 4.0000 hp 94.7857 212.0000 wt 0.9936 1.9640 Summary of balance for matched data: Means Treated Means Control SD Control Mean Diff eQQ Med distance 0.7243 0.6889 0.3639 0.0354 0.0354 cyl 4.0000 5.0000 1.4142 -1.0000 1.0000 hp 89.5000 100.5000 13.4350 -11.0000 14.0000 wt 1.8565 2.5075 0.5197 -0.6510 0.6510 eQQ Mean eQQ Max distance 0.0354 0.0412 cyl 1.0000 2.0000 hp 14.0000 25.0000 wt 0.6510 0.6750 Percent Balance Improvement: Mean Diff. eQQ Med eQQ Mean eQQ Max distance 95.3624 95.9624 95.3847 95.7156 cyl 65.1934 50.0000 63.1579 50.0000 hp 88.8172 83.2335 85.2298 88.2075 wt 39.5695 30.5600 34.4835 65.6314 Sample sizes: Control Treated All 18 14 Matched 2 2 Unmatched 16 12 Discarded 0 0
To convert the
matchit object to a
data.frame, we use:
mtcars_match <- match.data(mtcars_m_out)
To check for covariate balance:
# abs = TRUE for absolute difference cobalt::love.plot(mtcars_m_out, abs = TRUE)
In this exercise, we will analyze a subset of white families from the ECLS (Early Childhood Longitudinal Study). We will focus on a set of family characteristics and the treatment, whether or not the child’s mother was a beneficiary of the WIC program. The outcome of interest is the student’s performance in reading and math tests at kindergarten.
wic: dummy variable, indicating whether or not child’s mother received WIC.
age.k.fall: age of child fall of kindergarten year
books: number of books in home
books.2: number of books squared
birth.weight.lbs: child birth weight lbs
birth.weight.oz: child birth weight oz
birth.weight: total birth weight oz
female: child is female
math.irt.k.spring: student’s math grade in kindergarten.
mother.age: mother age
mother.at.least.30: binary variable for mother \(>=\) 30
read.irt.k.spring: student’s reading grade in kindergarten.
ses.comp: socioeconomic status composite
teenage.mother: binary variable for mother age \(<\) 20
# IMPORTANT: change chunk header above to eval = TRUE for knitting library(dplyr) library(ggplot2) library(MatchIt) library(cobalt) library(stargazer) library(kableExtra) library(janitor) # load data ecls <- read.csv(url("http://pol346.com/data/ecls_white_small.csv")) %>% clean_names() # treatment variable: convert to factor ecls <- ecls %>% mutate( wic = as.factor(wic) ) # change the style of ggplot output theme_set(theme_minimal())
ecls %>% count(wic) %>% kable( caption = "Breakdown of beneficiaries of WIC", align = "c" ) %>% kable_styling(position = "center")
Out of 7642 students, approximately two thousand have received the WIC.
# socioeconomic status ecls %>% ggplot() + geom_density( aes( x = ses_comp, group = wic, col = wic ) )
# child weight ecls %>% ggplot() + geom_density( aes( x = birth_weight, group = wic, col = wic ) )
mother_age. Display these models with
stargazer, comparing the simple with the full model.
# baseline models fit_read_base <- lm(read_irt_k_fall ~ wic, data = ecls) # with controls fit_read_control <- lm( read_irt_k_fall ~ wic + female + age_k_fall + ses_comp + books + books_2 + birth_weight + mother_age, data = ecls ) stargazer( fit_read_base, fit_read_control, header = F, type = 'html' )
|Residual Std. Error||8.584 (df = 7640)||7.974 (df = 7633)|
|F Statistic||482.177*** (df = 1; 7640)||222.460*** (df = 8; 7633)|
|Note:||p<0.1; p<0.05; p<0.01|
Interpret the results above. What is the relationship between student test scores and WIC? Can we make a causal claim? Why or why not?
To address concerns with confounding, let’s perform matching across treatment and control groups (WIC and non-WIC). Include all predictors of the model with controls. Make sure that the
caliper parameter is adjusted to
ecls_m_out <- matchit( formula = wic ~ female + age_k_fall + ses_comp + books + books_2 + birth_weight + mother_age, data = ecls, method = "nearest", caliper = 0.1 ) ecls_match <- match.data(ecls_m_out)
matchitoutput. Look at the table under “Summary of balance for all data” and glance at the Means Treated and Means Control columns. Which covariates exhibit more balance (that is are similar in both treated and control conditions)? Are there covariates that seem far apart? Are some covariates already quite similar? Now look at the “Summary of balance for matched data.” Has there been improvement in balance?
Call: matchit(formula = wic ~ female + age_k_fall + ses_comp + books + books_2 + birth_weight + mother_age, data = ecls, method = "nearest", caliper = 0.1) Summary of balance for all data: Means Treated Means Control SD Control Mean Diff distance 0.5228 0.1843 0.1974 0.3385 female 0.4880 0.4914 0.5000 -0.0034 age_k_fall 68.8792 68.8343 4.2887 0.0449 ses_comp -0.2793 0.4748 0.6824 -0.7541 books 77.6289 104.2006 58.7065 -26.5717 books_2 9099.5510 14303.5959 14391.5616 -5204.0449 birth_weight 118.0521 121.9615 19.9567 -3.9094 mother_age 21.4998 26.6762 4.8849 -5.1765 eQQ Med eQQ Mean eQQ Max distance 0.3811 0.3385 4.658e-01 female 0.0000 0.0033 1.000e+00 age_k_fall 0.0700 0.0802 4.500e+00 ses_comp 0.7500 0.7531 2.590e+00 books 20.0000 26.5552 1.000e+02 books_2 1875.0000 5202.6914 3.000e+04 birth_weight 3.0000 3.8920 2.100e+01 mother_age 6.0000 5.1738 7.000e+00 Summary of balance for matched data: Means Treated Means Control SD Control Mean Diff eQQ Med distance 0.4312 0.4161 0.2044 0.0150 0.0187 female 0.4875 0.4875 0.5000 0.0000 0.0000 age_k_fall 68.8645 68.8392 4.2520 0.0253 0.0700 ses_comp -0.1478 -0.1083 0.4617 -0.0395 0.0300 books 86.0288 86.1727 56.5233 -0.1440 0.0000 books_2 10636.4702 10618.5835 13037.2908 17.8868 0.0000 birth_weight 120.1004 120.2943 20.4757 -0.1939 0.0000 mother_age 22.5905 22.6660 3.9743 -0.0755 0.0000 eQQ Mean eQQ Max distance 0.0151 2.60e-02 female 0.0000 0.00e+00 age_k_fall 0.0975 6.30e-01 ses_comp 0.0449 1.61e+00 books 1.7844 5.00e+01 books_2 289.7319 1.75e+04 birth_weight 0.6673 1.80e+01 mother_age 0.3788 5.00e+00 Percent Balance Improvement: Mean Diff. eQQ Med eQQ Mean eQQ Max distance 95.5597 95.1002 95.5437 94.4252 female 100.0000 0.0000 100.0000 100.0000 age_k_fall 43.7532 0.0000 -21.5376 86.0000 ses_comp 94.7635 96.0000 94.0362 37.8378 books 99.4582 100.0000 93.2805 50.0000 books_2 99.6563 100.0000 94.4311 41.6667 birth_weight 95.0412 100.0000 82.8543 14.2857 mother_age 98.5416 100.0000 92.6793 28.5714 Sample sizes: Control Treated All 5513 2129 Matched 1563 1563 Unmatched 3950 566 Discarded 0 0
cobalt::love.plot()(see sample code above). Has the balance improved as a result of matching?
Call matchit(formula = wic ~ female + age_k_fall + ses_comp + books + books_2 + birth_weight + mother_age, data = ecls, method = "nearest", caliper = 0.1) Balance Measures Type Diff.Adj distance Distance 0.0681 female Binary 0.0000 age_k_fall Contin. 0.0059 ses_comp Contin. -0.0655 books Contin. -0.0025 books_2 Contin. 0.0013 birth_weight Contin. -0.0094 mother_age Contin. -0.0164 Sample sizes Control Treated All 5513 2129 Matched 1563 1563 Unmatched 3950 566
In unmatched data, the magnitude of our coefficient in the naive model is substantially larger than the coefficient in the model with controls suggesting that our estimate is driven non-trivially by the model we pick.
Notice, in the matched data, that moving from the naive model to the full model does not result in substantial changes in the magnitude of the estimate suggesting that our estimate is not model dependent.
# baseline models fit_read_base_m <- lm(read_irt_k_fall ~ wic, data = ecls_match) # with controls fit_read_control_m <- lm( read_irt_k_fall ~ wic + female + age_k_fall + ses_comp + books + books_2 + birth_weight + mother_age, data = ecls_match ) stargazer( fit_read_base_m, fit_read_control_m, type = 'html', header = F )
|Residual Std. Error||7.510 (df = 3124)||7.106 (df = 3117)|
|F Statistic||23.394*** (df = 1; 3124)||49.757*** (df = 8; 3117)|
|Note:||p<0.1; p<0.05; p<0.01|
As further evidence that our estimate with the matched data is less model dependent (e.g., results determined by how we specify the model with controls, interactions, quadratics, etc.), here is a contrast of
t.test() results with the unmatched and then the matched data. The
t.test with matched data produces similar estimates to both the naive and the full regression models with matched data.
t.test(read_irt_k_fall ~ wic, data = ecls) %>% broom::tidy() %>% select(contains("estimate"),statistic, p.value, contains("conf")) %>% kable(format = "html", digits = 2) %>% kable_styling(bootstrap_options = "striped", full_width = FALSE)
t.test(read_irt_k_fall ~ wic, data = ecls_match) %>% broom::tidy() %>% select(contains("estimate"),statistic, p.value, contains("conf")) %>% kable(format = "html", digits = 2) %>% kable_styling(bootstrap_options = "striped", full_width = FALSE)
Given this analysis, what would you recommend to a policy-maker? Should we continue the WIC program or should we think about redesigning it? There is no “right” answer, use your critical thinking and use the statistical evidence to inform your opinion.
What are some limitations of matching we should keep in mind?
t.test()gives us a similar estimate to the full model, suggesting matching has reduced model dependence.
For more information, see https://www.fns.usda.gov/wic/women-infants-and-children-wic ↩︎
But if you really do, you can read this excellent Wikipedia article on matching through propensity score (here)[https://en.wikipedia.org/wiki/Propensity_score_matching].↩︎