# Week 10: Matching

Solutions to handout.

Team 346 pol346.com (Princeton Univeristy Department of Politics)princeton.edu/politics
2020-04-17

## Background:

Governments across the developed and developing world often implement public policies seeking to improve the welfare of those citizens most in need. The WIC (Women, Infants and Children) program was designed to assist “low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and to infants and children up to age five who are found to be at nutritional risk.”1

How effective are these policies at improving social welfare? This is difficult to answer: first we must decide what are the relevant social indicators that we will measure. Second, there is the problem of confounding. Mothers and children who receive WIC will systematically differ from those who don’t. In particular, they will be on average poorer. So, for example, there may be dietary deficiencies that will affect the outcome of interest. How can we address that?

One approach would be to run randomized controlled trials (RCTs). This raises ethical questions, however, about whether governments should deliberately withhold benefits from citizens in the control group. An alternative, given the absence of randomization, is matching. Through matching, we try to achieve with observational data what randomization aims to achieve by design: treatment and control groups that are identical to one another except for the treatment assignment itself (on average). To do so, we will use two packages: MatchIt and cobalt.


install.packages("MatchIt")
install.packages("cobalt")

Here is a short working example with the car database mtcars. Suppose we are interested in the relationship between a “treatment,” vs, denoting the type of engine (V-shaped or straight) and our outcome of interest, mpg, miles-per-gallon. We are concerned, though, that there are systematic differences between the two vs types. For example, if we compare the distribution of horsepower in the two types, we see substantial areas of non-overlap:


mtcars %>%
mutate(vs = factor(vs)) %>%
ggplot() +
aes(x = hp, color = vs) +
geom_density() 

To compare “apples to apples,” we want to find observations that are similar on observed covariates. To do this, we match on other covariates that we know will affect mpg: cyl, hp and wt. We specify the treatment on the left hand side (LHS), and the covariates we want to balance on the right hand side (RHS): vs ~ cyl + hp + wt in the formula argument.

Note that we specify method = "nearest" and caliper = 0.1. These indicate that the algorithm will find the nearest neighbors to our treatment observation, within a caliper (tolerance parameter) of 0.1. Do not worry too much about it for now.2


mtcars_m_out <- matchit(
formula = vs ~ cyl + hp + wt,
data    = mtcars,
method  = "nearest",
caliper = 0.1
)

We then evaluate the matching process with commands like summary(matchit_output). Looking at the output below, what do you notice about the mean values for “treated” and “control” units in all the data versus the matched data?


# overview of matching
mtcars_m_out

Call:
matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest",
caliper = 0.1)

Sample sizes:
Control Treated
All            18      14
Matched         2       2
Unmatched      16      12
Discarded       0       0

# more detailed summary
summary(mtcars_m_out)

Call:
matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest",
caliper = 0.1)

Summary of balance for all data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance        0.8666        0.1037     0.2387    0.7629  0.8762
cyl             4.5714        7.4444     1.1490   -2.8730  2.0000
hp             91.3571      189.7222    60.2815  -98.3651 83.5000
wt              2.6113        3.6886     0.9040   -1.0773  0.9375
eQQ Mean  eQQ Max
distance   0.7666   0.9625
cyl        2.7143   4.0000
hp        94.7857 212.0000
wt         0.9936   1.9640

Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance        0.7243        0.6889     0.3639    0.0354  0.0354
cyl             4.0000        5.0000     1.4142   -1.0000  1.0000
hp             89.5000      100.5000    13.4350  -11.0000 14.0000
wt              1.8565        2.5075     0.5197   -0.6510  0.6510
eQQ Mean eQQ Max
distance   0.0354  0.0412
cyl        1.0000  2.0000
hp        14.0000 25.0000
wt         0.6510  0.6750

Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance    95.3624 95.9624  95.3847 95.7156
cyl         65.1934 50.0000  63.1579 50.0000
hp          88.8172 83.2335  85.2298 88.2075
wt          39.5695 30.5600  34.4835 65.6314

Sample sizes:
Control Treated
All            18      14
Matched         2       2
Unmatched      16      12
Discarded       0       0

To convert the matchit object to a data.frame, we use:


mtcars_match <- match.data(mtcars_m_out)

To check for covariate balance:


# abs = TRUE for absolute difference
cobalt::love.plot(mtcars_m_out, abs = TRUE)

## Exercise

In this exercise, we will analyze a subset of white families from the ECLS (Early Childhood Longitudinal Study). We will focus on a set of family characteristics and the treatment, whether or not the child’s mother was a beneficiary of the WIC program. The outcome of interest is the student’s performance in reading and math tests at kindergarten.

## Codebook:

• wic: dummy variable, indicating whether or not child’s mother received WIC.
• age.k.fall: age of child fall of kindergarten year
• books: number of books in home
• books.2 : number of books squared
• birth.weight.lbs: child birth weight lbs
• birth.weight.oz: child birth weight oz
• birth.weight: total birth weight oz
• female: child is female
• math.irt.k.spring: student’s math grade in kindergarten.
• mother.age: mother age
• mother.at.least.30: binary variable for mother $$>=$$ 30
• read.irt.k.spring: student’s reading grade in kindergarten.
• ses.comp: socioeconomic status composite
• teenage.mother: binary variable for mother age $$<$$ 20

# IMPORTANT: change chunk header above to eval = TRUE for knitting
library(dplyr)
library(ggplot2)
library(MatchIt)
library(cobalt)
library(stargazer)
library(kableExtra)
library(janitor)

clean_names()

# treatment variable: convert to factor
ecls <- ecls %>%
mutate(
wic = as.factor(wic)
)

# change the style of ggplot output
theme_set(theme_minimal())

## Questions:

1. First, let’s get a sense of the “treatment” assignment. How many WIC beneficiaries are there?

ecls %>%
count(wic) %>%
kable(
caption = "Breakdown of beneficiaries of WIC",
align   = "c"
) %>%
kable_styling(position = "center")
Table 1: Breakdown of beneficiaries of WIC
wic n
0 5513
1 2129

Out of 7642 students, approximately two thousand have received the WIC.

1. The WIC program was targeted at poor families. Does the data reflect that? Create a density plot of socioeconomic status for beneficiaries and non-benefiaries. How would you interpret these graphs to an audience of non-statisticians?

# socioeconomic status
ecls %>%
ggplot() +
geom_density(
aes(
x     = ses_comp,
group = wic,
col   = wic
)
)


# child weight
ecls %>%
ggplot() +
geom_density(
aes(
x     = birth_weight,
group = wic,
col   = wic
)
)

1. While have two outcomes of interest (student grades in reading and math in kindergarten) for simplicity we’ll only focus on their reading grades. Run a simple linear regression for the students’ reading grades against their WIC recipient status. Additionally, run a separate model with a full set of controls: female, age_k_fall, ses_comp, books, books_2 (books squared), birth_weight, mother_age. Display these models with stargazer, comparing the simple with the full model.

# baseline models

# with controls
read_irt_k_fall ~ wic + female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data = ecls
)

stargazer(
type   = 'html'
)
 Dependent variable: read_irt_k_fall (1) (2) wic1 -4.809*** -1.110*** (0.219) (0.241) female 1.531*** (0.185) age_k_fall 0.427*** (0.021) ses_comp 2.809*** (0.156) books 0.039*** (0.007) books_2 -0.0001*** (0.00003) birth_weight 0.018*** (0.005) mother_age 0.223*** (0.021) Constant 25.908*** -16.006*** (0.116) (1.748) Observations 7,642 7,642 R2 0.059 0.189 Adjusted R2 0.059 0.188 Residual Std. Error 8.584 (df = 7640) 7.974 (df = 7633) F Statistic 482.177*** (df = 1; 7640) 222.460*** (df = 8; 7633) Note: p<0.1; p<0.05; p<0.01
1. Interpret the results above. What is the relationship between student test scores and WIC? Can we make a causal claim? Why or why not?

2. To address concerns with confounding, let’s perform matching across treatment and control groups (WIC and non-WIC). Include all predictors of the model with controls. Make sure that the caliper parameter is adjusted to 0.1.


ecls_m_out <- matchit(
formula = wic ~ female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data    = ecls,
method  = "nearest",
caliper = 0.1
)

ecls_match <- match.data(ecls_m_out)
1. Run summary() on your matchit output. Look at the table under “Summary of balance for all data” and glance at the Means Treated and Means Control columns. Which covariates exhibit more balance (that is are similar in both treated and control conditions)? Are there covariates that seem far apart? Are some covariates already quite similar? Now look at the “Summary of balance for matched data.” Has there been improvement in balance?

summary(ecls_m_out)

Call:
matchit(formula = wic ~ female + age_k_fall + ses_comp + books +
books_2 + birth_weight + mother_age, data = ecls, method = "nearest",
caliper = 0.1)

Summary of balance for all data:
Means Treated Means Control SD Control  Mean Diff
distance            0.5228        0.1843     0.1974     0.3385
female              0.4880        0.4914     0.5000    -0.0034
age_k_fall         68.8792       68.8343     4.2887     0.0449
ses_comp           -0.2793        0.4748     0.6824    -0.7541
books              77.6289      104.2006    58.7065   -26.5717
books_2          9099.5510    14303.5959 14391.5616 -5204.0449
birth_weight      118.0521      121.9615    19.9567    -3.9094
mother_age         21.4998       26.6762     4.8849    -5.1765
eQQ Med  eQQ Mean   eQQ Max
distance        0.3811    0.3385 4.658e-01
female          0.0000    0.0033 1.000e+00
age_k_fall      0.0700    0.0802 4.500e+00
ses_comp        0.7500    0.7531 2.590e+00
books          20.0000   26.5552 1.000e+02
books_2      1875.0000 5202.6914 3.000e+04
birth_weight    3.0000    3.8920 2.100e+01
mother_age      6.0000    5.1738 7.000e+00

Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med
distance            0.4312        0.4161     0.2044    0.0150  0.0187
female              0.4875        0.4875     0.5000    0.0000  0.0000
age_k_fall         68.8645       68.8392     4.2520    0.0253  0.0700
ses_comp           -0.1478       -0.1083     0.4617   -0.0395  0.0300
books              86.0288       86.1727    56.5233   -0.1440  0.0000
books_2         10636.4702    10618.5835 13037.2908   17.8868  0.0000
birth_weight      120.1004      120.2943    20.4757   -0.1939  0.0000
mother_age         22.5905       22.6660     3.9743   -0.0755  0.0000
eQQ Mean  eQQ Max
distance       0.0151 2.60e-02
female         0.0000 0.00e+00
age_k_fall     0.0975 6.30e-01
ses_comp       0.0449 1.61e+00
books          1.7844 5.00e+01
books_2      289.7319 1.75e+04
birth_weight   0.6673 1.80e+01
mother_age     0.3788 5.00e+00

Percent Balance Improvement:
Mean Diff.  eQQ Med eQQ Mean  eQQ Max
distance        95.5597  95.1002  95.5437  94.4252
female         100.0000   0.0000 100.0000 100.0000
age_k_fall      43.7532   0.0000 -21.5376  86.0000
ses_comp        94.7635  96.0000  94.0362  37.8378
books           99.4582 100.0000  93.2805  50.0000
books_2         99.6563 100.0000  94.4311  41.6667
birth_weight    95.0412 100.0000  82.8543  14.2857
mother_age      98.5416 100.0000  92.6793  28.5714

Sample sizes:
Control Treated
All          5513    2129
Matched      1563    1563
Unmatched    3950     566
Discarded       0       0
1. Display the covariate balance plot using cobalt::love.plot() (see sample code above). Has the balance improved as a result of matching?

bal.tab(ecls_m_out)

Call
matchit(formula = wic ~ female + age_k_fall + ses_comp + books +
books_2 + birth_weight + mother_age, data = ecls, method = "nearest",
caliper = 0.1)

Balance Measures
distance     Distance   0.0681
female         Binary   0.0000
age_k_fall    Contin.   0.0059
ses_comp      Contin.  -0.0655
books         Contin.  -0.0025
books_2       Contin.   0.0013
birth_weight  Contin.  -0.0094
mother_age    Contin.  -0.0164

Sample sizes
Control Treated
All          5513    2129
Matched      1563    1563
Unmatched    3950     566

love.plot(ecls_m_out)

1. Re-run the analysis in (3). What differences do you note between these results and the ones you found before?

In unmatched data, the magnitude of our coefficient in the naive model is substantially larger than the coefficient in the model with controls suggesting that our estimate is driven non-trivially by the model we pick.

Notice, in the matched data, that moving from the naive model to the full model does not result in substantial changes in the magnitude of the estimate suggesting that our estimate is not model dependent.


# baseline models

# with controls
read_irt_k_fall ~ wic + female + age_k_fall + ses_comp +
books + books_2 + birth_weight + mother_age,
data = ecls_match
)

stargazer(
type = 'html',
)
 Dependent variable: read_irt_k_fall (1) (2) wic1 -1.299*** -1.185*** (0.269) (0.254) female 1.579*** (0.257) age_k_fall 0.383*** (0.030) ses_comp 2.385*** (0.268) books 0.059*** (0.009) books_2 -0.0002*** (0.00004) birth_weight 0.009 (0.006) mother_age 0.201*** (0.031) Constant 22.996*** -12.078*** (0.190) (2.425) Observations 3,126 3,126 R2 0.007 0.113 Adjusted R2 0.007 0.111 Residual Std. Error 7.510 (df = 3124) 7.106 (df = 3117) F Statistic 23.394*** (df = 1; 3124) 49.757*** (df = 8; 3117) Note: p<0.1; p<0.05; p<0.01

As further evidence that our estimate with the matched data is less model dependent (e.g., results determined by how we specify the model with controls, interactions, quadratics, etc.), here is a contrast of t.test() results with the unmatched and then the matched data. The t.test with matched data produces similar estimates to both the naive and the full regression models with matched data.


t.test(read_irt_k_fall ~ wic, data = ecls) %>%
broom::tidy() %>%
select(contains("estimate"),statistic, p.value, contains("conf")) %>%
kable(format = "html", digits = 2) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
estimate estimate1 estimate2 statistic p.value conf.low conf.high
4.81 25.91 21.1 24.73 0 4.43 5.19

t.test(read_irt_k_fall ~ wic, data = ecls_match) %>%
broom::tidy() %>%
select(contains("estimate"),statistic, p.value, contains("conf")) %>%
kable(format = "html", digits = 2) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
estimate estimate1 estimate2 statistic p.value conf.low conf.high
1.3 23 21.7 4.84 0 0.77 1.83
1. Given this analysis, what would you recommend to a policy-maker? Should we continue the WIC program or should we think about redesigning it? There is no “right” answer, use your critical thinking and use the statistical evidence to inform your opinion.

2. What are some limitations of matching we should keep in mind?

### Questions

• Regression vs Matching:
• Why isn’t regression enough to control for imbalance? First, we want to focus analysis on the area of common support (the subset of our data in which there is overlap between treated and control subjects). For example, including rich people in an analysis of anti-poverty programs is probably not meaningful as they would be “never takers” of the benefit. Another related idea is that in any data analysis we can have outliers for which there is no “twin” subject but which could exert significant influence on results. Regression will not address either of these issues and will just find the line that fits the data. Matching, on the other hand, has the potential to focus our analysis on the observations that could plausibly be either “treated” or “control” (area of common support) and prunes the extreme or outlying observations for which there aren’t meaningful comparisons or “twins.”
• Model dependence:
• Why didn’t the results didn’t change much between matched and unmatched regression? If results do not change much between unmatched and matched subsets, that suggests the estimated relationships are not being driven by imbalance in data. Put another way, matching can be seen as a kind of robustness check to see whether the results are sensitive to imbalance. In the WIC example above, adding controls shrunk the magnitude of the coefficient a lot between the naive and the “full” models suggesting that in the naive model, WIC is correlated with a lot of other covariates and the results are sensitive to the model. In the matched data, though, the coefficient on the naive model and the full model are quite similar. This shows us that matching has reduced model dependence. Just for comparison, we can see that a simple t.test() gives us a similar estimate to the full model, suggesting matching has reduced model dependence.
• Causality:
• Can we make causal claims? In general, no. While matching helps approximate experimental designs to make a causal claim we have to make assumptions that are often assumed to be heroic. In particular, we have to assume that matching on observed characteristics also results in matching on unobserved characteristics (e.g., unmeasured).