Governments across the developed and developing world often implement public policies seeking to improve the welfare of those citizens most in need. The WIC (Women, Infants and Children) program was designed to assist “low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and to infants and children up to age five who are found to be at nutritional risk.”1

How effective are these policies at improving social welfare? This is difficult to answer: first we must decide what are the relevant social indicators that we will measure. Second, there is the problem of confounding. Mothers and children who receive WIC will systematically differ from those who don’t. In particular, they will be on average poorer. So, for example, there may be dietary deficiencies that will affect the outcome of interest. How can we address that?

One approach would be to run randomized controlled trials (RCTs). This raises ethical questions, however, about whether governments should deliberately withhold benefits from citizens in the control group. An alternative, given the absence of randomization, is matching. Through matching, we try to achieve with observational data what randomization aims to achieve by design: treatment and control groups that are identical to one another except for the treatment assignment itself (on average). To do so, we will use two packages: MatchIt and cobalt.


Here is a short working example with the car database mtcars. Suppose we are interested in the relationship between a “treatment,” vs, denoting the type of engine (V-shaped or straight) and our outcome of interest, mpg, miles-per-gallon. We are concerned, though, that there are systematic differences between the two vs types. For example, if we compare the distribution of horsepower in the two types, we see substantial areas of non-overlap:

mtcars %>% 
  mutate(vs = factor(vs)) %>% 
  ggplot() + 
    aes(x = hp, color = vs) + 

To compare “apples to apples,” we want to find observations that are similar on observed covariates. To do this, we match on other covariates that we know will affect mpg: cyl, hp and wt. We specify the treatment on the left hand side (LHS), and the covariates we want to balance on the right hand side (RHS): vs ~ cyl + hp + wt in the formula argument.

Note that we specify method = "nearest" and caliper = 0.1. These indicate that the algorithm will find the nearest neighbors to our treatment observation, within a caliper (tolerance parameter) of 0.1. Do not worry too much about it for now.2

mtcars_m_out <- matchit(
  formula = vs ~ cyl + hp + wt,
  data    = mtcars,
  method  = "nearest",
  caliper = 0.1

We then evaluate the matching process with commands like summary(matchit_output). Looking at the output below, what do you notice about the mean values for “treated” and “control” units in all the data versus the matched data?

## Call:
## matchit(formula = vs ~ cyl + hp + wt, data = mtcars, method = "nearest", 
##     caliper = 0.1)
## Summary of Balance for All Data:
##          Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance        0.8666        0.1037          4.4059     0.5263    0.4698   0.9444
## cyl             4.5714        7.4444         -3.0642     0.6659    0.4788   0.7778
## hp             91.3571      189.7222         -4.0273     0.1642    0.3853   0.8333
## wt              2.6113        3.6886         -1.5067     0.6255    0.3251   0.6111
## Summary of Balance for Matched Data:
##          Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance        0.7243        0.6889          0.2043     1.0460    0.1290      0.5          0.2043
## cyl             4.0000        5.0000         -1.0665     0.0000    0.1667      0.5          1.0665
## hp             89.5000      100.5000         -0.4504     6.1191    0.0455      0.5          0.5732
## wt              1.8565        2.5075         -0.9105     0.8737    0.1724      0.5          0.9944
## Percent Balance Improvement:
##          Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance            95.4       93.0      72.5     47.1
## cyl                 65.2        N/A      65.2     35.7
## hp                  88.8       -0.3      88.2     40.0
## wt                  39.6       71.2      47.0     18.2
## Sample Sizes:
##           Control Treated
## All            18      14
## Matched         2       2
## Unmatched      16      12
## Discarded       0       0

To convert the matchit object to a data.frame, we use:

mtcars_match <-

To check for covariate balance:

cobalt::love.plot(mtcars_m_out, abs = TRUE)


In this exercise, we will analyze a subset of white families from the ECLS (Early Childhood Longitudinal Study). We will focus on a set of family characteristics and the treatment, whether or not the child’s mother was a beneficiary of the WIC program. The outcome of interest is the student’s performance in reading and math tests at kindergarten.



# load data
ecls <- read.csv(url("")) %>% 

# treatment variable: convert to factor
ecls <- ecls %>% 
    wic = as.factor(wic)

# change the style of ggplot output


  1. First, let’s get a sense of the “treatment” assignment. How many WIC beneficiaries are there?

  2. The WIC program was targeted at poor families. Does the data reflect that? Create a density plot of socioeconomic status for beneficiaries and non-benefiaries. How would you interpret these graphs to an audience of non-statisticians?

  3. While have two outcomes of interest (student grades in reading and math in kindergarten) for simplicity we’ll only focus on their reading grades. Run a simple linear regression for the students’ reading grades against their WIC recipient status. Additionally, run a separate model with a full set of controls: female, age_k_fall, ses_comp, books, books_2 (books squared), birth_weight, mother_age. Display these models with stargazer, comparing the simple with the full model.

  4. Interpret the results above. What is the relationship between student test scores and WIC? Can we make a causal claim? Why or why not?

  5. To address concerns with confounding, let’s perform matching across treatment and control groups (WIC and non-WIC). Include all predictors of the model with controls. Make sure that the caliper parameter is adjusted to 0.1.

  6. Run summary() on your matchit output. Look at the table under “Summary of balance for all data” and glance at the Means Treated and Means Control columns. Which covariates exhibit more balance (that is are similar in both treated and control conditions)? Are there covariates that seem far apart? Are some covariates already quite similar? Now look at the “Summary of balance for matched data.” Has there been improvement in balance?

  7. Display the covariate balance plot using cobalt::love.plot() (see sample code above). Has the balance improved as a result of matching?

  8. Re-run the analysis in (3). What differences do you note between these results and the ones you found before?

  9. Given this analysis, what would you recommend to a policy-maker? Should we continue the WIC program or should we think about redesigning it? There is no “right” answer, use your critical thinking and use the statistical evidence to inform your opinion.

  10. What are some limitations of matching we should keep in mind?

  1. For more information, see ↩︎

  2. But if you really do, you can read this excellent Wikipedia article on matching through propensity score (here)[].↩︎