Statistical Style Guide

Omar Wasow (Department of Politics, Princeton University), princeton.edu/politics
2021-04-19

Interpretation

Always explain your statistical findings with at least one clear sentence. Be attentive to the following:

• Test: what statistical test is being used (e.g., OLS, t-test for difference in means)?

• Sign: is the estimated “effect” positive or negative, increasing or decreasing?

• Magnitude: what is the estimated size of the “effect”? What is the difference in means or the relevant coefficient as in $$\hat{\beta}_1$$?

• Uncertainty: what is our estimated level of uncertainty? Typically reported with a p-value or confidence interval.

• Substance: What is the substantive interpretation of your results? Does this matter?

• Correlation or causation: Is the relationship between X and Y an association or is it causal? Avoid words that imply causation in the absence of true randomization or a clearly stated assumption of “as if” randomization.

• Unit: what are the units for X and Y? Where possible use original units. If units are not easily interpreted, try: (1) just refer to generic “units,” (2) present result in terms of standard deviations or range of observed data or compared to prior scholarship, (3) consider helpful analogies. For example, “the effect of the intervention on math test scores is approximately equivalent to moving from the average student in Alabama to the average student in Massachusetts.”
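As a minimal sketch of how the checklist above might translate into a reported sentence, the following fits a bivariate OLS model and pulls out the sign, magnitude and uncertainty. The mtcars data (bundled with R) and the choice of mpg and wt are stand-ins chosen for illustration, not part of the original guide, and the wording is correlational because nothing here is randomized.

```r
# Illustrative only: mtcars ships with base R; mpg and wt are stand-in variables.
# Test: OLS regression of fuel economy (mpg) on weight (wt, in 1,000 lbs).
fit <- lm(mpg ~ wt, data = mtcars)

# Sign and magnitude: the estimated coefficient on wt.
# Uncertainty: its p-value and a 95% confidence interval.
est <- coef(summary(fit))["wt", ]
ci  <- confint(fit, "wt", level = 0.95)

# Assemble a one-sentence interpretation in original units.
cat(sprintf(
  "A 1,000 lb increase in weight is associated with a %.1f mpg change in fuel economy (95%% CI: %.1f to %.1f; p %s).\n",
  est["Estimate"], ci[1], ci[2],
  format.pval(est["Pr(>|t|)"], digits = 2, eps = 0.001)
))
```

Whether a change of that size matters substantively is a judgment the code cannot make; that part of the sentence still has to be written by hand.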

Other considerations

• Digits: Are the results rounded to a reasonable level of precision (e.g., two or three digits)? Is scientific notation being used unnecessarily? Consider using options(digits = 2).

• Transformation: Might transformation make a term more interpretable? For example, dividing population by 10,000 so a coefficient is not 0.000. Does the estimate need to be back-transformed into original or more easily interpreted units? For example, exponentiating a logged variable such as income or exponentiating a coefficient from a logistic regression (e.g., from log odds to odds).

• Groups: Are there groups that can be named explicitly rather than referred to generically? For example: “moving from the subjects who received no job training to those who did” or “moving from the male to the female sample.”

• Conditioning: Are you controlling for other variables? If so, many scholars will report their estimated “effects” with the term ceteris paribus or something equivalent to emphasize that the relationship between X and Y is conditional on “all other terms being held equal” or “other terms being held constant.”

• Phrasing: We regress Y on X as in lm(y ~ x, data=some_data). With causal claims, we use language like “the effect of X on Y.” With descriptive or correlational results we report something like “the association of X with Y.”

• Additive or multiplicative: Is the relationship between X and Y additive (as in most OLS models) or multiplicative (as can occur when interpreting logged terms)?
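The digits and transformation points above can be sketched briefly. The example again uses mtcars as an assumed stand-in dataset; the rescaled variable disp_100 and the data frame cars_rescaled are hypothetical names introduced only for this illustration.

```r
options(digits = 2)  # print results with fewer significant digits

# Original units: displacement in cubic inches gives a small, hard-to-read coefficient.
fit_raw <- lm(mpg ~ disp, data = mtcars)

# Rescaled units: displacement in hundreds of cubic inches.
cars_rescaled <- transform(mtcars, disp_100 = disp / 100)
fit_scaled <- lm(mpg ~ disp_100, data = cars_rescaled)

coef(fit_raw)["disp"]          # change in mpg per cubic inch
coef(fit_scaled)["disp_100"]   # change in mpg per 100 cubic inches (100 times larger)
```

Rescaling changes only the units of the coefficient, not the fit of the model, so the rescaled version can be reported as long as the new unit is stated.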

Reporting regression by distributions of X and Y

• Continuous $$X_1$$, continuous $$Y$$: a one unit change in $$X_1$$ is associated with/causes a $$\hat{\beta}_1$$ unit(s) change in $$Y$$, holding all other variables constant.

• Binary $$X_1$$, continuous $$Y$$: moving from group 1 to group 2 (e.g., from control to treated) is associated with/causes a $$\hat{\beta}_1$$ unit(s) change in $$Y$$, holding all other variables constant.

• Continuous $$X_1$$, binary $$Y$$ (logistic regression): a one unit change in $$X_1$$ is associated with/causes the odds that $$Y = 1$$ to change by a multiplicative factor of exp($$\hat{\beta}_1$$), holding all other variables constant.

• Binary $$X_1$$, binary $$Y$$ (logistic regression): moving from group 1 to group 2 is associated with/causes the odds that $$Y = 1$$ to change by a multiplicative factor of exp($$\hat{\beta}_1$$), holding all other variables constant.
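For the logistic cases above, a short sketch of moving from log odds to odds may help. It assumes the mtcars data as a stand-in, with am (0 = automatic, 1 = manual transmission) as the binary outcome and wt as the continuous predictor; the Wald interval from confint.default() is one choice among several and is used here only to keep the example in base R.

```r
# Illustrative only: binary Y (am) regressed on continuous X (wt, in 1,000 lbs).
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

# The raw coefficient is on the log-odds scale.
log_odds <- coef(fit_logit)["wt"]

# Exponentiate to get the multiplicative change in the odds that am = 1
# for a one unit (1,000 lb) increase in wt.
odds_ratio <- exp(log_odds)

# Wald 95% interval, exponentiated onto the odds scale.
odds_ci <- exp(confint.default(fit_logit, "wt"))

odds_ratio
odds_ci
```

A factor below 1 means the odds decrease as X increases; a factor above 1 means they increase.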

Interpretation after log transformations

From The Statistical Sleuth, 3ed, “Interpretation After Log Transformations” (p. 216).

• When the Response Variable is Logged: If $$\mu\{\textrm{log}(Y)|X\} = \beta_0 + \beta_1X$$, then an increase in X of one unit is associated with a multiplicative change of exp($$\beta_1$$) in Median{$$Y|X$$}.

• When the Explanatory Variable is Logged: If $$\mu\{Y|\textrm{log}(X)\} = \beta_0 + \beta_1\textrm{log}(X)$$, then the relationship can be described in terms of multiplicative changes in X, either as a change in the mean of Y for each doubling of X or a change in the mean of Y for each ten-fold increase in X. The chosen multiple should be consistent with the range of X’s in the data set.

• When Both the Response and Explanatory Variables are Logged: If $$\mu\{\textrm{log}(Y)|\textrm{log}(X)\} = \beta_0 + \beta_1\textrm{log}(X)$$, then Median{$$Y|X$$} = exp($$\beta_0)X^{\beta_1}$$. A doubling of X is associated with a multiplicative change of $$2^{\beta_1}$$ in the median of Y. Or, a ten-fold increase in X is associated with a $$10^{\beta_1}$$-fold change in the median of Y.
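The three cases above can be sketched in R as follows. The mtcars data and the mpg and wt variables are assumed stand-ins, and log() is the natural log. Doubling is used as the multiple because the observed weights span a bit more than one doubling.

```r
# Response logged: exp(b1) is the multiplicative change in the median of mpg
# for a one unit increase in wt.
fit_logy <- lm(log(mpg) ~ wt, data = mtcars)
exp(coef(fit_logy)["wt"])

# Explanatory variable logged: b1 * log(2) is the change in the mean of mpg
# for each doubling of wt.
fit_logx <- lm(mpg ~ log(wt), data = mtcars)
coef(fit_logx)["log(wt)"] * log(2)

# Both logged: 2^b1 is the multiplicative change in the median of mpg
# for each doubling of wt.
fit_loglog <- lm(log(mpg) ~ log(wt), data = mtcars)
2^coef(fit_loglog)["log(wt)"]
```

In each case the back-transformed quantity, not the raw coefficient, is what would go into the interpretive sentence.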

Tables, figures & formulas

• Referencing tables and figures: Any plot or table used in the document should be referred to in text. For example, “Table 2 presents…” or “Figure 3 presents the relationship between…” In R Markdown, any plot chunk can be referred to as Figure \@ref(fig:chunk-label). With stargazer the label= option can create a reference as in:
library(stargazer)
stargazer(lm1, lm2, label="tab:summary_stats")

“The summary statistics are presented in Table \@ref(tab:summary_stats).”

• Plots: use informative axes, legends and titles and/or captions.
• Tables: use informative row and column names, titles and/or captions.
• Model: where appropriate, write out the formula for your statistical model(s).
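As one illustration of writing out a model, an OLS regression with two explanatory variables might be expressed as below. The subscript $$i$$ indexing observations, the second regressor $$X_2$$ and the error term $$\epsilon_i$$ are notation assumed for this sketch rather than taken from the guide.

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$$

Here $$\hat{\beta}_1$$ would be reported as the estimated change in $$Y$$ for a one unit change in $$X_1$$, holding $$X_2$$ constant.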

This supplement was put together by Omar Wasow. Some sections quoted from The Statistical Sleuth, 3ed, Ramsey and Schafer (2013). Feedback, corrections and suggestions welcome. Email owasow@princeton.edu.