Precept 5

ANOVA Calculations - Step By Step

Federico H. Tiberti (Princeton University, Department of Politics)
pol346.com | princeton.edu/politics
2021-03-03

Introduction

In this document we perform an analysis of variance (ANOVA) to test for differences across multiple samples. To do so, we rely on a built-in R dataset, chickwts (Chicken Weights by Feed Type), which records the weight and feed type of 71 chicks. First, we load the packages and the data:

library(dplyr)  # for the %>% pipe and mutate()/group_by() used below

chicken <- chickwts
head(chicken)
  weight      feed
1    179 horsebean
2    160 horsebean
3    136 horsebean
4    227 horsebean
5    217 horsebean
6    168 horsebean
There are six different categories of feed type:
janitor::tabyl(chicken$feed, sort = TRUE, show_na = FALSE) %>% 
  knitr::kable(col.names = c("Feed", "n", "Proportion"), align = "c", digits = 2)
Feed        n   Proportion
casein     12      0.17
horsebean  10      0.14
linseed    12      0.17
meatmeal   11      0.15
soybean    14      0.20
sunflower  12      0.17

A simple boxplot gives us an idea about the distribution of weight by feed type.

plot(weight ~ feed, data = chicken)
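
As a complement to the plot, a quick numeric summary of the groups can be computed with dplyr (a small sketch; the column names are just illustrative):

chicken %>% 
  group_by(feed) %>% 
  summarise(
    n           = n(),            # group size
    mean_weight = mean(weight),   # group mean
    sd_weight   = sd(weight)      # group standard deviation
  )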

The research question we can address with these data is whether feed type affects chicken weight. If it does, we should observe significant differences in mean weight across feed types. An ANOVA approach is well suited to this question, as it lets us compare multiple samples at once rather than collapsing the problem into a series of pairwise comparisons. Our null hypothesis is that the mean weight is the same across all six feed types:

\[\mu_{casein} = \mu_{horsebean} = \mu_{linseed} = \mu_{meatmeal} = \mu_{soybean} = \mu_{sunflower}\]

Conceptually, the ANOVA test evaluates how much explanatory power we gain by fitting a separate mean for each group instead of a single grand mean. In other words, the F-test asks whether the variability between the group means is large relative to the variability of the observations within the groups. If that ratio (the F-statistic) is sufficiently large, we can reject the null hypothesis that all means are equal.
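
In formula terms (matching the quantities we compute step by step below, where the reduced model fits only the grand mean and the full model fits a separate mean for each feed type), the statistic is:

\[F = \frac{(SS_{reduced} - SS_{full}) \, / \, (df_{reduced} - df_{full})}{SS_{full} \, / \, df_{full}}\]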

To calculate this ratio, we first need the extra sum of squared residuals, i.e., the difference between the residual sum of squares of the reduced (equal-means) model and that of the full (separate-means) model. The code below computes the relevant squared residuals for each observation.

chicken <- chicken %>% 
  mutate(
    # creates the grand mean
    mean_weight = mean(weight)
    )  

chicken <- chicken %>% 
  # groups observations by feed category
  group_by(feed) %>%    
    # creates means by group
    mutate(feed_means = mean(weight)) %>% 
  ungroup() 

chicken <- chicken %>% 
  mutate(
    # calculate squared residual from group mean
    residual_within  = (weight - feed_means)^2,   
    
    # calculate the individual "extra square residual"
    residual_between = (feed_means - mean_weight)^2, 
    
    # calculate squared residual from grand mean
    residual_total   = (weight - mean_weight)^2
    )  

head(chicken)
# A tibble: 6 x 7
  weight feed  mean_weight feed_means residual_within residual_between
   <dbl> <fct>       <dbl>      <dbl>           <dbl>            <dbl>
1    179 hors…        261.       160.        353.               10223.
2    160 hors…        261.       160.          0.0400           10223.
3    136 hors…        261.       160.        586.               10223.
4    227 hors…        261.       160.       4462.               10223.
5    217 hors…        261.       160.       3226.               10223.
6    168 hors…        261.       160.         60.8              10223.
# … with 1 more variable: residual_total <dbl>

The extra sum of squares is calculated below:

total_ss  <- sum(chicken$residual_total) 
total_ss
[1] 426685.2
within_ss <- sum(chicken$residual_within)
within_ss
[1] 195556
extra_ss <- total_ss - within_ss
extra_ss
[1] 231129.2
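
As a sanity check (a small sketch), the same quantity can be obtained directly by summing the residual_between column created above, since the between-group and within-group sums of squares partition the total:

between_ss <- sum(chicken$residual_between)
all.equal(between_ss, extra_ss)  # should be TRUE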

In a similar way, we get the extra degrees of freedom as the difference between the residual degrees of freedom of the reduced and full models:

total_df  <- nrow(chicken) - 1
total_df
[1] 70
number_of_groups <- length(unique(chicken$feed))
number_of_groups
[1] 6
within_df <- nrow(chicken) - number_of_groups
within_df
[1] 65
extra_df  <- total_df - within_df
extra_df
[1] 5
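
These degrees of freedom can also be read off fitted linear models (a small sketch using base R's lm() and df.residual()):

# residual df of the full (separate-means) model: n - number of groups
df.residual(lm(weight ~ feed, data = chicken))
# residual df of the reduced (grand-mean-only) model: n - 1
df.residual(lm(weight ~ 1, data = chicken))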

Conveniently, having already calculated the sum of squared residuals and the residual degrees of freedom of the full model, we can easily compute the pooled estimate of the variance from the full model:

pooled_variance <- within_ss / within_df
pooled_variance
[1] 3008.554
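
As a quick check (a sketch using base R's sigma()), this pooled variance is exactly the squared residual standard error of the full model:

full_model <- lm(weight ~ feed, data = chicken)
sigma(full_model)^2  # should match pooled_variance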

Now it is just a matter of calculating the mean square (the extra sum of squares divided by the extra degrees of freedom) and dividing it by the pooled estimate of the variance to obtain the F-statistic:

mean_square <- extra_ss / extra_df
mean_square
[1] 46225.83
f_stat <- mean_square / pooled_variance
f_stat
[1] 15.3648

We can see how extreme this value is by plotting the distribution of the F-statistic under the null. In this case, the null distribution is an F distribution with 5 degrees of freedom in the numerator (the "extra" degrees of freedom) and 65 degrees of freedom in the denominator (the full-model residual degrees of freedom). Below, we plot the distribution:

# create a high "penalty" for scientific notation
options(scipen = 10)

# visualize F-distribution with df1 = 5, and df2 = 65
# note: the F-statistic of about 15 is way out in the right tail
visualize::visualize.f(stat = round(f_stat), df1 = extra_df, df2 = within_df, section = "upper")
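
If the visualize package is not available, a base-R sketch gives a similar picture (the plotting range is just an illustrative choice):

# density of the null F distribution with df1 = 5 and df2 = 65
curve(df(x, df1 = extra_df, df2 = within_df), from = 0, to = 20,
      xlab = "F", ylab = "Density")
# mark the observed F-statistic
abline(v = f_stat, lty = 2)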

We can also use that F distribution to calculate the p-value of our F-statistic:

# create a high "penalty" for scientific notation
options(scipen = 10)

# NOTE: setting lower.tail = FALSE gives us right tail
pf(f_stat, 5, 65, lower.tail = FALSE) 
[1] 0.000000000593642
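
Equivalently (a small sketch), we can compare the statistic against the 5% critical value of the same F distribution:

# 95th percentile of F(5, 65): the rejection threshold at the 5% level
qf(0.95, df1 = extra_df, df2 = within_df)
# f_stat is far above this threshold, so we reject at the 5% level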

The p-value is extremely low, so we can reject the null hypothesis that all the means are equal. To verify our manual calculation, we can use the built-in aov() function:

aov(weight ~ feed, data = chicken) %>% 
  summary()
            Df Sum Sq Mean Sq F value         Pr(>F)    
feed         5 231129   46226   15.37 0.000000000594 ***
Residuals   65 195556    3009                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This yields the same F-statistic and p-value as our manual calculation.
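
The same table can also be obtained by fitting the full model with lm() and passing it to anova() (a quick sketch):

anova(lm(weight ~ feed, data = chicken))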