Handout for Precept 2

Calculating pooled sample standard deviations.

Team pol346 (pol346.com), Princeton University, Department of Politics (princeton.edu/politics)

Chapter 2, Problem 21.

Bumpus Natural Selection Data. In 1899, biologist Hermon Bumpus presented as evidence of natural selection a comparison of numerical characteristics of moribund house sparrows that were collected after an uncommonly severe winter storm and which had either perished or survived as a result of their injuries. Display 2.15 [see data below] shows the length of the humerus (arm bone) in inches for 59 of these sparrows, grouped according to whether they survived or perished. Analyze these data to summarize the evidence that the distribution of humerus lengths differs in the two populations. Write a brief paragraph of statistical conclusion, using the ones in Section 2.1 as a guide, including a graphical display, a conclusion about the degree of evidence of a difference, and a conclusion about the size of the difference in distributions.

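# Assumes the Sleuth3 (data), janitor (clean_names), and dplyr (%>%) packages are installed
library(Sleuth3); library(janitor); library(dplyr)
bumpus <- ex0221 %>% clean_names()
bumpus %>% head(2)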

  humerus   status
1   0.687 Survived
2   0.703 Survived
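
One possible starting point for this analysis is sketched below in base R. The data frame `toy_bumpus` and the object `toy_fit` are hypothetical names, and the humerus values are made up for illustration; they are not the `ex0221` measurements. The same calls should work on `bumpus`, since the column names match.

```r
# Made-up values for illustration only; these are NOT the ex0221 measurements.
toy_bumpus <- data.frame(
  humerus = c(0.69, 0.70, 0.73, 0.75, 0.74, 0.66, 0.71, 0.72, 0.68, 0.73),
  status  = rep(c("Survived", "Perished"), each = 5)
)

# Side-by-side boxplots as a graphical display of the two groups
boxplot(humerus ~ status, data = toy_bumpus, ylab = "Humerus length (inches)")

# Two-sample t-test with a pooled standard deviation (equal-variance assumption)
toy_fit <- t.test(humerus ~ status, data = toy_bumpus, var.equal = TRUE)
toy_fit$parameter  # degrees of freedom: n1 + n2 - 2
toy_fit$conf.int   # 95% confidence interval for the difference in means
```

The sign of the estimated difference depends on the order of the `status` levels (alphabetical by default, so `Perished` minus `Survived` here).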

Chapter 2, Problem 13.

Fish Oil and Blood Pressure. Researchers used 7 red and 7 black playing cards to randomly assign 14 volunteer males with high blood pressure to one of two diets for four weeks: a fish oil diet and a standard oil diet. The reductions in diastolic blood pressure are shown in Display 1.14 [not shown, see data below]. (Based on a study by H. R. Knapp and G. A. FitzGerald, “The Antihypertensive Effects of Fish Oil,” New England Journal of Medicine 320 (1989): 1037-43.) Do the following steps to compare the treatments.

study <- ex0112 %>% clean_names()
study %>% head(2)

  bp    diet
1  8 FishOil
2 12 FishOil

  a. Compute the averages and the sample standard deviations for each group separately.

  b. Compute the pooled estimate of standard deviation using the formula in Section 2.3.2. [See below]

Formula: Pooled estimate of standard deviation for two independent samples

\[ \begin{aligned} \textrm{We assume:} & \\ \\ \sigma_1 & = \sigma_2 = \sigma \\ s_1, s_2 & = \textrm{ independent estimates of }\sigma\\\\ \textrm{Therefore:} \\ \\ s_p & = \sqrt{\frac{(n_1 -1)s^2_1 + (n_2-1)s^2_2}{(n_1 + n_2 -2)} }\\ \textrm{d.f.} & = (n_1 + n_2 - 2) \end{aligned} \]
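
The pooled formula above can be written as a small helper. This is a minimal sketch: `pooled_sd`, `g1`, and `g2` are hypothetical names, and the two samples are made up so the arithmetic is easy to check by hand. They are not the study data.

```r
# Pooled standard deviation for two independent samples,
# assuming both groups share a common population standard deviation.
pooled_sd <- function(y1, y2) {
  n1 <- length(y1)
  n2 <- length(y2)
  sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
}

# Made-up samples for illustration (not the study data)
g1 <- c(1, 2, 3, 4, 5)    # sample variance 2.5
g2 <- c(2, 4, 6, 8, 10)   # sample variance 10
pooled_sd(g1, g2)         # sqrt((4 * 2.5 + 4 * 10) / 8) = 2.5
```

Because the two groups here have equal sizes, the pooled variance is just the average of the two sample variances; with unequal sizes the larger group gets more weight.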

  c. Compute \(\textrm{SE}(\bar{Y}_2 - \bar{Y}_1)\) using the formula in Section 2.3.2. [See below]

Formula: Standard error for the difference

\[\textrm{SE}(\bar{Y}_2 - \bar{Y}_1) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2} }\]

  d. What are the degrees of freedom associated with the pooled estimate of standard deviation? What is the 97.5th percentile of the \(t\)-distribution with this many degrees of freedom?
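
As a rough illustration of the standard error, the degrees of freedom, and the percentile lookup, the sketch below uses hypothetical inputs (`s_p`, `n1`, `n2` are made-up values, not results from the study) together with base R's `qt()`.

```r
# Hypothetical inputs for illustration (not the study's values)
s_p <- 2.5   # pooled standard deviation
n1  <- 5     # size of group 1
n2  <- 5     # size of group 2

# Standard error of the difference in sample means
se_diff <- s_p * sqrt(1 / n1 + 1 / n2)

# Degrees of freedom attached to the pooled estimate
dof <- n1 + n2 - 2

# 97.5th percentile of the t-distribution with dof degrees of freedom
t_crit <- qt(0.975, dof)
```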

  e. Construct a 95% confidence interval for \(\mu_2 - \mu_1\) using the formula in Section 2.3.3. [Set \(\alpha\) to \(0.05\).]

\[100(1- \alpha)\% \textrm{ confidence interval: } (\bar{Y}_2 - \bar{Y}_1) \pm t_{df}(1-\alpha/2)\,\textrm{SE}(\bar{Y}_2 - \bar{Y}_1)\]
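
The interval formula translates almost directly into R. In this sketch, `diff_means`, `se_diff`, and `dof` are hypothetical summary values chosen for illustration, not the study's results.

```r
# Hypothetical summary values for illustration (not the study's results)
diff_means <- 3.0    # ybar_2 - ybar_1
se_diff    <- 1.5    # standard error of the difference
dof        <- 8      # n1 + n2 - 2
alpha      <- 0.05   # gives a 95% interval

# Half-width: t percentile times the standard error
half_width <- qt(1 - alpha / 2, dof) * se_diff
ci <- c(lower = diff_means - half_width, upper = diff_means + half_width)
ci
```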

  f. Compute the \(t\)-statistic for testing equality as shown in Section 2.3.5. [Under the null hypothesis, assume the hypothesized value for \((\mu_2 - \mu_1)\) is \(0\).]

\[ \begin{aligned} \textit{t-statistic} = \frac{(\bar{Y}_2 - \bar{Y}_1) - [\textit{Hypothesized value for } (\mu_2 - \mu_1)]}{\textrm{SE}(\bar{Y}_2 - \bar{Y}_1)} \end{aligned} \]
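
A hand computation of the \(t\)-statistic can be cross-checked against R's built-in `t.test()` with `var.equal = TRUE`, which uses the same pooled standard deviation. The samples `g1` and `g2` below are made up for illustration and are not the study data.

```r
# Made-up samples for illustration (not the study data)
g1 <- c(1, 2, 3, 4, 5)
g2 <- c(2, 4, 6, 8, 10)
n1 <- length(g1)
n2 <- length(g2)

# Pooled standard deviation and standard error of the difference
s_p     <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
se_diff <- s_p * sqrt(1 / n1 + 1 / n2)

# t-statistic with a hypothesized difference of 0
t_stat <- ((mean(g2) - mean(g1)) - 0) / se_diff

# Cross-check: t.test(x, y) estimates mean(x) - mean(y), so g2 goes first
t_check <- unname(t.test(g2, g1, var.equal = TRUE)$statistic)
all.equal(t_stat, t_check)
```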

  g. Find the one-sided \(p\)-value (as evidence that the fish oil diet resulted in greater reduction of blood pressure) by comparing the \(t\)-statistic in (f) to the percentiles of the appropriate \(t\)-distribution (by reading the appropriate percentile from a computer program or calculator). [Sample code below.]

# Example: if t-statistic is 2.7 and degrees of freedom are 13
pt(2.7, 13)

[1] 0.990903

# for a t-distribution with 13 degrees of freedom, 99 percent of the area under 
# the curve is to the left of 2.7

# to calculate 1-sided p-value for probability of values equal to or 
# greater than 2.7 (with df = 13)
1 - pt(2.7, 13) 

[1] 0.009096983
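
An equivalent way to get the upper-tail area is the `lower.tail` argument of `pt()`, which avoids the subtraction. The sketch below reuses the example values from above (a \(t\)-statistic of 2.7 with 13 degrees of freedom); `t_stat`, `dof`, and `p_upper` are illustrative names.

```r
# Same example values as above: t-statistic 2.7, 13 degrees of freedom
t_stat <- 2.7
dof    <- 13

# Upper-tail probability, equivalent to 1 - pt(t_stat, dof)
p_upper <- pt(t_stat, dof, lower.tail = FALSE)
p_upper
```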