Log transformations and bee handout with partial solutions

As part of a study to investigate reproductive strategies in plants, biologists recorded the time spent at sources of pollen and the proportions of pollen removed by bumble-bee queens and honeybee workers pollinating a species of lily. (Data from L. D. Harder and J. D. Thompson, “Evolutionary Options for Maximizing Pollen Dispersal of Animal-pollinated Plants,” *American Naturalist* 133 (1989): 323-44.) Their data appear in Display 3.12.

- Draw side-by-side box plots (or histograms) of the proportion of pollen removed by queens and workers.

```
  pollen_removed duration_of_visit bee_type
1           0.07                 2    Queen
2           0.10                 5    Queen
3           0.11                 7    Queen
4           0.12                11    Queen
5           0.15                12    Queen
6           0.19                11    Queen
```

- When the measurement is the proportion \(P\) of some amount, one useful transformation is \(\log[P/(1 - P)]\). This is the log of the ratio of the proportion removed to the proportion not removed. Draw side-by-side box plots or histograms on this transformed scale.
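As a minimal sketch of the transformation described above, the following Python snippet applies \(\log[P/(1 - P)]\) to the six `pollen_removed` values shown in the data listing. It assumes the natural logarithm, and it uses only the displayed rows, not the full data set. The helper name `logit` is introduced here for illustration; `pollen_logprop` borrows the variable name that appears in the t-test output.

```python
import math

def logit(p):
    """Log odds: log of the proportion removed over the proportion not removed."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is defined only for 0 < p < 1")
    return math.log(p / (1.0 - p))

# First six queen rows from the data listing (illustration only, not the full data).
pollen_removed = [0.07, 0.10, 0.11, 0.12, 0.15, 0.19]
pollen_logprop = [logit(p) for p in pollen_removed]

for p, z in zip(pollen_removed, pollen_logprop):
    print(f"{p:.2f} -> {z:.3f}")
```

Proportions below 0.5 map to negative values and proportions above 0.5 to positive values, so the transformed scale is unbounded in both directions.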

- Test whether the distribution of proportions removed is the same or different for the two groups, using the \(t\)-test on the transformed data.

```
        Welch Two Sample t-test

data:  pollen_logprop by bee_type
t = -3.9744, df = 20.249, p-value = 0.0007322
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.7504675 -0.5460731
sample estimates:
 mean in group Queen mean in group Worker
          -0.3812734            0.7669968
```
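One way to read this output is to back-transform it. A difference in log odds is the log of an odds ratio, so exponentiating the estimate and the interval endpoints gives a multiplicative comparison of queens to workers. The sketch below uses only the numbers printed above and assumes the transformation used the natural logarithm. The result is best read as a ratio of typical (median) odds, which assumes roughly symmetric distributions on the transformed scale.

```python
import math

# Values copied from the Welch t-test output (log-odds scale).
mean_queen, mean_worker = -0.3812734, 0.7669968
ci_low, ci_high = -1.7504675, -0.5460731
t_stat = -3.9744

diff = mean_queen - mean_worker      # Queen minus Worker
ci_center = (ci_low + ci_high) / 2   # should agree with diff
std_error = diff / t_stat            # standard error implied by the t statistic

# Back-transform: exp(difference in log odds) is an odds ratio.
odds_ratio = math.exp(diff)
odds_ratio_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"difference in mean log odds: {diff:.4f} (CI center {ci_center:.4f})")
print(f"implied standard error: {std_error:.4f}")
print(f"odds ratio, queen vs worker: {odds_ratio:.3f}")
print(f"95% CI for odds ratio: {odds_ratio_ci[0]:.3f} to {odds_ratio_ci[1]:.3f}")
```

The interval for the odds ratio lies entirely below 1, which matches the small p-value: the odds of pollen removal appear lower for queens than for workers.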

- Draw side-by-side box plots of duration of visit on (i) the natural scale, (ii) the logarithmic scale, and (iii) the reciprocal scale. Which of the three scales seems most appropriate for use of the \(t\)-tools?
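The three scales can be computed as in the sketch below, which uses only the six `duration_of_visit` values shown in the data listing (all queens), so it illustrates the mechanics rather than the group comparison. The helper `five_number_summary` is a hypothetical name introduced here; it returns the five numbers a box plot draws, using the "inclusive" quartile rule from Python's `statistics` module, which may differ slightly from the quartile rule of other software.

```python
import math
import statistics

# First six queen rows from the data listing (illustration only, not the full data).
duration_of_visit = [2, 5, 7, 11, 12, 11]

natural = [float(d) for d in duration_of_visit]
log_scale = [math.log(d) for d in duration_of_visit]    # natural log of duration
reciprocal = [1.0 / d for d in duration_of_visit]       # visits per unit time (a rate)

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

for name, values in [("natural", natural), ("log", log_scale), ("reciprocal", reciprocal)]:
    summary = ", ".join(f"{v:.3f}" for v in five_number_summary(values))
    print(f"{name:>10}: {summary}")
```

Note that the reciprocal reverses the ordering: the longest visit becomes the smallest rate.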

- Compute a 95% confidence interval to describe the difference in means on the chosen scale.

- What are the relative advantages of the three scales as far as interpretation goes?

- Based on your experience with this problem, comment on the difficulty in assessing equality of population standard deviations from small samples.
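A small simulation can make the last point concrete. The sketch below draws two samples from the same normal population, so the population standard deviations are equal by construction, and records the ratio of the sample standard deviations. The sample sizes of 10, the number of simulations, the seed, and the 1.5 cutoff are all hypothetical choices made for illustration; they are not taken from the bee data.

```python
import random
import statistics

def sd_ratio_simulation(n1, n2, n_sims=2000, seed=1):
    """Ratios of sample SDs for two samples from the same N(0, 1) population."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        ratios.append(statistics.stdev(a) / statistics.stdev(b))
    return ratios

# Hypothetical sample sizes; the true ratio of population SDs is exactly 1.
ratios = sd_ratio_simulation(n1=10, n2=10)

# Share of simulations in which one sample SD exceeds the other by a factor of 1.5.
outside = sum(r < 1 / 1.5 or r > 1.5 for r in ratios) / len(ratios)

print(f"smallest ratio: {min(ratios):.2f}, largest ratio: {max(ratios):.2f}")
print(f"share of ratios outside (1/1.5, 1.5): {outside:.2f}")
```

Even with equal population standard deviations, a noticeable share of small-sample pairs show sample standard deviations that differ by a factor of 1.5 or more, so an apparent difference in spread is weak evidence on its own.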