## Overview

Creating your own OLS function should be completed without using base `R` statistical functions like `sd()` and `var()`. Basic math functions like `mean`, `sqrt` and `length` are fine.

### Calculating \(\beta_0\) and \(\beta_1\)

Recall the formulas for calculating the least squares estimators \(\hat{\beta_0}\) and \(\hat{\beta_1}\) in the equation \(\hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_i\):

\[
\hat{\beta_1} = \frac{S_{xy}}{S_{xx}} \textrm{ where}
\]

\[
S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \textrm{ and }
\]

\[
S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2
\]

\[
\hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x}
\]
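As a minimal sketch of these formulas in `R`, assuming two equal-length numeric vectors `x` and `y`: the toy data and every variable name below are made up for illustration and are not part of the original supplement.

```r
# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)
y_bar <- mean(y)

# Sum of cross-products and sum of squares about the means
s_xy <- sum((x - x_bar) * (y - y_bar))
s_xx <- sum((x - x_bar)^2)

# Least squares slope and intercept
beta1_hat <- s_xy / s_xx
beta0_hat <- y_bar - beta1_hat * x_bar
```

One way to check the result is to compare it with `coef(lm(y ~ x))`.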

### Calculating \(\hat{y_i}\)

Once we have the coefficients \(\hat{\beta_0}\) and \(\hat{\beta_1}\), we can plug in our observed values of \(x\) to compute the predicted, or fitted, values \(\hat{y}\) that lie on the regression line:

\[
\hat{y_i} = \hat{\beta_0} + \hat{\beta_1} \times x_i
\]
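Because `R` arithmetic is vectorized, the fitted values can be computed in a single expression. The sketch below uses made-up toy data and hypothetical variable names, and recomputes the coefficients so that it runs on its own.

```r
# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Coefficients from the least squares formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# One fitted value per observed x
y_hat <- beta0_hat + beta1_hat * x
```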

### Calculating \(y_i - \hat{y_i}\)

Once we have \(\hat{y_i}\), we can calculate the residuals, the differences between the observed \(y_i\) and the fitted \(\hat{y_i}\):

\[
\textrm{residuals} = y_i - \hat{y_i}
\]
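A short sketch of the residual calculation, again with made-up toy data and hypothetical variable names. When the model includes an intercept, the OLS residuals sum to zero (up to floating-point error), which makes a handy sanity check.

```r
# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Coefficients and fitted values from the earlier formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
y_hat <- beta0_hat + beta1_hat * x

# Observed minus fitted, one residual per observation
res <- y - y_hat

# Sanity check: should be very close to zero
sum(res)
```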

### Calculating \(\hat{\sigma}\)

Now, with the residuals, we can calculate \(\hat{\sigma}\), the estimated standard deviation of the residuals:

\[
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y_i})^2}{df}}, \textrm{ where } df = n - \textrm{number of } \beta \textrm{ terms}
\]
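The following sketch computes \(\hat{\sigma}\) without `sd()` or `var()`. The toy data and names such as `n_betas` are hypothetical; for simple linear regression there are two \(\beta\) terms, so the degrees of freedom are \(n - 2\).

```r
# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Coefficients and residuals from the earlier formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
res <- y - (beta0_hat + beta1_hat * x)

# Degrees of freedom: n minus the number of estimated beta terms
n <- length(y)
n_betas <- 2
df <- n - n_betas

# Square root of the residual sum of squares over the degrees of freedom
sigma_hat <- sqrt(sum(res^2) / df)
```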

### Calculating \(s_X^2\)

We also need to calculate \(s_X^2\), the sample variance of \(x\):

\[
s_X^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}
\]
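A sketch of the sample variance using only `mean`, `sum` and `length`. The toy vector and the name `s2_x` are illustrative.

```r
# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)

n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1
s2_x <- sum((x - mean(x))^2) / (n - 1)
```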

### Calculating SE(\(\beta_0\)) and SE(\(\beta_1\))

Now we have all the terms we need to calculate the standard errors. Display 7.7 of the *Statistical Sleuth* presents the formulas for calculating standard errors for the intercept and slope estimates with simple linear regression. As we do not know the true population standard deviation \(\sigma\), we estimate a sample standard deviation \(\hat{\sigma}\) and substitute for \(\sigma\) as in the formulas below.

\[
\operatorname{SE}\left(\hat{\beta}_{1}\right)=\hat{\sigma} \sqrt{\frac{1}{(n-1) s_{X}^{2}}}, \quad \text { d.f. }=n-2
\]

and

\[
\operatorname{SE}\left(\hat{\beta}_{0}\right)=\hat{\sigma} \sqrt{\frac{1}{n}+\frac{\overline{X}^{2}}{(n-1) s_{X}^{2}}}, \quad \text { d.f. }=n-2
\]
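Putting the pieces together, one possible shape for a hand-rolled OLS function is sketched below. The function name `simple_ols`, its return structure and the toy data are all assumptions made for illustration, not the supplement's prescribed solution.

```r
# Hypothetical helper: simple linear regression of y on x,
# written without sd(), var() or lm()
simple_ols <- function(x, y) {
  n <- length(x)
  x_bar <- mean(x)
  y_bar <- mean(y)

  # Coefficients
  s_xx <- sum((x - x_bar)^2)
  s_xy <- sum((x - x_bar) * (y - y_bar))
  beta1_hat <- s_xy / s_xx
  beta0_hat <- y_bar - beta1_hat * x_bar

  # Fitted values, residuals and residual standard deviation (df = n - 2)
  y_hat <- beta0_hat + beta1_hat * x
  res <- y - y_hat
  sigma_hat <- sqrt(sum(res^2) / (n - 2))

  # Sample variance of x
  s2_x <- s_xx / (n - 1)

  # Standard errors of the slope and intercept
  se_beta1 <- sigma_hat * sqrt(1 / ((n - 1) * s2_x))
  se_beta0 <- sigma_hat * sqrt(1 / n + x_bar^2 / ((n - 1) * s2_x))

  list(
    estimate  = c(beta0 = beta0_hat, beta1 = beta1_hat),
    std_error = c(beta0 = se_beta0, beta1 = se_beta1),
    sigma_hat = sigma_hat,
    df        = n - 2
  )
}

# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- simple_ols(x, y)
fit$estimate
fit$std_error
```

The estimates and standard errors can be compared with the `Estimate` and `Std. Error` columns of `summary(lm(y ~ x))$coefficients`.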

This supplement was put together by Omar Wasow. Please email any questions, corrections or concerns to owasow@princeton.edu.