Calculating Simple Linear Regression

Supplement to pol346

Team 346, pol346.com (Department of Politics, Princeton University, http://princeton.edu/politics)
2019-05-12

Overview

Write your own OLS function without using base R statistical functions such as sd() and var(). Basic math functions such as mean(), sqrt() and length() are fine.

Calculating \(\hat{\beta_0}\) and \(\hat{\beta_1}\)

Recall the formulas for calculating the least squares estimators \(\hat{\beta_0}\) and \(\hat{\beta_1}\) in the equation \(\hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_i\):

\[ \hat{\beta_1} = \frac{S_{xy}}{S_{xx}} \textrm{ where} \]

\[ S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \textrm{ and } \]

\[ S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 \]

\[ \hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x} \]
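As a minimal sketch of these formulas in R, the snippet below computes both coefficients using only mean() and sum(). The data vectors x and y are made-up toy values, and the variable names (s_xy, s_xx, beta1_hat, beta0_hat) are illustrative choices rather than required names.

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)
y_bar <- mean(y)

# S_xy: sum of cross-products of deviations from the means
s_xy <- sum((x - x_bar) * (y - y_bar))
# S_xx: sum of squared deviations of x from its mean
s_xx <- sum((x - x_bar)^2)

beta1_hat <- s_xy / s_xx                 # slope
beta0_hat <- y_bar - beta1_hat * x_bar   # intercept
```

The slope is computed first because the intercept formula depends on it.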

Calculating \(\hat{y_i}\)

Once we have the coefficients \(\hat{\beta_0}\) and \(\hat{\beta_1}\), we can combine those estimates with our observed values of \(x\) to compute the predicted, or fitted, values \(\hat{y}\) (i.e., the points on the regression line):

\[ \hat{y_i} = \hat{\beta_0} + \hat{\beta_1} \times x_i \]
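In R this step is a single vectorized expression. The sketch below uses made-up toy data and recomputes the coefficients so that it runs on its own; the name y_hat is an illustrative choice.

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares coefficients
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# One fitted value per observed x
y_hat <- beta0_hat + beta1_hat * x
```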

Calculating \(y_i - \hat{y_i}\)

Once we have \(\hat{y_i}\), we can calculate the residuals, the differences between each observed \(y_i\) and its fitted value \(\hat{y_i}\):

\[ \textrm{residuals} = y_i - \hat{y_i} \]
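A short sketch of the residual calculation in R follows, again on made-up toy data and with illustrative variable names. One useful sanity check is that least squares residuals from a model with an intercept sum to zero, up to floating-point error.

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares coefficients and fitted values
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
y_hat <- beta0_hat + beta1_hat * x

# Residuals: observed minus fitted
res <- y - y_hat

# Sanity check: should be zero up to rounding
sum(res)
```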

Calculating \(\hat{\sigma}\)

Now, with the residuals, we can calculate \(\hat{\sigma}\), the estimated standard deviation of the residuals:

\[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y_i})^2}{df} }, \textrm{ where } df = n - \textrm{number of } \beta \textrm{ terms} \]

In simple linear regression there are two \(\beta\) terms, so \(df = n - 2\).
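The sketch below shows one way to compute \(\hat{\sigma}\) in R without sd() or var(). It assumes a simple linear regression with two \(\beta\) terms (intercept and slope), so the degrees of freedom are n - 2; the data and names are made up for illustration.

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares coefficients and residuals
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
res <- y - (beta0_hat + beta1_hat * x)

n <- length(y)
df <- n - 2   # assumption: two beta terms (intercept and slope)

# Square root of the residual sum of squares divided by df
sigma_hat <- sqrt(sum(res^2) / df)
```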

Calculating \(s_X^2\)

We also need to calculate \(s_X^2\), the sample variance of \(x\):

\[ s_X^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1} \]
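Because var() is off limits for this exercise, the sample variance of \(x\) can be written out by hand. A minimal sketch, with a made-up x vector and an illustrative name s2_x:

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)

n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1
s2_x <- sum((x - mean(x))^2) / (n - 1)
```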

Calculating SE(\(\hat{\beta_0}\)) and SE(\(\hat{\beta_1}\))

Now we have all the terms we need to calculate the standard errors. Display 7.7 of the Statistical Sleuth presents the formulas for the standard errors of the intercept and slope estimates in simple linear regression. Because we do not know the true population standard deviation \(\sigma\), we substitute the estimate \(\hat{\sigma}\) for \(\sigma\), as in the formulas below. Here \(\overline{X}\) is the sample mean of \(x\), written \(\bar{x}\) above.

\[ \operatorname{SE}\left(\hat{\beta}_{1}\right)=\hat{\sigma} \sqrt{\frac{1}{(n-1) s_{X}^{2}}}, \quad \text { d.f. }=n-2 \]

and

\[ \operatorname{SE}\left(\hat{\beta}_{0}\right)=\hat{\sigma} \sqrt{\frac{1}{n}+\frac{\overline{X}^{2}}{(n-1) s_{X}^{2}}}, \quad \text { d.f. }=n-2 \]
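Putting the pieces together, the following is one possible sketch of a hand-written OLS function in R. The function name simple_ols, its return structure and the toy data are all hypothetical choices for illustration, not a prescribed solution. It uses only mean(), sum(), sqrt() and length().

```r
# Hypothetical helper: simple linear regression "by hand"
simple_ols <- function(x, y) {
  n <- length(x)
  x_bar <- mean(x)
  y_bar <- mean(y)

  # Coefficients
  s_xx <- sum((x - x_bar)^2)
  beta1_hat <- sum((x - x_bar) * (y - y_bar)) / s_xx
  beta0_hat <- y_bar - beta1_hat * x_bar

  # Residuals and estimated residual standard deviation
  res <- y - (beta0_hat + beta1_hat * x)
  sigma_hat <- sqrt(sum(res^2) / (n - 2))  # two beta terms, so df = n - 2

  # Sample variance of x
  s2_x <- s_xx / (n - 1)

  # Standard errors of the slope and intercept
  se_beta1 <- sigma_hat * sqrt(1 / ((n - 1) * s2_x))
  se_beta0 <- sigma_hat * sqrt(1 / n + x_bar^2 / ((n - 1) * s2_x))

  list(
    estimate  = c(beta0 = beta0_hat, beta1 = beta1_hat),
    std_error = c(beta0 = se_beta0, beta1 = se_beta1),
    df        = n - 2
  )
}

# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- simple_ols(x, y)
fit$estimate
fit$std_error
```

A reasonable way to check a function like this is to compare its output with the Estimate and Std. Error columns of summary(lm(y ~ x)) on the same data.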


This supplement was put together by Omar Wasow. Please email any questions, corrections or concerns to .