1 Introduction

Simple linear regression is a fancy term for “putting a line on a scatterplot of data.” For example, the line shown on the scatterplot below is often called a “linear regression line.”
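If you would like to create a plot like that yourself, a minimal sketch in R is shown below. It uses R's built-in `cars` dataset (stopping distance versus speed) purely as example data, and `mylm` is just an example object name, not anything required by this course.

```r
# Fit a simple linear regression and draw the "linear regression line"
# on a scatterplot, using the built-in cars dataset as example data.
mylm <- lm(dist ~ speed, data = cars)    # dist plays the role of Y, speed of X

plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(mylm, lwd = 2)                    # add the fitted regression line
```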

There is a rich mathematical theory behind linear regression, and experience has shown that the more deeply you connect with this theory, the more effectively you can use regression for predictive data modeling and analysis.

The goal of this course is to introduce you to this rich theory from a high-level perspective. The more math you have studied prior to the course, the more deeply you will be able to connect with the theory behind linear regression. Still, even if you have little prior experience with mathematics, this course should help you connect with the theory deeply enough to use linear regression appropriately in applied settings.

1.1 Regression Cheat Sheet

Past students have found the following table of terms, pronunciation guides, and so on to be a useful reminder of what they learn in the course. You can skip over this table for now; it is most useful when revisited as you work through each lesson. Your eventual goal is to become very familiar with every element of this table.


| Term | LaTeX | Pronunciation | Meaning | Math Notation | R Code |
|------|-------|---------------|---------|---------------|--------|
| \(Y_i\) | `$Y_i$` | "why-eye" | The data | \(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{where} \ \epsilon_i \sim N(0, \sigma^2)\) | `YourDataSet$YourYvariable` |
| \(\hat{Y}_i\) | `$\hat{Y}_i$` | "why-hat-eye" | The fitted line | \(\hat{Y}_i = b_0 + b_1 X_i\) | `lmObject$fitted.values` |
| \(E\{Y_i\}\) | `$E\{Y_i\}$` | "expected value of why-eye" | True mean y-value | \(E\{Y_i\} = \beta_0 + \beta_1 X_i\) | \<none\> |
| \(\beta_0\) | `$\beta_0$` | "beta-zero" | True y-intercept | \<none\> | \<none\> |
| \(\beta_1\) | `$\beta_1$` | "beta-one" | True slope | \<none\> | \<none\> |
| \(b_0\) | `$b_0$` | "b-zero" | Estimated y-intercept | \(b_0 = \bar{Y} - b_1\bar{X}\) | `b_0 <- coef(lmObject)[1]` |
| \(b_1\) | `$b_1$` | "b-one" | Estimated slope | \(b_1 = \frac{\sum X_i(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}\) | `b_1 <- coef(lmObject)[2]` |
| \(\epsilon_i\) | `$\epsilon_i$` | "epsilon-eye" | Distance of a dot to the true line | \(\epsilon_i = Y_i - E\{Y_i\}\) | \<none\> |
| \(r_i\) | `$r_i$` | "r-eye" or "residual-eye" | Distance of a dot to the estimated line | \(r_i = Y_i - \hat{Y}_i\) | `lmObject$residuals` |
| \(\sigma^2\) | `$\sigma^2$` | "sigma-squared" | Variance of the \(\epsilon_i\) | \(Var\{\epsilon_i\} = \sigma^2\) | \<none\> |
| \(MSE\) | `$MSE$` | "mean squared error" | Estimate of \(\sigma^2\) | \(MSE = \frac{SSE}{n-p}\) | `sum( lmObject$residuals^2 ) / (n - p)` |
| \(SSE\) | `$SSE$` | "sum of squared error" (residuals) | Measure of the dots' total deviation from the estimated line | \(SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2\) | `sum( lmObject$residuals^2 )` |
| \(SSR\) | `$SSR$` | "sum of squares due to regression" | Measure of the estimated line's deviation from \(\bar{Y}\) | \(SSR = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\) | `sum( (lmObject$fitted.values - mean(YourData$Y))^2 )` |
| \(SSTO\) | `$SSTO$` | "total sum of squares" | Measure of total variation in Y | \(SSTO = SSR + SSE = \sum_{i=1}^n (Y_i - \bar{Y})^2\) | `sum( (YourData$Y - mean(YourData$Y))^2 )` |
| \(R^2\) | `$R^2$` | "R-squared" | Proportion of variation in Y explained by the regression | \(R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}\) | `summary(lmObject)$r.squared` |
| \(\hat{Y}_h\) | `$\hat{Y}_h$` | "why-hat-aitch" | Estimated mean y-value for some x-value called \(X_h\) | \(\hat{Y}_h = b_0 + b_1 X_h\) | `predict(lmObject, data.frame(XvarName = #))` |
| \(X_h\) | `$X_h$` | "ex-aitch" | Some x-value, not necessarily one of the \(X_i\) values used in the regression | \(X_h =\) some number | `Xh <- #` |
| Confidence Interval | \<none\> | "confidence interval" | Estimated bounds, at a certain level of confidence, for a parameter | \(b_0 \pm t^* \cdot s_{b_0}\) or \(b_1 \pm t^* \cdot s_{b_1}\) | `confint(lmObject, level = someConfidenceLevel)` |
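The R Code column above assumes you have already fit a regression and stored it in an object called `lmObject`. As a rough sketch of how those pieces come together, the following example (using the built-in `cars` dataset, with `dist` playing the role of Y and `speed` the role of X) fits the regression and pulls out each quantity from the table.

```r
# Fit the regression; cars is a built-in example dataset, not course data.
lmObject <- lm(dist ~ speed, data = cars)

coef(lmObject)[1]               # b_0, estimated y-intercept
coef(lmObject)[2]               # b_1, estimated slope
lmObject$fitted.values          # Y-hat_i, points on the estimated line
lmObject$residuals              # r_i, distances of the dots to the estimated line

SSE  <- sum( lmObject$residuals^2 )
SSR  <- sum( (lmObject$fitted.values - mean(cars$dist))^2 )
SSTO <- sum( (cars$dist - mean(cars$dist))^2 )      # equals SSR + SSE

summary(lmObject)$r.squared     # R^2 = SSR / SSTO
summary(lmObject)$sigma         # sqrt(MSE), the residual standard error

confint(lmObject, level = 0.95)               # confidence intervals for beta_0 and beta_1
predict(lmObject, data.frame(speed = 15))     # Y-hat_h when X_h = 15
```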


Each true (population) parameter on the left is estimated by the statistic on the right:

| Parameter | Estimate |
|-----------|----------|
| \(\beta_0\) | \(b_0\) |
| \(\beta_1\) | \(b_1\) |
| \(\epsilon_i\) | \(r_i\) |
| \(\sigma^2\) | \(MSE\) |
| \(\sigma\) | \(\sqrt{MSE}\), the residual standard error |
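To see the last two rows of this table in action, the sketch below (again using the built-in `cars` example) computes \(MSE = SSE/(n - p)\) by hand and checks that \(\sqrt{MSE}\) matches the residual standard error reported by `summary()`.

```r
lmObject <- lm(dist ~ speed, data = cars)

n <- nrow(cars)                         # number of observations
p <- 2                                  # parameters estimated: beta_0 and beta_1
SSE <- sum( lmObject$residuals^2 )
MSE <- SSE / (n - p)                    # estimate of sigma^2

sqrt(MSE)                               # estimate of sigma ...
summary(lmObject)$sigma                 # ... matches the "Residual standard error"
```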