4 Residuals and Errors

\(r_i\), the residual, estimates \(\epsilon_i\), the true error…

A residual is the difference between the observed value \(Y_i\) (the point) and the predicted, or estimated, value for that point, denoted \(\hat{Y}_i\). An error is the true distance between the observed \(Y_i\) and the actual regression relation for that point, \(E\{Y_i\}\).

We will denote a residual for individual \(i\) by \(r_i\),

\[ r_i = \underbrace{Y_i}_{\substack{\text{Observed} \\ \text{Y-value}}} - \underbrace{\hat{Y}_i}_{\substack{\text{Predicted} \\ \text{Y-value}}} \quad \text{(residual)} \]

The residual \(r_i\) estimates the true error for individual \(i\), \(\epsilon_i\),

\[ \epsilon_i = \underbrace{Y_i}_{\substack{\text{Observed} \\ \text{Y-value}}} - \underbrace{E\{Y_i\}}_{\substack{\text{True Mean} \\ \text{Y-value}}} \quad \text{(error)} \]
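A short worked step, following directly from the two definitions above, may help show why \(r_i\) can serve as an estimate of \(\epsilon_i\). Subtracting one definition from the other gives

\[ r_i - \epsilon_i = \left(Y_i - \hat{Y}_i\right) - \left(Y_i - E\{Y_i\}\right) = E\{Y_i\} - \hat{Y}_i \]

So the residual differs from the error by exactly the amount by which the estimated line misses the true line at that point. The closer the estimated line is to the true line, the closer each residual is to its error.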

In summary…

| Residual \(r_i\) | Error \(\epsilon_i\) |
| --- | --- |
| Distance between the dot \(Y_i\) and the estimated line \(\hat{Y}_i\). | Distance between the dot \(Y_i\) and the true line \(E\{Y_i\}\). |
| \(r_i = Y_i - \hat{Y}_i\) | \(\epsilon_i = Y_i - E\{Y_i\}\) |
| Known | Typically unknown |

As shown in the graph below, the residuals are known values and they estimate the unknown (but true) error terms.

Keep in mind the idea that the errors \(\epsilon_i\) “created” the data and that the residuals \(r_i\) are computed after using the data to “re-create” the line.
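The idea that errors "create" the data while residuals come from "re-creating" the line can be tried out in a small simulation. The sketch below uses Python with NumPy and assumes a simple linear regression. The true intercept `beta0`, true slope `beta1`, error standard deviation `sigma`, and sample size `n` are made-up values chosen only for illustration. In a simulation the errors are known because we generate them ourselves; with real data they typically are not.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical "true" regression relation: E{Y_i} = beta0 + beta1 * X_i
beta0, beta1, sigma = 2.0, 3.0, 1.5   # illustrative values, not from the text
n = 50

# The errors "create" the data: Y_i = E{Y_i} + epsilon_i
x = rng.uniform(0, 10, size=n)
errors = rng.normal(0, sigma, size=n)   # epsilon_i (known only because we simulated)
true_mean = beta0 + beta1 * x           # E{Y_i}, the true line
y = true_mean + errors                  # observed Y_i

# "Re-create" the line from the data with least squares.
# np.polyfit returns coefficients from highest degree to lowest: slope, then intercept.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x                     # predicted values, the estimated line
residuals = y - y_hat                   # r_i = Y_i - Y_hat_i

# Compare the first few residuals with the errors they estimate.
for r, e in zip(residuals[:5], errors[:5]):
    print(f"residual = {r: .3f}   error = {e: .3f}")

# How closely do the residuals track the errors overall?
print("correlation:", np.corrcoef(residuals, errors)[0, 1])
```

The residuals and errors are close but not identical, because the estimated line `b0 + b1 * x` is not exactly the true line `beta0 + beta1 * x`. Changing the seed or `n` shows how the agreement varies from sample to sample.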

Residuals have many uses in regression analysis. They allow us to

  1. diagnose the regression assumptions,

     See the “Assumptions” section below for more details.

  2. estimate the regression relation,

     See the “Estimating the Model Parameters” section below for more details.

  3. estimate the variance of the error terms,

     See the “Estimating the Model Variance” section below for more details.

  4. and assess the fit of the regression relation.

     See the “Assessing the Fit of a Regression” section below for more details.
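As a rough preview of the four uses listed above, the sketch below computes one quantity for each from the residuals of a fitted line. It assumes a simple linear regression fit by least squares, so the variance estimate divides by \(n - 2\). The data are simulated with made-up parameter values, and the names `sse`, `mse`, and `r_squared` are just illustrative labels. The sections referenced above give the full treatment.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data from a made-up true line with normal errors.
n = 40
x = rng.uniform(0, 10, size=n)
y = 5.0 + 2.0 * x + rng.normal(0, 2.0, size=n)

def sse(b0, b1):
    """Sum of squared residuals for the candidate line b0 + b1 * x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# Least squares fit (np.polyfit returns slope first, then intercept).
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Use 1: diagnose assumptions. One basic check: least squares residuals
# sum to (numerically) zero and are uncorrelated with x.
print("sum of residuals:", residuals.sum())
print("sum of residuals * x:", np.sum(residuals * x))

# Use 2: estimate the regression relation. The least squares line is the
# one with the smallest sum of squared residuals; any other line does worse.
sse_fit = sse(b0, b1)
sse_shifted = sse(b0 + 0.5, b1)     # same slope, intercept nudged upward
print("SSE (fitted line):", sse_fit, " SSE (shifted line):", sse_shifted)

# Use 3: estimate the variance of the error terms with MSE = SSE / (n - 2).
mse = sse_fit / (n - 2)
print("MSE:", mse)

# Use 4: assess the fit with R^2 = 1 - SSE / SSTO.
ssto = float(np.sum((y - y.mean()) ** 2))
r_squared = 1 - sse_fit / ssto
print("R-squared:", r_squared)
```

In practice the diagnostic step usually relies on residual plots rather than the two sums shown here; the sums are included only because they are easy to verify in a few lines.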