4 Residuals and Errors
A residual is the difference between the observed value \(Y_i\) (the point) and the predicted, or estimated, value for that point, denoted \(\hat{Y}_i\). An error is the true distance between the observed value \(Y_i\) and the actual regression relation for that point, \(E\{Y_i\}\).
We will denote a residual for individual \(i\) by \(r_i\), \[ r_i = \underbrace{Y_i}_{\substack{\text{Observed} \\ \text{Y-value}}} - \underbrace{\hat{Y}_i}_{\substack{\text{Predicted} \\ \text{Y-value}}} \quad \text{(residual)} \] The residual \(r_i\) estimates the true error for individual \(i\), \(\epsilon_i\), \[ \epsilon_i = \underbrace{Y_i}_{\substack{\text{Observed} \\ \text{Y-value}}} - \underbrace{E\{Y_i\}}_{\substack{\text{True Mean} \\ \text{Y-value}}} \quad \text{(error)} \]
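To make the two definitions concrete, here is a minimal Python sketch for a single point. It assumes a hypothetical true line \(E\{Y\} = 2 + 3X\) (which in practice we would not know) and a hypothetical estimated line \(\hat{Y} = 2.5 + 2.8X\). The function names and numbers are illustrative only, not taken from any data set.

```python
def predicted(x, b0, b1):
    """Estimated regression line: Y-hat = b0 + b1 * x."""
    return b0 + b1 * x

def true_mean(x, beta0, beta1):
    """True regression relation: E{Y} = beta0 + beta1 * x (usually unknown)."""
    return beta0 + beta1 * x

def residual(y, x, b0, b1):
    """r_i = Y_i - Y-hat_i, measured from the estimated line."""
    return y - predicted(x, b0, b1)

def error(y, x, beta0, beta1):
    """epsilon_i = Y_i - E{Y_i}, measured from the true line."""
    return y - true_mean(x, beta0, beta1)

# Hypothetical point and lines, for illustration only.
x_i, y_i = 2.0, 8.6
r_i = residual(y_i, x_i, b0=2.5, b1=2.8)        # about 8.6 - 8.1 = 0.5
eps_i = error(y_i, x_i, beta0=2.0, beta1=3.0)   # about 8.6 - 8.0 = 0.6
print(f"residual r_i = {r_i:.2f}, error epsilon_i = {eps_i:.2f}")
```

The same observed \(Y_i\) gives two different numbers because the residual is measured from the estimated line and the error from the true line.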
In summary…
Residual \(r_i\) | Error \(\epsilon_i\) |
---|---|
Distance between the dot \(Y_i\) and the estimated line \(\hat{Y}_i\) | Distance between the dot \(Y_i\) and the true line \(E\{Y_i\}\). |
\(r_i = Y_i - \hat{Y}_i\) | \(\epsilon_i = Y_i - E\{Y_i\}\) |
Known | Typically Unknown |
As shown in the graph below, the residuals are known values and they estimate the unknown (but true) error terms.
Keep in mind the idea that the errors \(\epsilon_i\) “created” the data and that the residuals \(r_i\) are computed after using the data to “re-create” the line.
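A small simulation shows this "create, then re-create" idea. The sketch below is a minimal example that assumes a hypothetical true model \(Y_i = 2 + 3X_i + \epsilon_i\) with \(\epsilon_i \sim N(0, 1)\), uses only Python's standard library, and picks an arbitrary random seed. The errors generate the data; the line is then re-estimated by least squares from \((X_i, Y_i)\) alone, and the residuals are computed from that estimated line.

```python
import random
import statistics

# Assumed "true" model for the simulation (unknown with real data):
#   Y_i = BETA0 + BETA1 * X_i + epsilon_i,  epsilon_i ~ N(0, SIGMA^2)
BETA0, BETA1, SIGMA = 2.0, 3.0, 1.0

random.seed(325)  # arbitrary seed so the run is reproducible
x = [float(i) for i in range(1, 21)]

# Step 1: the errors "create" the data.
errors = [random.gauss(0.0, SIGMA) for _ in x]
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, errors)]

# Step 2: use only (x, y) to "re-create" the line by least squares.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Step 3: residuals are computed from the estimated line.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(f"true line:      E{{Y}} = {BETA0:.2f} + {BETA1:.2f} X")
print(f"estimated line: Y-hat = {b0:.2f} + {b1:.2f} X")
for i in range(3):
    print(f"i={i + 1}: error = {errors[i]:+.3f}, residual = {residuals[i]:+.3f}")
```

The residuals track the errors closely but not exactly, because \(b_0\) and \(b_1\) only estimate \(\beta_0\) and \(\beta_1\). One visible difference: least-squares residuals from a model with an intercept always sum to zero, while the simulated errors generally do not.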
Residuals have many uses in regression analysis. They allow us to
- diagnose the regression assumptions (see the “Assumptions” section below for more details),
- estimate the regression relation (see the “Estimating the Model Parameters” section below for more details),
- estimate the variance of the error terms (see the “Estimating the Model Variance” section below for more details),
- and assess the fit of the regression relation (see the “Assessing the Fit of a Regression” section below for more details).
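As a preview of how residuals feed into these uses, here is a minimal Python sketch. The function name `residual_summary` and the toy data are hypothetical. It fits \(\hat{Y}_i = b_0 + b_1 X_i\) by least squares (the choice of \(b_0, b_1\) that minimizes the sum of squared residuals, SSE), estimates the error variance with \(MSE = SSE/(n-2)\), and measures fit with \(R^2 = 1 - SSE/SSTO\). It also returns the (fitted, residual) pairs that a residuals-versus-fitted plot would use for diagnostics; plotting is left out to keep the sketch standard-library only.

```python
import statistics

def residual_summary(x, y):
    """Fit Y-hat = b0 + b1*X by least squares and summarize the residuals."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    r = [yi - yh for yi, yh in zip(y, y_hat)]

    sse = sum(ri ** 2 for ri in r)              # least squares minimizes this
    mse = sse / (n - 2)                         # estimates the error variance sigma^2
    ssto = sum((yi - y_bar) ** 2 for yi in y)   # total variation of Y around its mean
    r_squared = 1 - sse / ssto                  # proportion of variation explained
    return {"b0": b0, "b1": b1, "residuals": r, "fitted": y_hat,
            "SSE": sse, "MSE": mse, "R2": r_squared}

# Hypothetical toy data, for illustration only.
summary = residual_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for key in ("b0", "b1", "SSE", "MSE", "R2"):
    print(key, round(summary[key], 3))
```

Worked by hand for this toy data, \(b_1 = 6/10 = 0.6\), \(b_0 = 4 - 0.6 \cdot 3 = 2.2\), the residuals are \(-0.8, 0.6, 1.0, -0.6, -0.2\), so \(SSE = 2.4\), \(MSE = 2.4/3 = 0.8\), and \(R^2 = 1 - 2.4/6 = 0.6\).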