The truth is never linear!

- Or almost never!

But often the linearity assumption is “good enough”

What about when it's not?

Polynomials

Step Functions

Splines

Local Regression

Generalized Additive Models

- All of these models offer a lot of flexibility, without losing the ease and interpretability of linear models

- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \beta_3 x_i^{3} + \dots + \beta_d x_i^{d} + \epsilon_i\)

Create new variables \(X_1 = X\), \(X_2 = X^2\), and so on, then treat as multiple linear regression
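A minimal sketch of this idea in Python with NumPy, assuming raw (non-orthogonal) powers of \(x\) as the new variables. The helper names `poly_design` and `fit_poly` and the synthetic data are illustrative, not from the original slides.

```python
import numpy as np

def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree (raw powers)."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

def fit_poly(x, y, degree):
    """Ordinary least squares on the expanded columns, i.e. multiple linear regression."""
    X = poly_design(x, degree)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# Synthetic, noise-free cubic (made-up data for illustration only)
x = np.linspace(-2.0, 2.0, 25)
y = 1.0 - 2.0 * x + 0.5 * x**3

beta_hat = fit_poly(x, y, degree=3)
print(beta_hat)  # estimates of beta_0, ..., beta_3
```

Raw powers keep the coefficients directly comparable to the formula above. An orthogonal basis (the default of R's `poly()`) gives different coefficients but the same fitted values.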

Not really interested in the coefficients; more interested in the fitted function values at any value \(x_0\):

- \(\hat{f}(x_0) = \hat{\beta_0} + \hat{\beta_1}x_0 + \hat{\beta_2}x_0^2 + \hat{\beta_3}x_0^3 + \hat{\beta_4}x_0^4\)

Since \(\hat{f}(x_0)\) is a linear function of the \(\hat{\beta_\ell}\), can get a simple expression for pointwise-variances \(Var[\hat{f}(x_0)]\) at any value of \(x_0\). In the figure above, we have computed the fit and pointwise standard errors on a grid of values for \(x_0\). We show \(\hat{f}(x_0) \pm 2 \cdot se[\hat{f}(x_0)]\)
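One way to compute such a band, sketched in Python with NumPy under the usual least-squares assumptions: with \(\ell_0 = (1, x_0, \dots, x_0^d)^T\) and \(\hat{C}\) the estimated covariance matrix of the coefficients, \(Var[\hat{f}(x_0)] = \ell_0^T \hat{C} \ell_0\). The function name `pointwise_se` and the simulated data are assumptions for illustration.

```python
import numpy as np

def pointwise_se(x, y, degree, x_grid):
    """Fit a degree-`degree` polynomial by OLS; return fit and pointwise SEs on x_grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, N=degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # estimated Cov(beta_hat)

    # Each row of L is l_0' = (1, x0, ..., x0^degree) for one grid point
    L = np.vander(np.asarray(x_grid, dtype=float), N=degree + 1, increasing=True)
    fit = L @ beta
    var = np.einsum("ij,jk,ik->i", L, cov_beta, L)  # l_0' C l_0, row by row
    return fit, np.sqrt(var)

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = 1.0 + x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

grid = np.linspace(-2, 2, 50)
fit, se = pointwise_se(x, y, degree=2, x_grid=grid)
lower, upper = fit - 2 * se, fit + 2 * se  # approximate pointwise band
```

The band is pointwise: it covers each \(x_0\) separately, not the whole curve simultaneously.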

We either fix the degree \(d\) at some reasonably low value, or use cross-validation to choose \(d\)
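A rough sketch of choosing \(d\) by k-fold cross-validation, in Python with NumPy. The function `cv_mse`, the fold count, the candidate degrees, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean held-out squared error of a degree-`degree` polynomial over k folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(x.size)
    errors = []
    for held_out in np.array_split(idx, k):
        train = np.setdiff1d(idx, held_out)
        X_train = np.vander(x[train], N=degree + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        X_test = np.vander(x[held_out], N=degree + 1, increasing=True)
        errors.append(np.mean((y[held_out] - X_test @ beta) ** 2))
    return float(np.mean(errors))

# Simulated data with a cubic signal (hypothetical)
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.5, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_d = min(scores, key=scores.get)
print(scores, best_d)
```

Using the same seed for every degree keeps the folds identical, so the scores are compared on the same splits.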

Logistic regression follows naturally. For example, in the figure we model:

\(\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \beta_3 x_i^{3} + \dots + \beta_d x_i^{d})}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \beta_3 x_i^{3} + \dots + \beta_d x_i^{d})}\)

To get confidence intervals, compute upper and lower bounds on the logit scale, and then invert them to get bounds on the probability scale
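The inversion step can be sketched in plain Python. Here `eta_hat` stands for the fitted value on the logit scale and `se_eta` for its standard error. Both names and the numbers used are hypothetical.

```python
import math

def expit(z):
    """Inverse logit, exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob_interval(eta_hat, se_eta, z=2.0):
    """Build eta_hat +/- z * se on the logit scale, then map to probabilities."""
    lower = expit(eta_hat - z * se_eta)
    upper = expit(eta_hat + z * se_eta)
    return lower, expit(eta_hat), upper

# Hypothetical fitted logit and standard error at one x value
lo, p, hi = prob_interval(eta_hat=-3.0, se_eta=0.4)
print(lo, p, hi)
```

Because the inverse logit is monotone, the transformed bounds always stay inside \((0, 1)\). An interval built directly on the probability scale can stray outside that range.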

Can do separately on several variables—just stack the variables into one matrix, and separate out the pieces afterwards (see GAMs later)

Caveat: polynomials have notorious tail behavior — very bad for extrapolation

Can fit in R using `y ~ poly(x, degree = 3)` in the model formula

Another way of creating transformations of a variable — cut the variable into distinct regions

- \(C_1(X) = I(X < 35), C_2(X) = I(35 \leq X < 50), \dots, C_K(X) = I(X \geq 65)\)
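A small sketch of building these indicator variables in Python with NumPy. The cutpoints 35, 50, and 65 are an assumption (the middle region is elided above), and the `age` and `wage` values are made up for illustration.

```python
import numpy as np

cuts = [35, 50, 65]  # assumed cutpoints; they define len(cuts) + 1 regions

def step_basis(x, cuts):
    """One indicator column per region: x < c1, c1 <= x < c2, ..., x >= c_last."""
    x = np.asarray(x, dtype=float)
    region = np.digitize(x, cuts)  # 0 for x < cuts[0], len(cuts) for x >= cuts[-1]
    return (region[:, None] == np.arange(len(cuts) + 1)).astype(float)

# Made-up observations
age = np.array([22, 35, 41, 50, 64, 65, 80])
wage = np.array([60.0, 90.0, 110.0, 120.0, 100.0, 95.0, 85.0])

C = step_basis(age, cuts)

# Regressing on all indicators without an intercept returns the mean of y per region
beta, *_ = np.linalg.lstsq(C, wage, rcond=None)
print(C)
print(beta)
```

With an intercept in the model, one indicator has to be dropped to avoid collinearity. The remaining coefficients are then differences from the dropped region's mean.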