In this lab, you will use R to fit non-linear regression models to real data.

Exercise 1. (3 points)

In this exercise, you will further analyze the Wage data set from the ISLR package.

  1. Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a plot of the resulting polynomial fit to the data.

  2. Fit a step function to predict wage using age, and perform cross-validation to choose the optimal number of cuts. Make a plot of the fit obtained.
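The two tasks above can be sketched in base R as follows. This is a minimal sketch, not a reference solution. It assumes that the ISLR package provides Wage; the demo at the bottom is skipped if the package is absent. The helper names cv_mse(), cv_poly() and cv_step() are hypothetical, and the 10 folds, the seed, and the search ranges (degrees 1 to 10, 2 to 10 intervals) are illustrative choices. Every model is scored on the same fold assignment. The step-function intervals are fixed on the full data so that held-out rows share the training factor levels, which assumes every interval keeps some observations in each training fold.

```r
# Minimal sketch for Exercise 1 (base R only). Assumes ISLR supplies Wage;
# the demo block is skipped when ISLR is not installed.

# K-fold cross-validated MSE of an lm() fit. `folds` assigns each row of
# `data` to a fold, so every candidate model is scored on the same split.
cv_mse <- function(formula, data, folds) {
  y <- data[[all.vars(formula)[1]]]          # response column, e.g. wage
  sq_err <- numeric(nrow(data))
  for (k in unique(folds)) {
    held_out <- folds == k
    fit <- lm(formula, data = data[!held_out, , drop = FALSE])
    pred <- predict(fit, newdata = data[held_out, , drop = FALSE])
    sq_err[held_out] <- (y[held_out] - pred)^2
  }
  mean(sq_err)
}

# CV error for polynomial degrees 1..max_degree (orthogonal poly() basis).
cv_poly <- function(data, folds, max_degree = 10) {
  sapply(seq_len(max_degree), function(d)
    cv_mse(as.formula(sprintf("wage ~ poly(age, %d)", d)), data, folds))
}

# CV error for step functions: cut(age, n) splits age into n equal-width
# intervals. Entry n of the result is the error for n intervals (n >= 2).
cv_step <- function(data, folds, max_cuts = 10) {
  errs <- rep(NA_real_, max_cuts)
  for (n_cuts in 2:max_cuts) {
    data$age_cut <- cut(data$age, n_cuts)    # intervals fixed on full data
    errs[n_cuts] <- cv_mse(wage ~ age_cut, data, folds)
  }
  errs
}

if (requireNamespace("ISLR", quietly = TRUE)) {
  Wage <- ISLR::Wage
  set.seed(1)
  folds <- sample(rep_len(1:10, nrow(Wage)))  # 10-fold assignment

  # (1) Degree chosen by CV, next to sequential ANOVA F-tests.
  poly_err <- cv_poly(Wage, folds)
  best_d <- which.min(poly_err)
  nested <- lapply(1:5, function(d)
    lm(as.formula(sprintf("wage ~ poly(age, %d)", d)), data = Wage))
  print(do.call(anova, nested))

  poly_fit <- lm(as.formula(sprintf("wage ~ poly(age, %d)", best_d)),
                 data = Wage)
  age_grid <- seq(min(Wage$age), max(Wage$age), length.out = 200)
  plot(wage ~ age, data = Wage, col = "grey")
  lines(age_grid, predict(poly_fit, data.frame(age = age_grid)),
        lwd = 2, col = "blue")

  # (2) Number of intervals chosen by CV (which.min skips the NA entry).
  step_err <- cv_step(Wage, folds)
  best_cuts <- which.min(step_err)
  step_fit <- lm(wage ~ cut(age, best_cuts), data = Wage)
  ord <- order(Wage$age)                     # sorted fitted values form steps
  plot(wage ~ age, data = Wage, col = "grey")
  lines(Wage$age[ord], fitted(step_fit)[ord], lwd = 2, col = "red")
}
```

With orthogonal poly() terms, each row of the sequential anova() table tests whether adding one more degree improves the fit, so the table can be read side by side with the CV curve; the two approaches need not select the same degree.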

#insert code here


Exercise 2. (3 points)

This exercise uses the College data set from the ISLR package.

  1. Split the data into a training set and a test set. Using out-of-state tuition (Outstate) as the response and the other variables as the predictors, perform forward stepwise selection on the training set to identify a satisfactory model that uses only a subset of the predictors.

  2. Fit a GAM on the training data, using out-of-state tuition as the response and the features selected in the previous step as the predictors. Plot the results, and explain your findings.

  3. Evaluate the resulting model on the test set, and explain the results.

  4. For which variables, if any, is there evidence of a non-linear relationship with the response?
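One way to sketch the four steps above in base R, under stated assumptions. College is taken from the ISLR package, and the demo is skipped without it. Forward stepwise selection is run with step() using a BIC penalty (k = log(n)), a substitute for subset-selection tools such as leaps::regsubsets(). The GAM is fitted as an additive lm() with a natural spline ns(x, df = 4) for each numeric predictor and a plain term for the factor Private, instead of gam::gam() or mgcv::gam(). The helper names forward_select(), gam_formula() and nonlinearity_tests(), as well as the 50/50 split, the seed, and df = 4, are hypothetical choices. For question 4, each numeric predictor's spline is compared with a linear version of the same term by an F-test; a small p-value is evidence of a non-linear relationship.

```r
# Minimal sketch for Exercise 2. Assumptions: ISLR supplies College;
# forward selection uses step() with a BIC penalty; the "GAM" is an
# additive lm() with natural-spline terms.
library(splines)  # ns(): natural cubic spline basis

# Forward stepwise selection from the intercept-only model, scored by BIC.
forward_select <- function(train, response = "Outstate") {
  predictors <- setdiff(names(train), response)
  upper <- reformulate(predictors, response = response)
  null_fit <- lm(reformulate("1", response = response), data = train)
  step(null_fit, scope = list(lower = ~ 1, upper = upper),
       direction = "forward", k = log(nrow(train)), trace = 0)
}

# Additive formula: ns(v, df) for numeric predictors, a plain term for
# factors and for any predictor listed in `linear`.
gam_formula <- function(vars, data, linear = character(0),
                        response = "Outstate", df = 4L) {
  rhs <- vapply(vars, function(v) {
    if (is.numeric(data[[v]]) && !(v %in% linear)) {
      sprintf("ns(%s, df = %d)", v, df)
    } else {
      v
    }
  }, character(1))
  reformulate(rhs, response = response)
}

# F-test p-value per numeric predictor: spline term vs. linear term, with
# every other term unchanged. Small values suggest non-linearity.
nonlinearity_tests <- function(vars, train, df = 4L) {
  full <- lm(gam_formula(vars, train, df = df), data = train)
  is_num <- vapply(vars, function(v) is.numeric(train[[v]]), logical(1))
  sapply(vars[is_num], function(v) {
    reduced <- lm(gam_formula(vars, train, linear = v, df = df), data = train)
    anova(reduced, full)[2, "Pr(>F)"]
  })
}

if (requireNamespace("ISLR", quietly = TRUE)) {
  College <- ISLR::College
  set.seed(1)
  in_train <- sample(nrow(College), nrow(College) %/% 2)
  train <- College[in_train, ]
  test <- College[-in_train, ]

  # (1) Forward selection on the training set.
  fwd_fit <- forward_select(train)
  selected <- attr(terms(fwd_fit), "term.labels")
  print(selected)

  # (2) Additive model on the selected predictors; one panel per term.
  gam_fit <- lm(gam_formula(selected, train), data = train)
  par(mfrow = c(2, 3))
  termplot(gam_fit, se = TRUE, rug = TRUE)

  # (3) Test-set MSE of the additive model vs. the linear selected model.
  test_mse <- function(fit)
    mean((test$Outstate - predict(fit, newdata = test))^2)
  print(c(gam = test_mse(gam_fit), linear = test_mse(fwd_fit)))

  # (4) Evidence of non-linearity for each numeric selected predictor.
  print(nonlinearity_tests(selected, train))
}
```

Comparing the additive model's test MSE with that of the linear model from forward selection gives one concrete way to judge whether the extra flexibility of the spline terms pays off on unseen data.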

#insert code here