## Training RMSE for the OLS full regression model

EXERCISE 4

Purpose: To learn how to use XLMINER to “validate” Principal Component (PC) regressions using the Validation Data Approach. In particular, we will compare several PC regressions to determine how many principal components to retain, and to see whether the chosen dimension reduction gives better predictive performance than OLS regression on the full set of inputs. In this exercise, use a standard partition with seed = 12345 and a 60%/40% split between the training data set and the validation (i.e., test) data set. The data are the Boston Housing data. In the PC model, the PCs are constructed from the standardized versions of all input variables except the indicator variable CHAS, because PC analysis is usually applied only to numeric inputs, not categorical ones. With this background in mind, and following my demonstration in class, complete the following parts of this exercise.

a) Training RMSE for the OLS full regression model = ______________.

b) Training RMSE for the PC regression model with 3 PCs = __________________.

c) Training RMSE for the PC regression model with 5 PCs = ________________.

d) Which of the above models do you prefer?  Explain your answer.

e) Hand in your EXCEL XLMINER spreadsheet that completes the above computations.
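
For readers who want to check the logic outside the spreadsheet, the procedure described in the Purpose paragraph can be sketched in Python with NumPy. This is only an illustrative sketch, not the XLMINER procedure itself: the function names (`split_indices`, `pc_regression_rmse`, `ols_full_rmse`) are hypothetical, the data are synthetic stand-ins rather than the Boston Housing data, and a NumPy partition with seed 12345 will not reproduce XLMINER's partition. It also assumes that the indicator (standing in for CHAS) enters the PC regression as a separate, unstandardized regressor, and that standardization and PC loadings are computed from the training rows only; the exercise does not specify either choice.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def split_indices(n, train_frac=0.6, seed=12345):
    """Random 60/40 partition (will NOT match XLMINER's partition)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]

def fit_predict(F_tr, y_tr, F_va):
    """OLS with intercept, fitted on training rows; returns both predictions."""
    A_tr = np.column_stack([np.ones(len(F_tr)), F_tr])
    A_va = np.column_stack([np.ones(len(F_va)), F_va])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_tr @ beta, A_va @ beta

def ols_full_rmse(X_num, x_ind, y):
    """Training and validation RMSE for OLS on all inputs."""
    tr, va = split_indices(len(y))
    F = np.column_stack([X_num, x_ind])
    p_tr, p_va = fit_predict(F[tr], y[tr], F[va])
    return rmse(y[tr], p_tr), rmse(y[va], p_va)

def pc_regression_rmse(X_num, x_ind, y, n_pcs):
    """Training and validation RMSE for a regression on the first n_pcs PCs."""
    tr, va = split_indices(len(y))
    # Standardize numeric inputs using training means and standard deviations.
    mu = X_num[tr].mean(axis=0)
    sd = X_num[tr].std(axis=0, ddof=1)
    Z_tr = (X_num[tr] - mu) / sd
    Z_va = (X_num[va] - mu) / sd
    # PC loadings are the right singular vectors of the standardized training matrix.
    _, _, Vt = np.linalg.svd(Z_tr, full_matrices=False)
    W = Vt[:n_pcs].T
    # Assumption: the indicator is added alongside the PC scores, unstandardized.
    F_tr = np.column_stack([Z_tr @ W, x_ind[tr]])
    F_va = np.column_stack([Z_va @ W, x_ind[va]])
    p_tr, p_va = fit_predict(F_tr, y[tr], F_va)
    return rmse(y[tr], p_tr), rmse(y[va], p_va)

# Synthetic stand-in data: 6 correlated numeric inputs plus one 0/1 indicator.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=(n, 3))
X_num = np.column_stack([base, base + 0.5 * rng.normal(size=(n, 3))])
chas = rng.integers(0, 2, size=n).astype(float)
y = X_num @ np.array([1.0, -2.0, 0.5, 0.0, 1.5, -1.0]) + 2.0 * chas + rng.normal(size=n)

print("OLS full (train, validation):", ols_full_rmse(X_num, chas, y))
for k in (3, 5):
    print(f"PC regression, {k} PCs (train, validation):",
          pc_regression_rmse(X_num, chas, y, n_pcs=k))
```

Because the 3-PC regressors are a subset of the 5-PC regressors, the training RMSE can only stay the same or fall as PCs are added, and retaining all PCs reproduces the full OLS fit. Training RMSE alone therefore always favors the larger model, which is why the sketch also returns the validation RMSE for comparison.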