Assignment 8

require(ISLR)
Loading required package: ISLR
Warning: package 'ISLR' was built under R version 4.3.3
require(MASS)
Loading required package: MASS
Warning: package 'MASS' was built under R version 4.3.2
require(descr)
Loading required package: descr
Warning: package 'descr' was built under R version 4.3.3
attach(Smarket)

## Linear Discriminant Analysis
freq(Direction)

Direction 
      Frequency Percent
Down        602   48.16
Up          648   51.84
Total      1250  100.00
train = Year<2005 # logical index: TRUE for observations before 2005 (training years)
lda.fit=lda(Direction~Lag1+Lag2,data=Smarket, subset=Year<2005)
lda.fit
Call:
lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = Year < 
    2005)

Prior probabilities of groups:
    Down       Up 
0.491984 0.508016 

Group means:
            Lag1        Lag2
Down  0.04279022  0.03389409
Up   -0.03954635 -0.03132544

Coefficients of linear discriminants:
            LD1
Lag1 -0.6420190
Lag2 -0.5135293
plot(lda.fit, col="lightgreen")

Smarket.2005=subset(Smarket,Year==2005) # Creating subset with 2005 data for prediction
lda.pred=predict(lda.fit,Smarket.2005)
names(lda.pred)
[1] "class"     "posterior" "x"        
lda.class=lda.pred$class
Direction.2005=Smarket$Direction[!train] 
table(lda.class,Direction.2005) 
         Direction.2005
lda.class Down  Up
     Down   35  35
     Up     76 106
data.frame(lda.pred)[1:5,]
     class posterior.Down posterior.Up         LD1
999     Up      0.4901792    0.5098208  0.08293096
1000    Up      0.4792185    0.5207815  0.59114102
1001    Up      0.4668185    0.5331815  1.16723063
1002    Up      0.4740011    0.5259989  0.83335022
1003    Up      0.4927877    0.5072123 -0.03792892
table(lda.pred$class,Smarket.2005$Direction)
      
       Down  Up
  Down   35  35
  Up     76 106
mean(lda.pred$class==Smarket.2005$Direction) # overall accuracy on the 2005 test data
[1] 0.5595238

Answer 1:

The best subset selection model is guaranteed to have the smallest training RSS, because it searches every possible model of each size. Which approach gives the smallest test RSS cannot be known in advance: stepwise methods aim for models that generalize well to unseen data, and the choice between forward and backward selection depends on the dataset and should be evaluated with appropriate validation techniques.

Best Subset Selection Model:

Best subset selection fits all possible combinations of predictors and, for each model size, keeps the one with the lowest training RSS. Because it searches every subset, it always attains the smallest training RSS among the candidate methods. However, choosing a model solely on training RSS can lead to overfitting, especially with a large number of predictors, because the selected model may be capturing noise in the data.
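
A minimal sketch of this point on toy data (not the assignment data; the object names here are illustrative): the training RSS reported by regsubsets() never increases as predictors are added, which is why training RSS alone always favours the largest model.

require(leaps)
set.seed(1)
toy <- data.frame(matrix(rnorm(100 * 5), ncol = 5)) # five noise predictors X1..X5
toy$y <- rnorm(100)                                 # response unrelated to the predictors
fit_all <- regsubsets(y ~ ., data = toy, nvmax = 5) # exhaustive best subset search
summary(fit_all)$rss                                # non-increasing in model size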

Stepwise Selection Models:

Stepwise selection methods, such as forward and backward selection, involve adding or removing predictors iteratively based on certain criteria (e.g., AIC, BIC, adjusted R-squared) until a stopping criterion is met.

Stepwise selection methods aim for a parsimonious model that generalizes well to unseen data, and their performance is typically estimated with techniques such as cross-validation or a validation set. There is no guarantee, however, that a stepwise model has the smallest test RSS; whether it beats the best subset model on test data depends on how much the exhaustive search has overfit the training set, which is exactly what the validation estimate is meant to reveal.
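
A sketch of how such a validation estimate can be computed, again on illustrative toy data; the predict.regsubsets() helper below is a common workaround because regsubsets() has no built-in predict() method.

require(leaps)
# helper: build the model matrix for new data and multiply by the chosen coefficients
predict.regsubsets <- function(object, newdata, id, ...) {
  form  <- as.formula(object$call[[2]])
  mat   <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  mat[, names(coefi), drop = FALSE] %*% coefi
}
set.seed(1)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$y <- rnorm(200)
is_train <- sample(c(TRUE, FALSE), nrow(dat), replace = TRUE) # random validation split
fwd <- regsubsets(y ~ ., data = dat[is_train, ], nvmax = 5, method = "forward")
val_rss <- sapply(1:5, function(i)
  sum((dat$y[!is_train] - predict(fwd, dat[!is_train, ], id = i))^2))
which.min(val_rss) # model size with the smallest validation RSS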

Forward vs. Backward Models:

In forward selection, predictors are added one by one to the model until no significant improvement in the chosen criterion is observed. This approach may result in a model that is simpler and potentially more interpretable.

In backward elimination, all predictors are initially included in the model, and one by one, the least significant predictors are removed until no further improvement is observed. This method may lead to a more parsimonious model, especially when dealing with a large number of predictors.

The choice between forward and backward selection may vary depending on the dataset characteristics, such as the correlation between predictors and the signal-to-noise ratio. It’s essential to assess the performance of both methods through cross-validation or other validation techniques to determine which one yields the best test RSS for the specific dataset.
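
A small sketch of why the two directions can disagree (toy data with two nearly collinear predictors; the names are illustrative): at the same model size, forward and backward stepwise may keep different variables, so their test performance has to be compared empirically rather than assumed.

require(leaps)
set.seed(2)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1) # x2 nearly collinear with x1
x3 <- rnorm(100)
yy <- x1 + x3 + rnorm(100)
d <- data.frame(yy, x1, x2, x3)
fwd <- regsubsets(yy ~ ., data = d, method = "forward")
bwd <- regsubsets(yy ~ ., data = d, method = "backward")
summary(fwd)$which[2, ] # variables kept in the 2-predictor forward model
summary(bwd)$which[2, ] # variables kept in the 2-predictor backward model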

set.seed(123)
x <- rnorm(100)
eps <- rnorm(100)

y <- 4 + 9 * x + 2 * x^2 + x^3 + eps

plot(x)

plot(y)

require(leaps)
Loading required package: leaps
Warning: package 'leaps' was built under R version 4.3.3
best_subset <- regsubsets(y ~ poly(x, 10, raw = T), data = data.frame(y, x), nvmax = 10) # nvmax is an argument to regsubsets(), not data.frame()
bic <- summary(best_subset)$bic
cp <- summary(best_subset)$cp
adjr2 <- summary(best_subset)$adjr2

plot(bic, type = "b", pch = 16, col = "cyan", 
     xlab = "Model Number", ylab = "BIC Value", 
     main = "BIC Values for Different Models")

plot(cp, type = "b", pch = 16, col = "cyan", 
     xlab = "Model Number", ylab = "Cp Value", 
     main = "Cp Values for Different Models")

plot(adjr2, type = "b", pch = 16, col = "cyan", 
     xlab = "Model Number", ylab = "Adjusted R^2 Value", 
     main = "Adjusted R^2 Values for Different Models")

which.min(bic)
[1] 3
which.min(cp)
[1] 3
which.max(adjr2)
[1] 7
coef(best_subset, id = 3)
          (Intercept) poly(x, 10, raw = T)1 poly(x, 10, raw = T)2 
             3.970394              8.920446              1.908457 
poly(x, 10, raw = T)3 
             1.020436 

Answer 2:

Model 3 is the best choice, as it has the lowest BIC and Cp values. Model 7 has the highest adjusted R squared, but the adjusted R squared curve levels off around Model 3, so the extra terms are likely overfitting. In Model 3 the coefficients are: intercept (3.97), B1 (8.92), B2 (1.91), and B3 (1.02).
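
For reference, these estimates sit close to the coefficients used to simulate y (4, 9, 2, and 1); a quick side-by-side comparison:

# compare the selected 3-term fit with the true simulating coefficients of y
rbind(estimated = coef(best_subset, id = 3),
      true      = c(4, 9, 2, 1))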

for_subset <- regsubsets(y ~ poly(x, 10, raw = T), data = data.frame(y, x), nvmax = 10, method = "forward")

plot(summary(for_subset)$bic, type = "b", pch = 16, col = "orange", 
     xlab = "Model Number", ylab = "BIC Value", 
     main = "BIC Values for Different Models")

plot(summary(for_subset)$cp, type = "b", pch = 16, col = "pink", 
     xlab = "Model Number", ylab = "Cp Value", 
     main = "Cp Values for Different Models")

plot(summary(for_subset)$adjr2, type = "b", pch = 16, col = "steelblue", 
     xlab = "Model Number", ylab = "Adjusted R^2 Value", 
     main = "Adjusted R^2 Values for Different Models")

which.min(summary(for_subset)$bic)
[1] 3
which.min(summary(for_subset)$cp)
[1] 3
which.max(summary(for_subset)$adjr2)
[1] 4
coef(for_subset, id = 3)
          (Intercept) poly(x, 10, raw = T)1 poly(x, 10, raw = T)2 
             3.970394              8.920446              1.908457 
poly(x, 10, raw = T)3 
             1.020436 

Answer 3:

Model 3 is the best again, having the lowest BIC and Cp values. Model 4 has the highest adjusted R squared, but its value is barely above Model 3's in the plot. In Model 3 the coefficients are: intercept (3.97), B1 (8.92), B2 (1.91), and B3 (1.02), identical to the best subset result.

bac_subset <- regsubsets(y ~ poly(x, 10, raw = T), data = data.frame(y, x), nvmax = 10, method = "backward")

plot(summary(bac_subset)$bic, type = "b", pch = 16, col = "cyan", 
     xlab = "Model Number", ylab = "BIC Value", 
     main = "BIC Values for Different Models")

plot(summary(bac_subset)$cp, type = "b", pch = 16, col = "lightgreen", 
     xlab = "Model Number", ylab = "Cp Value", 
     main = "Cp Values for Different Models")

plot(summary(bac_subset)$adjr2, type = "b", pch = 16, col = "magenta", 
     xlab = "Model Number", ylab = "Adjusted R^2 Value", 
     main = "Adjusted R^2 Values for Different Models")

which.min(summary(bac_subset)$bic)
[1] 4
which.min(summary(bac_subset)$cp)
[1] 4
which.max(summary(bac_subset)$adjr2)
[1] 4
coef(bac_subset, id = 3)
          (Intercept) poly(x, 10, raw = T)1 poly(x, 10, raw = T)2 
            3.9620068             9.8934015             1.9689642 
poly(x, 10, raw = T)5 
            0.1748705 

Answer 4:

Using backward stepwise selection, Model 4 performs best on all three metrics. Note that the coefficients printed above are for the three-variable fit (id = 3), which retains the x^5 term rather than x^3: intercept (3.96), B1 (9.89), B2 (1.97), and B5 (0.17).
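
To report the model the criteria actually point to, the coefficients of the four-variable fit can be extracted the same way (output not shown here):

coef(bac_subset, id = 4) # coefficients of the 4-predictor model chosen by BIC, Cp and adjusted R^2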
