Year Lag1 Lag2 Lag3
Min. :2001 Min. :-4.922000 Min. :-4.922000 Min. :-4.922000
1st Qu.:2002 1st Qu.:-0.639500 1st Qu.:-0.639500 1st Qu.:-0.640000
Median :2003 Median : 0.039000 Median : 0.039000 Median : 0.038500
Mean :2003 Mean : 0.003834 Mean : 0.003919 Mean : 0.001716
3rd Qu.:2004 3rd Qu.: 0.596750 3rd Qu.: 0.596750 3rd Qu.: 0.596750
Max. :2005 Max. : 5.733000 Max. : 5.733000 Max. : 5.733000
Lag4 Lag5 Volume Today
Min. :-4.922000 Min. :-4.92200 Min. :0.3561 Min. :-4.922000
1st Qu.:-0.640000 1st Qu.:-0.64000 1st Qu.:1.2574 1st Qu.:-0.639500
Median : 0.038500 Median : 0.03850 Median :1.4229 Median : 0.038500
Mean : 0.001636 Mean : 0.00561 Mean :1.4783 Mean : 0.003138
3rd Qu.: 0.596750 3rd Qu.: 0.59700 3rd Qu.:1.6417 3rd Qu.: 0.596750
Max. : 5.733000 Max. : 5.73300 Max. :3.1525 Max. : 5.733000
Direction
Down:602
Up :648
# Create a dataframe for data browsingsm=Smarket# Bivariate Plot of inter-lag correlationspairs(Smarket,col=Smarket$Direction,cex=.5, pch=20)
Direction
glm.pred Down Up
Down 145 141
Up 457 507
mean(glm.pred==Direction)
[1] 0.5216
# Make training and test set for predictiontrain = Year<2005glm.fit=glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume,data=Smarket,family=binomial, subset=train)glm.probs=predict(glm.fit,newdata=Smarket[!train,],type="response") glm.pred=ifelse(glm.probs >0.5,"Up","Down")Direction.2005=Smarket$Direction[!train]table(glm.pred,Direction.2005)
Direction.2005
glm.pred Down Up
Down 77 97
Up 34 44
Direction.2005
glm.pred Down Up
Down 35 35
Up 76 106
mean(glm.pred==Direction.2005)
[1] 0.5595238
# Check accuracy rate106/(76+106)
[1] 0.5824176
# Can you interpret the results?
Answer:
Result Interpretation
2 a. In LDA, predictor variables are assumed to be normally distributed with equal variance, and the response variable must be categorical, as LDA is tailored for classification tasks.
2 b. LDA is suitable for multiclass classification, while logistic regression is typically used for binary classification. Additionally, LDA imposes distributional assumptions on the data, unlike logistic regression.
2 c. The Receiver Operating Characteristic (ROC) curve assesses the performance of a classifier model.
2 d. Sensitivity measures the true positive rate, while specificity measures the true negative rate. Sensitivity is often prioritized due to the severe consequences of false positives, such as wrongful convictions. Both type I and II errors are undesirable, but high sensitivity is crucial.
2 e. Based on the chart, sensitivity is deemed more critical for prediction.