Yi Wang :)
  • Home
  • About
  • Research
  • Project Proposal
  • Résumé
  • Assignments
    • Assignment 1
    • Assignment 2
    • Assignment HTML
    • Lab 1
    • Lab 2
    • Lab 3
    • Assignment 3
    • Assignment 4
    • Assignment 5
    • Assignment 6
    • Assignment 7
    • Assignment 8

On this page

  • Load necessary libraries
  • Read the dataset into R
  • Data Preprocessing
    • Calculate Missing Values
    • Examine Unique Values
    • Histogram of ‘Tondu’
    • Check Binary Distribution
    • Convert Numerics to Factors
    • Impute Missing Values
  • Analysis
    • Frequency Table for ‘Tondu_with_no_response’
    • Ordinal Logistic Regression
    • Goodman and Kruskal’s Gamma Coefficient
    • Chi-Square Test of Independence
    • Logistic Regression
    • More Chi-Square Tests
  • Visualization
    • Bar Chart for ‘Tondu_with_no_response’
    • Boxplot of Age by Tondu
    • Bar Chart of Gender Vote by Tondu
    • Bar Chart of Gender by Vote for Tsai
  • Results and Discussion
  • Results and Discussion

Analysis of TEDS_2016 Dataset

Load necessary libraries

library(haven) # For reading Stata data files  
Warning: package 'haven' was built under R version 4.3.2
library(MASS)  # For ordinal logistic regression  
Warning: package 'MASS' was built under R version 4.3.2
library(vcd)   # For categorical data  
Warning: package 'vcd' was built under R version 4.3.2
Loading required package: grid

Read the dataset into R

TEDS_2016 <- read_stata("https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true")  

Data Preprocessing

Calculate Missing Values

na_count <- sapply(TEDS_2016, function(y) sum(is.na(y)))  
na_count_votetsai <- sum(is.na(TEDS_2016$votetsai))  

Examine Unique Values

unique(TEDS_2016$Tondu)  
<labelled<double>[7]>: Position on unification and independence
[1] 3 5 9 4 6 2 1

Labels:
 value                                                              label
     1                                              Immediate unification
     2                    Maintain the status quo,move toward unification
     3 Maintain the status quo, decide either unification or independence
     4                                    Maintain the status quo forever
     5                   Maintain the status quo,move toward independence
     6                                             Immediate independence
     9                                                        Nonresponse
summary(TEDS_2016$Tondu)  
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.000   4.000   4.127   5.000   9.000 

Histogram of ‘Tondu’

hist(TEDS_2016$Tondu, main = "Histogram of Tondu", xlab = "Tondu")  

Check Binary Distribution

table(TEDS_2016$female)  

  0   1 
868 822 

Convert Numerics to Factors

TEDS_2016$female <- factor(TEDS_2016$female, levels = c(0, 1), labels = c("Male", "Female"))  
TEDS_2016$DPP <- factor(TEDS_2016$DPP)  
   
binary_columns <- sapply(TEDS_2016, function(x) all(x %in% c(0, 1, NA)))  
TEDS_2016[, binary_columns] <- lapply(TEDS_2016[, binary_columns], function(x) factor(x, levels = c(0, 1)))  

Impute Missing Values

# Numerical variables  
numerical_columns <- sapply(TEDS_2016, is.numeric) & !binary_columns  
TEDS_2016[, numerical_columns] <- lapply(TEDS_2016[, numerical_columns], function(x) {  
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)  
})  
   
# Categorical variables  
categorical_columns <- sapply(TEDS_2016, is.factor)  
TEDS_2016[, categorical_columns] <- lapply(TEDS_2016[, categorical_columns], function(x) {  
  freq_table <- table(x)  
  mode_value <- names(which.max(freq_table))  
  x[is.na(x)] <- mode_value  
  factor(x, levels = levels(x))  
})  

Analysis

Frequency Table for ‘Tondu_with_no_response’

TEDS_2016$Tondu_numeric <- as.numeric(TEDS_2016$Tondu)
TEDS_2016$Tondu_with_no_response <- factor(TEDS_2016$Tondu_numeric,  
                                           levels = c(1, 2, 3, 4, 5, 6, 9),  
                                           labels = c("Immediate unification",  
                                                      "Maintain the status quo, move toward unification",  
                                                      "Maintain the status quo, decide either unification or independence",  
                                                      "Maintain the status quo forever",  
                                                      "Maintain the status quo, move toward independence",  
                                                      "Immediate independence",  
                                                      "No response"))  
tondu_freq_table <- table(TEDS_2016$Tondu_with_no_response)  
tondu_freq_table  

                                             Immediate unification 
                                                                27 
                  Maintain the status quo, move toward unification 
                                                               180 
Maintain the status quo, decide either unification or independence 
                                                               546 
                                   Maintain the status quo forever 
                                                               328 
                 Maintain the status quo, move toward independence 
                                                               380 
                                            Immediate independence 
                                                               108 
                                                       No response 
                                                               121 

Ordinal Logistic Regression

TEDS_2016 <- subset(TEDS_2016, !is.na(Tondu) & Tondu != 9 & !is.na(votetsai))  
TEDS_2016$Tondu <- factor(TEDS_2016$Tondu, levels = c(1, 2, 3, 4, 5, 6),  
                          labels = c("Immediate unification",  
                                     "Maintain the status quo, move toward unification",  
                                     "Maintain the status quo, decide either unification or independence",  
                                     "Maintain the status quo forever",  
                                     "Maintain the status quo, move toward independence",  
                                     "Immediate independence"),  
                          ordered = TRUE)  
ordinal_model <- polr(Tondu ~ female + DPP + age + income + edu + Taiwanese + Econ_worse,  
                      data = TEDS_2016)  

Goodman and Kruskal’s Gamma Coefficient

gamma_age <- assocstats(table(TEDS_2016$age, TEDS_2016$Tondu_numeric))$gamma  
gamma_income <- assocstats(table(TEDS_2016$income, TEDS_2016$Tondu_numeric))$gamma  
gamma_edu <- assocstats(table(TEDS_2016$edu, TEDS_2016$Tondu_numeric))$gamma  
gamma_results <- list(age_gamma = gamma_age, income_gamma = gamma_income, edu_gamma = gamma_edu)  

Chi-Square Test of Independence

TEDS_2016$DPP <- factor(TEDS_2016$DPP)  
TEDS_2016$Taiwanese <- factor(TEDS_2016$Taiwanese)  
TEDS_2016$Econ_worse <- factor(TEDS_2016$Econ_worse)  
   
chi_square_DPP <- chisq.test(table(TEDS_2016$DPP, TEDS_2016$Tondu), simulate.p.value = TRUE)  
chi_square_Taiwanese <- chisq.test(table(TEDS_2016$Taiwanese, TEDS_2016$Tondu), simulate.p.value = TRUE)  
chi_square_Econ_worse <- chisq.test(table(TEDS_2016$Econ_worse, TEDS_2016$Tondu), simulate.p.value = TRUE)  
chi_square_results <- list(DPP = chi_square_DPP, Taiwanese = chi_square_Taiwanese, Econ_worse = chi_square_Econ_worse)  

Logistic Regression

TEDS_2016 <- subset(TEDS_2016, !is.na(votetsai))  
logistic_model <- glm(votetsai ~ female + DPP + age + income + edu + Taiwanese + Econ_worse,  
                      family = binomial(link = "logit"), data = TEDS_2016)  
summary(logistic_model)  

Call:
glm(formula = votetsai ~ female + DPP + age + income + edu + 
    Taiwanese + Econ_worse, family = binomial(link = "logit"), 
    data = TEDS_2016)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.490842   0.433296   3.441  0.00058 ***
femaleFemale -0.404225   0.133554  -3.027  0.00247 ** 
DPP1          2.963318   0.259074  11.438  < 2e-16 ***
age          -0.023759   0.005080  -4.677 2.91e-06 ***
income       -0.003661   0.025452  -0.144  0.88563    
edu          -0.175804   0.060354  -2.913  0.00358 ** 
Taiwanese1    1.038279   0.135487   7.663 1.81e-14 ***
Econ_worse1   0.358628   0.132925   2.698  0.00698 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1898.4  on 1568  degrees of freedom
Residual deviance: 1398.2  on 1561  degrees of freedom
AIC: 1414.2

Number of Fisher Scoring iterations: 6

More Chi-Square Tests

chi_square_female <- chisq.test(table(TEDS_2016$female, TEDS_2016$votetsai), simulate.p.value = TRUE)  
chi_square_DPP <- chisq.test(table(TEDS_2016$DPP, TEDS_2016$votetsai), simulate.p.value = TRUE)  
chi_square_results_votetsai <- list(female = chi_square_female, DPP = chi_square_DPP)  

Visualization

Bar Chart for ‘Tondu_with_no_response’

barplot(tondu_freq_table,  
        main = "Frequency of Tondu Categories (Including 'No response')",  
        xlab = "Tondu Categories",  
        ylab = "Frequency",  
        las = 2,  
        col = "blue")  

Boxplot of Age by Tondu

boxplot(age ~ Tondu, data = TEDS_2016, main = "Boxplot of Age by Tondu", xlab = "Tondu", ylab = "Age")  

Bar Chart of Gender Vote by Tondu

barplot(table(TEDS_2016$female, TEDS_2016$Tondu_numeric),  
        beside = TRUE,  
        legend = c("Male", "Female"),  
        main = "Bar Chart of Gender Vote by Tondu",  
        xlab = "Tondu",  
        ylab = "Count",  
        args.legend = list(title = "Gender", x = "topright", cex = 0.8))  

Bar Chart of Gender by Vote for Tsai

barplot(table(TEDS_2016$female, TEDS_2016$votetsai),  
        beside = TRUE,  
        legend = c("Male", "Female"),  
        main = "Bar Chart of Gender by Vote for Tsai",  
        xlab = "Vote for Tsai",  
        ylab = "Count",  
        args.legend = list(title = "Gender", x = "topright", cex = 0.8))  

Results and Discussion

Results and Discussion

Analysis of the results: - (Intercept): The estimated log-odds of votetsai being 1 (versus 0) when all predictors are at their reference levels is 1.490842. This is statistically significant with a p-value of 0.00058.

  • femaleFemale: Being female is associated with a decrease in the log-odds of votetsai by -0.404225 compared to being male (the reference category). This effect is statistically significant (p = 0.00247).

  • DPP1: Affiliation with DPP (Democratic Progressive Party) is associated with an increase in the log-odds of votetsai by 2.963318 compared to non-affiliation (the reference category). This is highly statistically significant (p < 2e-16).

  • age: Each additional year of age is associated with a decrease in the log-odds of votetsai by -0.023759. This effect is statistically significant (p = 2.91e-06).

  • income: The coefficient for income is not statistically significant (p = 0.88563), suggesting that income does not have a significant effect on the log-odds of votetsai.

  • edu: Higher education levels are associated with a decrease in the log-odds of votetsai by -0.175804. This effect is statistically significant (p = 0.00358).

  • Taiwanese1: Identifying as Taiwanese is associated with an increase in the log-odds of votetsai by 1.038279. This effect is highly statistically significant (p = 1.81e-14).

  • Econ_worse1: Believing the economy has gotten worse is associated with an increase in the log-odds of votetsai by 0.358628. This effect is statistically significant (p = 0.00698).

The model’s AIC (Akaike Information Criterion) is 1414.2, which can be used for model comparison purposes. The lower the AIC, the better the model fits the data while penalizing for complexity.

The null deviance and residual deviance indicate how well the model fits the data compared to a null model with only the intercept. The significant reduction from the null deviance to the residual deviance shows that the predictors improve the model fit.

Overall, the model suggests that gender, DPP affiliation, age, education level, Taiwanese identity, and perception of the economy are significant predictors of votetsai. Income is not a significant predictor in this model.

```

  • Copyright 2024, Yi Wang