
Machine Learning - Predictions with ordinal logistic regression

library(tidyverse)
library(ROCR)
library(MASS)
library(caret)
library(nnet)

Table of Contents

  • 1 Introduction
  • 2 Load and prepare the dataset
  • 3 Prepare a training and test set
  • 4 Some descriptive key figures
  • 5 Conclusion

1 Introduction

In my last publication, “Machine Learning - Predictions with Generalized Linear Models”, I discussed how classifications can be made using generalized linear models. In the following, a classification of an ordinal-scaled dependent variable will be made with ordinal logistic regression.

For this post, the Wine Quality dataset from the UCI Machine Learning Repository was used.

2 Load and prepare the dataset

In the repository, two CSV files are stored: one for red wines and one for white wines. For the following examination, both datasets are loaded and combined into one.

red <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"), sep = ";")
white <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"), sep = ";")
wine <- rbind(red, white)
glimpse(wine)
## Observations: 6,497
## Variables: 12
## $ fixed.acidity        <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, ...
## $ volatile.acidity     <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660,...
## $ citric.acid          <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06,...
## $ residual.sugar       <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2...
## $ chlorides            <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075,...
## $ free.sulfur.dioxide  <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15...
## $ total.sulfur.dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, ...
## $ density              <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0...
## $ pH                   <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30,...
## $ sulphates            <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46,...
## $ alcohol              <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, ...
## $ quality              <int> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5,...

For simplicity, I will contract the original output variable (quality) to a three-point scale from 0 to 2.

wine$quality <- factor(ifelse(wine$quality < 5, 0, ifelse(wine$quality > 6, 2, 1)))

3 Prepare a training and test set

set.seed(7644)
wine_sampling_vector <- createDataPartition(wine$quality, p = 0.80, list = FALSE)
wine_train <- wine[wine_sampling_vector,]
wine_test <- wine[-wine_sampling_vector,]


wine_model <- polr(quality ~ ., data = wine_train, Hess = T)
summary(wine_model)
## Call:
## polr(formula = quality ~ ., data = wine_train, Hess = T)
## 
## Coefficients:
##                           Value Std. Error   t value
## fixed.acidity         2.348e-01   0.037587    6.2462
## volatile.acidity     -3.212e+00   0.298033  -10.7778
## citric.acid          -1.694e-01   0.326327   -0.5191
## residual.sugar        1.330e-01   0.009488   14.0204
## chlorides            -1.843e+00   1.363609   -1.3518
## free.sulfur.dioxide   1.514e-02   0.002996    5.0523
## total.sulfur.dioxide -5.786e-03   0.001078   -5.3674
## density              -1.804e+02   0.531508 -339.3469
## pH                    1.355e+00   0.258374    5.2457
## sulphates             2.111e+00   0.264873    7.9694
## alcohol               6.359e-01   0.036506   17.4192
## 
## Intercepts:
##     Value     Std. Error t value  
## 0|1 -170.1699    0.5449  -312.2988
## 1|2 -164.5075    0.5674  -289.9221
## 
## Residual Deviance: 5668.121 
## AIC: 5694.121

Our model summary shows that we have three output classes and two intercepts. Let’s count the number of samples in each output class and then express these as relative frequencies.

prop.table(table(wine$quality))
## 
##          0          1          2 
## 0.03786363 0.76558412 0.19655225

Class 1, which corresponds to average wines, is by far the most frequent. In fact, a simple baseline model that always predicts this category would be correct 76.56% of the time.
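As a quick sanity check, this baseline accuracy can be reproduced directly from the class frequencies; a minimal sketch:

# Naive baseline: always predict the most frequent class
majority_class <- names(which.max(table(wine$quality)))  # here "1"
mean(wine$quality == majority_class)                     # ~0.7656, matching the table above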

4 Some descriptive key figures

Now, let’s look at the fit on the training data and the corresponding confusion matrix.

wine_predictions <- predict(wine_model, wine_train)
mean(wine_predictions == wine_train$quality)
## [1] 0.7814964
table(predicted = wine_predictions, actual = wine_train$quality)
##          actual
## predicted    0    1    2
##         0    0    0    0
##         1  193 3834  793
##         2    4  146  229

Our model performs only marginally better on the training data than our baseline model. We can see why this is the case: it predicts the average class (1) very often.
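To quantify this, caret’s confusionMatrix() reports per-class sensitivity alongside the overall accuracy; a minimal sketch (output omitted):

# Per-class statistics on the training data; class 0 is never predicted,
# so its sensitivity is 0, while class 1 dominates the predictions.
confusionMatrix(data = wine_predictions, reference = wine_train$quality)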

Now we want to try this again with the test set.

wine_test_predictions <- predict(wine_model, wine_test)
mean(wine_test_predictions == wine_test$quality)
## [1] 0.779661
table(predicted = wine_test_predictions, actual = wine_test$quality)
##          actual
## predicted   0   1   2
##         0   0   2   0
##         1  49 959 202
##         2   0  33  53

As you can see, we get a pretty much identical situation. It seems that our model is not a particularly good choice for this dataset. We probably have to check whether the proportional odds assumption is valid. One simple test that is easy to do, however, is to train a second model using multinomial logistic regression. Then we can compare the AIC values of our two models.

wine_model2 <- multinom(quality ~ ., data = wine_train, maxit = 1000)
## # weights:  39 (24 variable)
## initial  value 5711.685289 
## iter  10 value 3616.269493
## iter  20 value 2797.070604
## iter  30 value 2782.640097
## iter  40 value 2778.448036
## iter  50 value 2767.613495
## iter  50 value 2767.613493
## final  value 2767.613493 
## converged
wine_predictions2 <- predict(wine_model2, wine_test)
mean(wine_predictions2 == wine_test$quality)
## [1] 0.7858243
table(predicted = wine_predictions2, actual = wine_test$quality)
##          actual
## predicted   0   1   2
##         0   1   1   0
##         1  48 949 185
##         2   0  44  70

Now we have to check their AIC values:

AIC(wine_model)
## [1] 5694.121
AIC(wine_model2)
## [1] 5583.227
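
Beyond comparing AIC values, one could also sketch an informal likelihood ratio test, since the multinomial model is the less restrictive of the two. This is only a rough check of the proportional odds assumption, not a formal Brant test:

# Likelihood ratio test: ordinal (restricted) vs. multinomial (unrestricted) model.
# A very small p-value would suggest the proportional odds assumption is questionable.
ll_polr  <- logLik(wine_model)
ll_multi <- logLik(wine_model2)
df_diff  <- attr(ll_multi, "df") - attr(ll_polr, "df")
lr_stat  <- 2 * (as.numeric(ll_multi) - as.numeric(ll_polr))
pchisq(lr_stat, df = df_diff, lower.tail = FALSE)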

5 Conclusion

The AIC is lower for the multinomial logistic regression model, which suggests that we might be better off working with that model.

Source

Miller, J. D., & Forte, R. M. (2017). Mastering Predictive Analytics with R. Packt Publishing Ltd.