# Predicting Multiple Discrete Values with Multinomials, Neural Networks and the {nnet} Package

Practical walkthroughs on machine learning, data exploration and finding insight.

**Resources**

**Packages Used in this Walkthrough**

- **{nnet}** - neural network multinomial modeling
- **{RCurl}** - downloads https data
- **{caret}** - `dummyVars` and `postResample` functions

So, what is a **multinomial model**?

From Wikipedia:

Multinomial logistic regression is a simple extension of binary logistic regression that allows for more than two categories of the dependent or outcome variable.

And from the `multinom` help file in **{nnet}**:

```
library(nnet)
?multinom
```

Fits multinomial log-linear models via neural networks.

In a nutshell, this allows you to predict a factor with multiple levels (more than two) in one shot with the power of neural networks. **Neural networks** are great at working through many combinations of inputs and also handle linear models well, so it’s an ideal combination.

If your data is linear in nature, then instead of building separate binary models for `A` versus `B`, `B` versus `C`, and `C` versus `A`, and finally going through the hassle of concatenating the resulting probabilities, you can let **{nnet}** do it all in one shot. That pairwise approach also becomes exponentially more difficult as you predict more than 3 outcome levels!

The `multinom` function will do all that for you in one shot and allow you to observe the probabilities of each class to interpret things (now that’s really cool).
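As a quick illustration of the idea (using R’s built-in `iris` data rather than the vehicles set we download below), `multinom` fits all the class contrasts at once and can return a full probability matrix:

```r
library(nnet)

# Fit a 3-class multinomial model in one shot
fit <- multinom(Species ~ ., data = iris, trace = FALSE)

# One probability column per outcome level; each row sums to 1
probs <- predict(fit, iris, type = "probs")
head(round(probs, 3))

# Hard class predictions pick the highest-probability column
preds <- predict(fit, iris)
table(preds, iris$Species)
```

The `type = "probs"` output is what lets you inspect each class’s probability instead of just the winning label.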

**Let’s code!**

We’re going to use a Hadley Wickham data set to predict how many cylinders a vehicle has. We download the data from GitHub:

```
library(RCurl)
urlfile <-'https://raw.githubusercontent.com/hadley/fueleconomy/master/data-raw/vehicles.csv'
x <- getURL(urlfile, ssl.verifypeer = FALSE)
vehicles <- read.csv(textConnection(x))
```

For simplicity’s sake, only use the first 24 columns of the data, and impute any `NA` values with `0`:

```
vehicles <- vehicles[names(vehicles)[1:24]]
vehicles[is.na(vehicles)] <- 0
names(vehicles)
```

```
## [1] "barrels08" "barrelsA08" "charge120"
## [4] "charge240" "city08" "city08U"
## [7] "cityA08" "cityA08U" "cityCD"
## [10] "cityE" "cityUF" "co2"
## [13] "co2A" "co2TailpipeAGpm" "co2TailpipeGpm"
## [16] "comb08" "comb08U" "combA08"
## [19] "combA08U" "combE" "combinedCD"
## [22] "combinedUF" "cylinders" "displ"
```

Use the `cylinders` column as the model’s outcome and cast it to a factor. Use the `table` function to see how many cylinder counts we are dealing with (BTW, a `0`-cylinder vehicle is an electric vehicle):

```
vehicles$cylinders <- as.factor(vehicles$cylinders)
table(vehicles$cylinders)
```

```
##
## 0 2 3 4 5 6 8 10 12 16
## 66 51 182 13133 757 12101 7715 138 481 7
```

We see that the 4 and 6 cylinder vehicles are the most numerous.
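To see those shares as proportions rather than raw counts, `prop.table` divides each count by the total. A sketch using the counts printed above (in the walkthrough itself you would call it on `table(vehicles$cylinders)`):

```r
# Cylinder counts copied from the table output above
counts <- c(`0` = 66, `2` = 51, `3` = 182, `4` = 13133, `5` = 757,
            `6` = 12101, `8` = 7715, `10` = 138, `12` = 481, `16` = 7)

# Convert raw counts to proportions of the whole data set
round(prop.table(counts), 3)
```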

Shuffle the data and split it into two equal data frames so we can have a training and a testing data set:

```
set.seed(1234)
vehicles <- vehicles[sample(nrow(vehicles)),]
split <- floor(nrow(vehicles)/2)
vehiclesTrain <- vehicles[0:split,]
vehiclesTest <- vehicles[(split+1):nrow(vehicles),]
```

Let’s put **{nnet}** to work and predict cylinders. The `maxit` parameter defaults to 100 when omitted, so let’s start with a large number during the first round to make sure we reach the lowest possible error level (i.e. the global minimum - the solution with the lowest error possible):

```
library(nnet)
cylModel <- multinom(cylinders~., data=vehiclesTrain, maxit=500, trace=T)
```

```
## # weights: 250 (216 variable)
## initial value 39869.260885
## iter 10 value 18697.133750
...
## iter 420 value 5217.401201
## final value 5217.398483
## converged
```

When you see the word **converged** in the log output, you know the model went as far as it could.
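With a converged model in hand, the natural next step is to score the held-out half and check accuracy. Since the vehicles download isn’t reproduced here, this sketch shows the same pattern on `iris`; in the walkthrough you would call `predict(cylModel, vehiclesTest)` instead, and **{caret}**’s `postResample` reports the same accuracy along with Kappa:

```r
library(nnet)
set.seed(1234)

# Shuffle and split in half, mirroring the vehicles train/test split
shuffled <- iris[sample(nrow(iris)), ]
split <- floor(nrow(shuffled) / 2)
irisTrain <- shuffled[1:split, ]
irisTest  <- shuffled[(split + 1):nrow(shuffled), ]

fit <- multinom(Species ~ ., data = irisTrain, maxit = 500, trace = FALSE)

# Score the held-out rows and compute simple accuracy
preds <- predict(fit, irisTest)
mean(preds == irisTest$Species)
```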

Let’s find the most influential variables by using **{caret}**’s `varImp` function:

```
library(caret)
mostImportantVariables <- varImp(cylModel)
mostImportantVariables$Variables <- row.names(mostImportantVariables)
mostImportantVariables <- mostImportantVariables[order(-mostImportantVariables$Overall),]
print(head(mostImportantVariables))
```

```
## Variables Overall
## charge240 625.5732
## cityUF 596.4079
## combinedUF 580.1112
## displ 434.8038
## cityE 395.3533
## combA08 322.2910
```
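One simple way to approximate that kind of ranking without **{caret}** is to sum the absolute coefficients for each predictor across the outcome classes (a rough proxy, not necessarily `varImp`’s exact formula), again sketched on an `iris` fit:

```r
library(nnet)

fit <- multinom(Species ~ ., data = iris, trace = FALSE)

# Coefficient matrix: one row per non-reference class, one column per term
co <- summary(fit)$coefficients

# Sum absolute coefficients per predictor, dropping the intercept column
imp <- colSums(abs(co))[-1]
sort(imp, decreasing = TRUE)
```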

Manuel Amunategui - Follow me on Twitter: @amunategui