# Get Your "all-else-held-equal" Odds-Ratio Story for Non-Linear Models!

Practical walkthroughs on machine learning, data exploration and finding insight.

**Resources**

On one hand we have tree-based classifiers and deep-belief networks and on the other, linear regression models. What the latter lacks in terms of coolness and precision certainly makes up in transparency and actionability. People just love their coefficients and odd ratios. Here is an approach to extract odds out of tree-based classifiers so you too can say 'all else held equal, a one unit change in x, will result in a pseudo-coefficient change of something in y'. The bonus here is that we capture non-linear movements - this can yield a lot of intelligence out of your variables!

The above chart is the **Pseudo Odds** for the amount of years worked at the Texas A&M University (source: The Texas Tribune). The outcome variable is whether an employee's salary is above the overall median salary. Though we will look at this chart in more detail shortly, you can say that your best odds of being a top earner is at 28 years of service or more precisely, you have 20% better odds of earning above the medium salary at 28 years of service. Pretty informative, right?

I didn't coin the term nor the algorithm, that came from my friend and colleague, Tristan Markwell, who agreed to share this with the world. My small contribution was to nail down the plots.

## How Does it Work?

For a high-level explanation, first train a classification model and feed testing data into it to extract probabilities for a particular binary outcome. Then, one feature at a time, shift each of its points one unit up and feed that into the model. This becomes your up-one-unit probabilities for that feature. Do the same but in the opposite direction by shifting the feature data down one unit. Finally, calculate the slope between the up-one-unit and down-one-unit probabilities. Smooth the slopes to simplify patterns and analyze away!

The code is a bit more complex and experimental in nature so bear with me. It is composed of four logical steps and I added charts throughout most steps from a feature called ‘YearsWorked’ off of the Texas A&M salary data set. The outcome variable is whether a particular employee is above the median salary.

## Step 1: Base Model

We first model the data using XGBoost (but any classification model will do) and get a prediction vector using a test data set.

## Step 2: Average Slopes for each Feature Points

We then loop through each feature and call the ‘get_avg_slope’ function. As its name implies, it returns average slopes for each feature by shifting the feature values up and down a unit in the testing data and returns the slopes between the shifted points. In more detail:

- Pass original prediction vector
- Get unique values in feature
- Figure out jump size: take 1/20th of the number of values (if there are 100 unique values, you go 5 to the right, and 5 to left)
- Shift each value in the feature up one jump size and predict this new set while all other features held constant
- Do the same down one jump size and predict this new set while all other features held constant
- Save the two resulting vectors of shifted predictions
- Take the slope between the up point and down point
- If you hit a wall then use half way point
- Return the average of all slopes per value

Here is a raw plot of the pseudo odds ratios for YearsWorked:

## Step 3: Odd Ratio Rectangles

Calculate the odd-ratio rectangles. Each rectangle captures the speed of the move on using the smoothed odd ratio points. The rectangle is drawn by capturing each movement up one standard deviation from the median and back down one standard deviation mark. The height of the rectangle represents the speed at which the feature moves across its units.

Here is a plot of the smoothed pseudo odds ratios for YearsWorked with both standard deviation lines shown:

Here is a smoothed plot of the pseudo odds ratios for YearsWorked with directional boxes from the standard deviation crossovers: