About Manuel Amunategui

Data scientist with over 20 years of experience in the tech industry, MAs in Predictive Analytics and International Administration, co-author of Monetizing Machine Learning, and VP of Data Science at SpringML.

From consulting in machine learning, healthcare modeling, 6 years on Wall Street in the financial industry, and 4 years at Microsoft, I feel like I’ve seen it all. And this has opened my eyes to the huge gap in educational material on applied data science. Like I say:

It just ain’t real 'til it reaches your customer’s plate

I am a startup advisor and available for speaking engagements with companies and schools on topics around building and motivating data science teams, and all things applied machine learning.

Reach me at amunategui@gmail.com

Data Exploration & Machine Learning, Hands-on


R and Azure ML - Your One-Stop Modeling Pipeline in The Cloud!

Practical walkthroughs on machine learning, data exploration and finding insight.

R and Azure ML - Cloud Pipeline

Resources

YouTube Companion Video

At the risk of being accused of only using Amazon Web Services, here is a look at modeling using Microsoft Azure Machine Learning Studio along with the R programming language. It is chock-full of data munging, modeling, and delivery functions!

If you want to follow along, sign up at AzureML for a free account.

Part 1 - Simple Demo with the Adult Census Income Binary Classification Dataset

Sign in to your account at https://studio.azureml.net and follow these steps:

Click EXPERIMENTS in the left-hand menu bar:

experiments

Click the + NEW button at the bottom left of the screen to start a new experiment:

new button

Choose Blank experiment:

Time to drag-and-drop modeling modules onto the workspace

Select Saved Datasets then Samples and drag the Adult Census Income Binary Classification dataset onto your workspace:

save datasets

Your workspace should look like:

experiments created

You can right click on the data set to visualize the data.

Select Data Transformation then Sample and Split and drag Split Data onto your workspace:

split data

And connect both rectangles together:

so far

Select Machine Learning then Train and drag Train Model onto your workspace:

train

And connect both rectangles as such:

so far

You'll notice the red exclamation point in the last rectangle - this is because you need to tell it what feature is your outcome variable. So click on the red exclamation point and in the right menu pane choose 'income' as outcome:

outcome

Select Machine Learning then Initialize Model then Classification and drag Two-Class Logistic Regression onto your workspace:

classification

And connect it to the top left of Train Model:

so far

Select Machine Learning then Score and drag Score Model onto your workspace:

score

Connect Train Model to the top left of Score Model and connect Split Data to the top right of Score Model. Fun connecting all this, right?

so far

Select Machine Learning then Evaluate and drag Evaluate Model onto your workspace:

evaluate

Connect the bottom of Score Model to the top left of Evaluate Model:

so far

Finally, hit the RUN button at the bottom middle of the Azure ML Studio page:

run it

Right click on Evaluate Model and click Evaluation Results - Visualize

scores

more scores
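If you think better in code than in rectangles, here is a rough R sketch of what the drag-and-drop pipeline above is doing. It is only an approximation under loud assumptions: the data is a made-up stand-in (not the Adult Census Income dataset), the column names `age`, `hours` and `income` are hypothetical, the 70/30 split is an arbitrary choice, and base R's `glm` stands in for the Two-Class Logistic Regression module.

```r
# Toy stand-in data (an assumption): NOT the Adult Census Income dataset
set.seed(42)
n <- 1000
census_df <- data.frame(
  age   = round(runif(n, 18, 70)),
  hours = round(runif(n, 10, 60))
)
# Hypothetical binary outcome loosely driven by both features
linear_part <- -6 + 0.08 * census_df$age + 0.06 * census_df$hours
census_df$income <- rbinom(n, 1, plogis(linear_part))

# "Split Data": random 70/30 split into training and test rows
train_idx <- sample(n, size = round(0.7 * n))
train_df  <- census_df[train_idx, ]
test_df   <- census_df[-train_idx, ]

# "Two-Class Logistic Regression" + "Train Model", with income as the outcome
model <- glm(income ~ age + hours, data = train_df, family = binomial())

# "Score Model": predicted probabilities for the held-out rows
test_df$scored_prob <- predict(model, newdata = test_df, type = "response")

# "Evaluate Model": confusion matrix and accuracy at a 0.5 cutoff
test_df$scored_label <- as.integer(test_df$scored_prob >= 0.5)
accuracy <- mean(test_df$scored_label == test_df$income)
print(table(actual = test_df$income, predicted = test_df$scored_label))
print(accuracy)
```

Each commented step maps to one rectangle in the workspace, which is a handy way to remember what gets connected to what.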

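Pretty simple and easy to use, right? Play around with the Threshold slider - it is a great way to see how moving the classification cutoff trades precision against recall, while the AUC score, which summarizes performance across all thresholds, stays put.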
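To see what a threshold does without the slider, here is a small base R sketch. The scores and labels are invented for illustration (they are not output from the experiment above), and `metrics_at` is a hypothetical helper name. It computes precision and recall at three cutoffs, then the AUC with the rank-sum formula, which never looks at a cutoff at all.

```r
# Hypothetical scored probabilities and true labels (made up for illustration)
scores <- c(0.95, 0.85, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)
labels <- c(1,    1,    0,    1,    1,    0,    1,    0,    0,    0)

# Precision and recall when everything scoring >= threshold is called positive
metrics_at <- function(threshold) {
  predicted <- as.integer(scores >= threshold)
  tp <- sum(predicted == 1 & labels == 1)
  fp <- sum(predicted == 1 & labels == 0)
  fn <- sum(predicted == 0 & labels == 1)
  c(threshold = threshold,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

# One row per threshold
threshold_table <- t(sapply(c(0.25, 0.50, 0.75), metrics_at))
print(threshold_table)

# AUC via the rank-sum (Mann-Whitney) formula: no threshold involved
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
```

On this toy data, recall drops from 1.0 to 0.8 to 0.4 as the cutoff rises from 0.25 to 0.50 to 0.75, while the AUC is 0.84 regardless of the cutoff.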


A More Complex Example Using R

OK, all well and good, but what about R? Let's bring R into AzureML while still leveraging the pipeline service. Start a new experiment and drag an Execute R Script module into the central workspace pane:

languages

Click on the Execute R Script rectangle and, in the right pane, replace the current R code with this:


 
titanic_df <- read.csv('http://math.ucdenver.edu/RTutorial/titanic.txt',sep='\t', stringsAsFactors = FALSE)

# creating new title feature
titanic_df$Title <- ifelse(grepl('Mr ',titanic_df$Name),'Mr',ifelse(grepl('Mrs ',titanic_df$Name),'Mrs',ifelse(grepl('Miss',titanic_df$Name),'Miss','Nothing')))
titanic_df$Title <- as.factor(titanic_df$Title)

# impute age to remove NAs
titanic_df$Age[is.na(titanic_df$Age)] <- median(titanic_df$Age, na.rm=TRUE)

# reorder data set so target is last column
titanic_df <- titanic_df[c('PClass', 'Age',    'Sex',   'Title', 'Survived')]

# binarize all character columns (Title was converted to a factor above, so it is left as-is)
charcolumns <- names(titanic_df[sapply(titanic_df, is.character)])
for (colname in charcolumns) {
     print(paste(colname,length(unique(titanic_df[,colname]))))
     for (newcol in unique(titanic_df[,colname])) {
          if (!is.na(newcol))
               titanic_df[,paste0(colname,"_",newcol)] <- ifelse(titanic_df[,colname]==newcol,1,0)
     }
     titanic_df <- titanic_df[,setdiff(names(titanic_df),colname)]
}
 
# funnel data out the mapOutputPort so it can flow into the other AzureML pipeline objects
maml.mapOutputPort('titanic_df')

We're using the R programming language to pull one of my go-to favorite datasets, the Titanic manifest, into AzureML programmatically. We're also cleaning it up with some minor feature engineering (pulling the title out of the name), imputing missing ages with the median, and binarizing text fields (pivoting each distinct value into its own 0/1 column).
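To make those three clean-up steps concrete, here is the same logic on a tiny hand-made data frame you can run locally. The four rows are invented for illustration (they only borrow the column names from the Titanic file), and only `Sex` is binarized to keep the output short.

```r
# Toy stand-in for the Titanic manifest (made-up rows, same column names)
toy_df <- data.frame(
  Name     = c("Allen, Miss Elisabeth", "Allison, Mr Hudson",
               "Brown, Mrs James", "Carter, Master William"),
  PClass   = c("1st", "1st", "2nd", "3rd"),
  Age      = c(29, NA, 44, 11),
  Sex      = c("female", "male", "female", "male"),
  Survived = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Feature engineering: pull the title out of the name
toy_df$Title <- ifelse(grepl("Mr ", toy_df$Name), "Mr",
                ifelse(grepl("Mrs ", toy_df$Name), "Mrs",
                ifelse(grepl("Miss", toy_df$Name), "Miss", "Nothing")))

# Imputing: replace the missing age with the median of the known ages
toy_df$Age[is.na(toy_df$Age)] <- median(toy_df$Age, na.rm = TRUE)

# Binarizing: one 0/1 column per distinct value, then drop the text column
for (level in unique(toy_df$Sex)) {
  toy_df[, paste0("Sex_", level)] <- ifelse(toy_df$Sex == level, 1, 0)
}
toy_df$Sex  <- NULL
toy_df$Name <- NULL

print(toy_df)
```

The missing age becomes 29 (the median of 29, 44 and 11), and `Sex` turns into two columns, `Sex_female` and `Sex_male`.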

source code

And construct the rest of the machine learning pipeline as follows:

final setup

Set the outcome variable to Survived:

outcome variable

And Visualize the Evaluation results:

results

Web Services - Deploying Your Results!

One more thing, and one of the best parts of this tool: let's turn this into a production pipeline where users can call a web service and get predictions!

Click on SET UP WEB SERVICE at the bottom of the screen and choose Predictive Web Service [Recommended]:

web services

Run the experiment again before pushing out the web service, or else you'll get the following message: "The experiment must be run so that your edits can be validated." Then click on Deploy Web Services a second time (phew):

deploy

In the left side-bar menu, select WEB SERVICES:

web services

You should see your experiment listed there. Click on it, copy the API key, and click on the REQUEST/RESPONSE link:

web services

At the bottom of the page, copy the R code and replace the placeholder API key with your own and run the code! That's it!
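For orientation, here is a hedged sketch of the kind of request the generated sample code puts together. Treat every detail as an assumption to check against your own REQUEST/RESPONSE page: the `Inputs`/`input1`/`ColumnNames`/`Values` layout and the Bearer authorization header are written from memory of that page, the URL and API key are placeholders, and the column names are only illustrative. The sketch builds the pieces and does not send anything, since the actual HTTP call is what the copied code handles.

```r
# Placeholders (assumptions): substitute the values from your own service page
api_key     <- "YOUR_API_KEY"
service_url <- "https://example.invalid/your-request-response-url"

# One hypothetical passenger; column names are illustrative and must match
# the ones listed on your REQUEST/RESPONSE page
passenger <- list(Age = "29", Title = "Miss", PClass_1st = "1", Sex_female = "1")

# Assumed request layout: column names once, then one list of values per row
request_body <- list(
  Inputs = list(
    input1 = list(
      ColumnNames = as.list(names(passenger)),
      Values      = list(unname(passenger))
    )
  ),
  GlobalParameters = structure(list(), names = character(0))
)

# Assumed headers: JSON payload plus the API key as a Bearer token
auth_header <- c(
  "Content-Type"  = "application/json",
  "Authorization" = paste("Bearer", api_key)
)

str(request_body)

# Optional: show the JSON form if the jsonlite package happens to be installed
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(request_body, auto_unbox = TRUE, pretty = TRUE), "\n")
}
```

If the layout on your page differs, trust the page: it is generated from your experiment's actual input schema.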

Additional Resources

Execute R Script
Quickstart tutorial for the R programming language for Azure Machine Learning

And a big thanks to Lucas for the wonderful - Azure R in the Clouds - artwork!!!

Manuel Amunategui - Follow me on Twitter: @amunategui