
Data Exploration & Machine Learning, Hands-on

Practical walkthroughs on machine learning, data exploration and finding insight.


For more tech and data science insights, find me on Medium.com
Give it a clap or many!

Executive Time Management — Don’t Suffocate the Creative Process Who’s stuck in meetings all day long and chronically strapped for time? Yes, the executive. If you know one, lack of time to think and…
Jun 13, 2018

Staying Relevant in Tech, Telltale Signs That You’re on Track You know the Millennium Falcon and its hyperdrive? Where stars in the sky stretch out and passengers sink in their seats with constipated…
Jun 8, 2018

Coding With Boxing Gloves & The Fight For Your Customer’s Happiness About 10 years ago, while working on Wall Street (literally, our office leaned against the American Stock Exchange), I bumped into a…
Jun 2, 2018

The Workflow of the Highly-Trained Professional, The Last Fortress Against A.I. Let me reassure those who fear the world of the human professional is about to end. I spent a few years as a data scientist working for a…
May 29, 2018

Find Your Next Programming Language By Measuring “The Knowledge Gap” on StackOverflow.com Ten years ago, I was in the mood for a new programming language. The statement may sound casual and easy, but it was anything but that. I…
May 24, 2018

My #1 Piece of Advice for Aspiring Data Scientists Whenever I am asked how to break into this field, I reply, ‘push it out into the public domain on a regular basis’.
May 21, 2018

Understanding Your Stance On A.I. Ethics After The Fact? That’s OK It’s certainly a lot better than not thinking about it at all. As a software producer for many years now, a trick I have relied on time and…
May 18, 2018

GPUs on Google Cloud - the Fast Way & the Slow Way


Here are the notes on the two methods presented in the video for getting GPUs up and running for TensorFlow on Google Cloud: a quick one using pre-built Jetware instances, and another from scratch using a VM.

Pairing Reinforcement Learning and Machine Learning, an Enhanced Emergency Response Scenario

rescue bot

Extending Reinforcement Learning/Q-Learning with ML and IoT to benefit from the best aspects of each modeling technique.

Chatbot Conversations From Customer Service Transcripts


Chatbots are all the rage these days and we get a lot of requests for them at SpringML. They not only have that “AI” chic, they also offer faster customer service at a much lower cost - a huge win-win. Though chatbot technology is mature and available today (see Dialogflow from Google for an example of how easy it is to implement), building a good one is no trivial task. The last thing you want to do is anger customers who are reaching out for help.

Serverless Hosting On Microsoft Azure - A Simple Flask Example

Serverless Hosting on Azure

Let’s see how we can run the basic "Hello World" Flask application on Microsoft Azure’s serverless Web Apps. We'll keep things as simple as possible to get a feel for the minimum steps needed to get up and running on the web.

Google Video Intelligence, TensorFlow And Inception V3 - Recognizing Not-So-Famous-People

video facial recognition

This is the second part of the previous blog entry Google's Video Intelligence and Vision APIs - Automatically Recognize Actors and Download their Biographies in Real Time. The celebrity pipeline could chew through a celebrity video faster than we can say protagonist! Can we divert some of that speed and insight towards something other than finding celebrities? What if we wanted to identify regular people? Or any other entity not currently offered with Google Video Intelligence? Yes, we can!

Rapid Prototyping on Google App Engine - Trip Planner with Google Maps and Yelp

Portland to Boise one Florist at a time

Easily Extend your Python ML Models into Interactive Web Applications - Let's build a web application to map out trips and plot specific businesses along the route every fifty miles or so using Yelp and Google App Engine.

Yelp v3 and a Romantic Trip Across the USA, One Florist at a Time

art album

Just in time for Valentine’s Day, if you happen to be planning a trip across the United States and want to offer your companion a rose at every degree of latitude traveled, then this walk-through is for you!

We'll use the Yelp v3 API to cross the United States from San Francisco, CA to New York City, NY, and be 60 miles from a florist at all times. This is an updated take on a previous walk-through I did 3 years ago. This one uses the new Yelp v3 API and is written in Python. The second part will extend this into an interactive web app on Google Cloud.

Show it to the World! Build a Free Art Portfolio Website on GitHub.io in 20 Minutes!

art album

GitHub.io is a great option if you want to host a static website for free. With basic tools such as HTML, JavaScript, Jekyll and Bootstrap you can make a professional-looking site in no time.

Let's build a site that displays pictures or artwork using thumbnails with the ability to click on them to see larger versions of the images. This will make a perfect portfolio website for that budding artist in your family (which is where I got the idea in the first place with my artistically-inclined son).

Google Video Intelligence and Vision APIs - Automatically Recognize Actors and Download their Biographies in Real Time

recognizing actors

With around 200 lines of code, you can process a popular movie clip down to the actor's face and biography in a fully automated pipeline! This may seem like a lot of code, but just five years ago it would have required tens of thousands of lines and a headache.

At SpringML, we’ve built many solutions using the versatile Google Cloud suite of APIs, but some of the most fun projects have involved the Google Video Intelligence and Vision APIs in particular. Leveraging these two powerful APIs is like having an army of convolutional neural-network PhDs at your beck and call.

Life Coefficients - Modeling Life Expectancy and Prototyping it on the Web with Flask and PythonAnywhere

life expectancy

I’m going to walk through a simple linear regression model for estimating life expectancy in Python and build it into an interactive web application using Flask and PythonAnywhere. Check out a version that I built at TimeLeftToLive.com. I see this topic as an important awareness tool, but I apologize in advance if I depress anybody. It’s up there with those interactive banking tools reminding you how much money you don’t have to properly retire. This is a surprisingly easy model to build using solid data curated by top statisticians. Two top sources are the World Health Organization (WHO) and the Central Intelligence Agency (CIA).

Convolutional Neural Networks And Unconventional Data - Predicting The Stock Market Using Images

creating thousands of synthetic charts

This post is about taking numerical data, transforming it into images and modeling it with convolutional neural networks. In other words, we are creating synthetic image representations of quantitative data and seeing if a CNN brings any additional modeling understanding, accuracy, or a boost via ensembling. CNNs are hyper-contextually aware - they know what’s going on above, below, to the right and to the left, even what is going on in front and behind if you use image depth.

The Fallacy of the Data Scientist's Venn Diagram

original data science venn diagram

Well, fallacy may be a strong word, how about incomplete? I’m talking about the Venn diagram that depicts the skills needed to be a data scientist. Today, after a few years as a data scientist, this is how I would draw it...

Reinforcement Learning - A Simple Python Example and a Step Closer to AI with Assisted Q-Learning

looking for honey

Machine learning used to be either supervised or unsupervised, but today it can be reinforcement learning as well! Here we’ll start with a very simple Python example of Q-learning to find the shortest path between two points. Then we'll explore a more conceptual example to think about ways to intelligently extend Q-Learning to today's problems.
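To make the "very simple" part concrete, here is a stdlib-only sketch of tabular Q-learning finding the shortest route to a goal on a toy graph - the graph, rewards, and constants below are invented for illustration (modeled on the classic room-navigation example), not taken from the post:

```python
import random

# Toy graph of connected rooms; the goal is to reach room 5.
edges = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
goal, gamma = 5, 0.8
Q = {(s, a): 0.0 for s in edges for a in edges[s]}

random.seed(0)
for _ in range(2000):                        # training episodes
    s = random.choice(list(edges))
    while s != goal:
        a = random.choice(edges[s])          # explore by moving randomly
        reward = 100 if a == goal else 0
        # Q-learning update: reward plus discounted best next value.
        Q[(s, a)] = reward + gamma * max(Q[(a, a2)] for a2 in edges[a])
        s = a

def shortest_path(s):
    """Follow the learned policy greedily from s to the goal."""
    path = [s]
    while s != goal:
        s = max(edges[s], key=lambda a: Q[(s, a)])
        path.append(s)
    return path

print(shortest_path(2))
```

After enough random episodes the Q-table converges, and following the highest-valued action at each node traces out a shortest route.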

Simple Heuristics - Graphviz and Decision Trees to Quickly Find Patterns in your Data

iris data splits

Before breaking out the big algos on a new dataset, it is a good idea to first explore its simple, intuitive patterns (i.e. heuristics). This will pay off in spades. It not only exposes you to your data, it makes you understand it and gives you that critical 'business knowledge'. People you work with will ask you general questions about the data, and this is how you'll be able to answer them. In this post we'll explore how to find the important values that explain a particular target outcome. We'll use sklearn's DecisionTreeClassifier and graphviz for exporting and visualizing the resulting trees.

Office Automation Part 3 - Classifying Enron Emails with Google's Tensorflow Deep Neural Network Classifier

Word Vectors

This is the last video/post in the Enron and word2vec series - thanks for hanging in, and hopefully you'll find this fun. This is where we bring it all together and come up with a production-grade classification solution for routing emails automatically.

Office Automation Part 2 - Using Pre-Trained Word-Embedded Vectors to Categorize the Enron Email Dataset

Word Vectors

Second video of 3, here we use pre-trained word-embedded vectors to find clear logical and thematic clusters in the Enron email dataset.

Office Automation Part 1 - Sorting Departmental Emails with Tensorflow and Word-Embedded Vectors

Word Vectors

First part out of 3: word2vec on TensorFlow and modeling the Enron Email Dataset. We'll clean up the emails, model them with word2vec skip-gram and cluster them to discover themes.

Easy Market Profile in Python: Grasp Price Action Quickly

Market Profile

Let’s build a market profile chart using Python in about 30 lines of code. This is a bare version of J.P. Steidlmayer’s charting system, but it should give you a good idea of market distribution within a particular time frame and where the market spent most of its time.
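The gist of a market profile fits in even fewer lines. A toy sketch of the idea with made-up prices: assign one letter per 30-minute bracket and stack letters at each price bin - the longer the row, the more time the market spent there:

```python
from collections import defaultdict
from string import ascii_uppercase

# Made-up session data: one (high, low) price range per 30-minute bracket.
brackets = [(101, 99), (102, 100), (103, 101), (102, 100), (101, 99)]

profile = defaultdict(str)
for letter, (hi, lo) in zip(ascii_uppercase, brackets):
    for price in range(lo, hi + 1):          # one row per whole-point price bin
        profile[price] += letter

for price in sorted(profile, reverse=True):  # highest price on top
    print(f"{price}: {profile[price]}")
```

The widest row marks the point of control, where the market traded most.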

What-if Roadmap - Assessing Live Opportunities and their Paths to Success or Failure


In the pursuit of actionable insights, we can use historical closed opportunities that are similar to open ones and analyze what made one win and another lose. This doesn’t necessarily have to be sales data; it can be anything that hasn’t yet reached an end point - anything with an unknown outcome and a desire to coax it in a particular direction.

Where Are Your Customers Coming From And Where Are They Going - Reporting On Complex Customer Behavior In Plain English With C5.0

C50 Globe

C5.0 outputs a set of complex, non-linear rules describing what features and at what level provide the most lift to a model. With minimal prep-work, see how we can easily automate the reporting of results using a structure similar to the spoken language, something that anybody can understand - no statistics degree required!

Databricks, SparkR and Distributed Naive Bayes Modeling


One of the recent additions to SparkR is the Naive Bayes classification model. It's simple, fast and accurate - perfect for working with large data sets in distributed environments - yep, perfect for Spark! Here is a look at the model in action with some of its limitations and workarounds.

R and Azure ML - Your One-Stop Modeling Pipeline in The Cloud!

r and azure

At the risk of being accused of only using Amazon Web Services, here is a look at modeling using Microsoft Azure Machine Learning Studio along with the R programming language. It is chock-full of data munging, modeling, and delivery functions!

Get Your "all-else-held-equal" Odds-Ratio Story for Non-Linear Models!

pseudo coefficients

On one hand we have tree-based classifiers and deep-belief networks, and on the other, linear regression models. What the latter lacks in terms of coolness and precision it certainly makes up for in transparency and actionability. People just love their coefficients and odds ratios. Here is an approach to extract odds out of tree-based classifiers so you too can say 'all else held equal, a one-unit change in x will result in a pseudo-coefficient change of something in y'. The bonus here is that we capture non-linear movements - this can yield a lot of intelligence from your variables!

Predict Stock-Market Behavior using Markov Chains and R


We apply Markov chains to map and understand stock-market behavior using the R programming language. By using two transition matrices instead of one, we are able to weigh the probability of a binary outcome.
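The post does this in R; the core step - estimating a transition matrix from a sequence of discretized states - looks like this in a Python sketch (the up/down/flat sequence below is invented for illustration):

```python
from collections import Counter, defaultdict

# Invented sequence of discretized daily moves: U = up, D = down, F = flat.
sequence = "UUDFUDUUFDDUUUDFUU"

counts = defaultdict(Counter)
for a, b in zip(sequence, sequence[1:]):   # count each observed transition a -> b
    counts[a][b] += 1

# Normalize each row so a state's outgoing probabilities sum to 1.
transition = {s: {t: n / sum(c.values()) for t, n in c.items()}
              for s, c in counts.items()}

print(transition["U"])
```

The post's twist is to maintain two such matrices, one per outcome class, and compare how likely a new sequence is under each.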

Big Data Surveillance: Use EC2, PostgreSQL and Python to Download all Hacker News Data!

Big Data Surveillance

We'll first look at the Algolia API and Max Woolf's scripts to download all Hacker News data using EC2 and PostgreSQL, then we'll look at the Firebase/Hacker News API web service to pull specific content by ID.

The Peter Norvig Magic Spell Checker in R

Peter Norvig Magic Spell Checker

Peter Norvig, Director of Research at Google, offers a clever way for any of us to create a good spell checker with nothing more than a few lines of code and some text data.
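The post ports the idea to R; for reference, the heart of Norvig's original Python version looks roughly like this (the one-line corpus below is a stand-in - a real one would be a large text file):

```python
import re
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog the fox"  # stand-in corpus
WORDS = Counter(re.findall(r"\w+", corpus.lower()))

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction(word):
    """Pick the known candidate with the highest corpus frequency."""
    candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
    return max(candidates, key=WORDS.get)

print(correction("teh"))
```

Word frequency does the "magic": among all one-edit candidates that exist in the corpus, the most common one wins.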

Actionable Insights: Getting Variable Importance at the Prediction Level in R

Actionable Insights

Here is an easy way to get the top and bottom features contributing to a prediction. This affords the report reader a level of transparency into why the model chose a particular probability for an observation.

Survival Ensembles: Survival Plus Classification for Improved Time-Based Predictions in R

Survival Ensembles

In this post we'll look at extracting AUC scores from survival models, blending and ensembling random forest survival with gradient boosting classification models, and measuring improvements on time-based predictions.

Anomaly Detection: Increasing Classification Accuracy with H2O's Autoencoder and R

Anomaly Detection

Use H2O's anomaly detection with R to separate data into easy and hard to model subsets and gain predictive insight.

H2O & RStudio Server on Amazon Web Services (AWS), the Easy Way!

H2O on AWS

See how easy it is to install H2O and RStudio Server on Amazon Web Services (AWS) from scratch. No need for customized AMIs or third-party tools - no training wheels here! And the best part is that we can do everything from the Amazon Web Services wizard - no need to tunnel or PuTTY in anywhere!

Analyze Classic Works of Literature from Around the World with Project Gutenberg and R

Project Gutenberg

Project Gutenberg offers an easy way to download over 50,000 classic works of literature from around the world in digital format using the R language.

Speak Like a Doctor - Use Natural Language Processing to Predict Medical Words in R


Using R, natural language processing (NLP), a medical corpus, and a Shiny application, we build an interactive tool to predict what a doctor will say next.
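The post builds this with R and Shiny; the kernel of any next-word predictor is just a frequency table of word transitions, sketched here in Python on a tiny stand-in corpus (the real post trains on an actual medical corpus):

```python
from collections import Counter, defaultdict

# Stand-in corpus; the post trains on a real medical corpus instead.
corpus = ("patient presents with chest pain . patient denies chest pain . "
          "patient presents with shortness of breath .")

bigrams = defaultdict(Counter)
words = corpus.split()
for a, b in zip(words, words[1:]):   # count every observed word transition
    bigrams[a][b] += 1

def predict(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict("chest"))
```

Real systems back off to trigrams and beyond, but the mechanics are the same: count transitions, then serve the most frequent continuation.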

Supercharge R with Spark: Getting Apache's SparkR Up and Running on Amazon Web Services (AWS)


See how easy it is to get Apache's SparkR up and running. In this first installment, we'll set up multiple clusters on Amazon EC2 and control them via RStudio.

R and Excel: Making Your Data Dumps Pretty with XLConnect

R and Excel

When it comes to exporting data, one has many formats to choose from. But if you're looking for something more sophisticated than a comma-delimited file and aren't ready for an off-the-shelf content-management system, then Excel may be what you need to present content in a more digestible format.

Going from an Idea to a Pitch: Hosting your Python Application using Flask and Amazon Web Services (AWS)

Flask and AWS

This walk-through demonstrates how easy it is to transform an idea into a web application. It is for those who want to quickly pitch their application to the world without getting bogged down by technical details - the weekend warrior. If the application is a success, people with real skills will be brought in to do the job right; in the meantime we want it fast, cheap and easy. We'll use Python, Flask, and Amazon Web Services EC2 to migrate a program into a web application.

Getting PubMed Medical Text with R and Package {RISmed}

PubMed Medical Text

PubMed is a great source of medical literature. If you are working on a Natural Language Processing (NLP) project and need hundreds or thousands of topic-based medical texts, the RISmed package can simplify and automate that process.

Find Variable Importance for any Model - Prediction Shuffling with R

Variable Importance

You model and predict once to get a benchmark score, then predict hundreds of times for each variable while randomizing it each time. If randomizing a variable hurts the model's benchmark score, then it's an important variable; if nothing changes, then it's a useless variable.
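The shuffling trick above, sketched in stdlib Python - the data is made up and the "fitted model" is a deliberately trivial stand-in (the post does this in R against real fitted models):

```python
import random
import statistics

random.seed(42)

# Made-up data: y depends on x1; x2 is pure noise.
n = 200
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
y = [3 * a + random.gauss(0, 0.1) for a in x1]

def model(a, b):            # stand-in "fitted model": it only uses x1
    return 3 * a

def score(cols):
    """Mean squared error of the model over the data set."""
    return statistics.fmean((model(a, b) - t) ** 2
                            for a, b, t in zip(cols[0], cols[1], y))

benchmark = score([x1, x2])
for i, name in enumerate(["x1", "x2"]):
    cols = [x1, x2]
    cols[i] = random.sample(cols[i], n)     # randomize just this variable
    print(name, "score change:", round(score(cols) - benchmark, 4))
```

Shuffling x1 wrecks the score while shuffling x2 changes nothing - exactly the signal you want from a model-agnostic importance measure.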

Bagging / Bootstrap Aggregation with R

Bagging & Bootstrap

Bagging is the not-so-secret edge of the competitive modeler. By sampling and modeling a training data set hundreds of times and averaging its predictions, you may just get that accuracy boost that puts you above the fray.
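The post does this in R; the mechanics of bagging (resample, fit, average) can be sketched in a few lines of Python with made-up numbers and a deliberately trivial base learner:

```python
import random
import statistics

random.seed(1)
train = [4.1, 3.9, 4.3, 4.0, 10.0, 4.2, 3.8]   # made-up target values, one outlier

def bagged_estimate(data, rounds=500):
    """Resample with replacement, fit a trivial base learner, average the results."""
    estimates = []
    for _ in range(rounds):
        sample = [random.choice(data) for _ in data]   # bootstrap sample
        estimates.append(statistics.median(sample))    # trivial "base learner"
    return statistics.fmean(estimates)                 # aggregate by averaging

print(round(bagged_estimate(train), 2))
```

In practice the base learner is a decision tree or similar model rather than a median, but the resample-and-average loop is identical.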

Feature Hashing (a.k.a. The Hashing Trick) With R

Feature Hashing

Feature hashing is a clever way of modeling data sets containing large amounts of factor and character data. It uses less memory and requires little pre-processing. In this walkthrough, we model a large healthcare data set by first using dummy variables and then feature hashing.
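The trick itself is tiny. A Python sketch with an invented healthcare-style row (real uses take bucket counts in the thousands or more, not 16):

```python
import hashlib

def hash_features(pairs, n_buckets=16):
    """Map feature=value pairs into a fixed-length vector - no lookup dictionary."""
    vec = [0] * n_buckets
    for feature, value in pairs:
        # md5 gives a stable hash across runs (unlike Python's built-in hash()).
        h = int(hashlib.md5(f"{feature}={value}".encode()).hexdigest(), 16)
        vec[h % n_buckets] += 1        # collisions are tolerated by design
    return vec

# Invented row of categorical data.
row = [("state", "OR"), ("diagnosis", "J45.909"), ("payer", "medicare")]
print(hash_features(row))
```

Because the vector width is fixed up front, unseen categories at scoring time need no special handling - they simply hash into some bucket.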

Yelp, httr and a Romantic Trip Across the United States, One Florist at a Time

Yelp & R

The title says it all, we are going to use Yelp to cross the United States from San Francisco, CA to New York City, NY, and be 60 miles from a florist at all times.

Quantifying the Spread: Measuring Strength and Direction of Predictors with the Summary Function

Measuring Predictors

Use the summary() function to quickly and intuitively measure predictors. By splitting the data into two sets, one for each outcome, and summarizing them individually, we can plot and measure behaviors towards the outcome variable. Simple, easy, and fast!

Downloading Data from Google Trends And Analyzing It With R

Google Trends

In this walkthrough, I introduce Google Trends by querying it directly through the web, downloading a comma-delimited file of the results, and analyzing it in R.

Using String Distance {stringdist} To Handle Large Text Factors, Cluster Them Into Supersets


{stringdist} can help make sense of large, text-based factor variables by clustering them into supersets. This approach preserves some of the content's substance without having to resort to full-on, natural language processing.
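{stringdist} is an R package; the same idea sketched in Python using difflib's similarity ratio, with invented job titles and an arbitrary threshold:

```python
from difflib import SequenceMatcher

# Invented free-text job titles with typos and variations.
titles = ["sr software engineer", "senior software engineer", "software engnieer",
          "registered nurse", "registered nurse rn", "data analyst"]

def cluster(values, threshold=0.75):
    """Greedy clustering: attach each value to the first superset it resembles."""
    supersets = []
    for v in values:
        for group in supersets:
            if SequenceMatcher(None, v, group[0]).ratio() >= threshold:
                group.append(v)
                break
        else:
            supersets.append([v])
    return supersets

print(cluster(titles))
```

Each superset can then replace its members as a single factor level, shrinking hundreds of noisy strings down to a handful of modelable categories.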

SMOTE - Supersampling Rare Events in R


A brief introduction to the SMOTE package and over-sampling imbalanced data sets. SMOTE uses bootstrapping and k-nearest neighbors to synthetically create additional observations.
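The core move - interpolating between a minority observation and one of its k nearest neighbors - is easy to sketch in Python (the 2-D minority points below are made up):

```python
import math
import random

random.seed(7)
# Made-up minority-class observations in two dimensions.
minority = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.1), (1.3, 1.0)]

def smote_point(points, k=2):
    """Create one synthetic observation between a point and a near neighbor."""
    base = random.choice(points)
    neighbors = sorted((p for p in points if p != base),
                       key=lambda p: math.dist(base, p))[:k]
    nb = random.choice(neighbors)
    gap = random.random()               # interpolation factor in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, nb))

synthetic = [smote_point(minority) for _ in range(5)]
print(synthetic)
```

Each synthetic point lands on the line segment between two real minority observations, padding out the rare class without duplicating rows verbatim.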

Let's Get Rich! See how {quantmod} And R Can Enrich Your Knowledge Of The Financial Markets!


See how easy it is to display great-looking current stock charts in two lines of code, then download stock market data and use it all to build a complex market model.

How To Work With Files Too Large For A Computer’s RAM? Using R To Process Large Data In Chunks

Dealing with large data sets

Using the function read.table(), we break a file into chunks in order to process them. This allows us to process files of any size, even those far beyond what the machine's RAM can handle.
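The post uses R's read.table() with skip and nrows; the same chunking pattern in a Python sketch (the in-memory StringIO stands in for a real file on disk):

```python
import csv
import io

# An in-memory stand-in for a file too large for RAM; in practice,
# pass open("big_file.csv") instead.
big_file = io.StringIO("id,value\n" +
                       "\n".join(f"{i},{i % 10}" for i in range(1, 10001)))

def process_in_chunks(handle, chunk_size=1000):
    """Read and aggregate a delimited file one chunk at a time."""
    reader = csv.DictReader(handle)
    total, chunk = 0, []
    for row in reader:
        chunk.append(int(row["value"]))
        if len(chunk) == chunk_size:    # process and discard each full chunk
            total += sum(chunk)
            chunk = []
    return total + sum(chunk)           # don't forget the final partial chunk

print(process_in_chunks(big_file))
```

Only one chunk lives in memory at a time, so the approach scales to files of any size.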

Predicting Multiple Discrete Values with Multinomials, Neural Networks and the {nnet} Package

Predicting Multiple Discrete Values

Using R and the multinom function from the nnet package, we can easily predict discrete values (factors) of more than two levels. We also use repeated cross-validation to get an accurate model score and to understand the importance of allowing the model to converge (reaching a global minimum).

Modeling 101 - Predicting Binary Outcomes with R, gbm, glmnet, and {caret}

Caret Variable Importance

This walkthrough shows how to easily model binary outcomes using caret models, how to evaluate the predictions, and how to display variable importance.

Reducing High Dimensional Data with Principal Component Analysis (PCA) and prcomp

Principal Component Analysis

In this R walkthrough, we'll see how PCA can reduce a 1000+ variable data set to 10 variables and barely lose accuracy! This is incredible, and every time I play around with it, I'm still amazed!
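The post uses R's prcomp; the underlying idea - find the direction of maximum variance and project onto it - can be sketched in Python with power iteration on made-up 2-D data:

```python
import math
import random

random.seed(3)
# Made-up 2-D data stretched along the x = y diagonal.
data2d = [(t + random.gauss(0, 0.1), t + random.gauss(0, 0.1))
          for t in (i / 10 for i in range(50))]

n = len(data2d)
mx = sum(x for x, _ in data2d) / n
my = sum(y for _, y in data2d) / n
centered = [(x - mx, y - my) for x, y in data2d]

# The 2x2 covariance matrix of the centered data.
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n

# Power iteration: repeatedly applying the covariance matrix pulls any
# starting vector toward the first principal component.
vx, vy = 1.0, 0.0
for _ in range(100):
    vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

print((round(vx, 3), round(vy, 3)))   # roughly the diagonal direction
```

With 1000+ variables, prcomp does the same thing for many components at once: keep the top few directions, discard the rest, and most of the variance survives.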

The Sparse Matrix and {glmnet}

sparse glmnet

Walkthrough of sparse matrices in R and basic use of the glmnet package. This will show how to create them, find the best probabilities through the glmnet model, and how a sparse matrix deals with categorical values.

Brief Walkthrough Of The dummyVars Function From {caret}

Caret dummyVars Function

The dummyVars function streamlines the creation of dummy variables by quickly hunting down character and factor variables and transforming them into binaries, with or without full rank.
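dummyVars belongs to R's caret; the transformation itself is simple enough to sketch in Python (the column and level names below are invented):

```python
def one_hot(rows, column):
    """Expand one categorical column into binary indicator columns (full rank)."""
    levels = sorted({row[column] for row in rows})
    for row in rows:
        value = row.pop(column)                       # drop the original column
        for level in levels:
            row[f"{column}.{level}"] = int(value == level)
    return rows

# Invented records with one categorical column.
records = [{"color": "red", "price": 10},
           {"color": "blue", "price": 12},
           {"color": "red", "price": 9}]
print(one_hot(records, "color"))
```

Dropping one of the indicator columns afterwards gives the less-than-full-rank encoding that linear models prefer.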

Ensemble Feature Selection On Steroids: {fscaret} Package


Give fscaret an ensemble of models and some data, and it will have the ensemble vote on the importance of each feature to find the strongest ones. In this walkthrough, we use R and the classic Titanic data set to predict survivorship.

Mapping The United States Census With {ggmap}

ggmap example

ggmap enables you to easily map data anywhere around the world as long as you give it geographical coordinates. Here we overlay census data on a Google map of the United States.

Using Correlations To Understand Your Data


A great way to explore new data in R is to use a pairwise correlation matrix. This will pair every combination of your variables and measure the correlation between them.
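The post builds the matrix in R; the same pairwise sweep sketched in Python with a hand-rolled Pearson correlation and made-up variables:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up variables keyed by name.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 61, 69, 77, 88],
    "errors": [9, 2, 7, 4, 6],
}

names = list(data)
for i, a in enumerate(names):          # every pairwise combination, once
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {pearson(data[a], data[b]):+.2f}")
```

Scanning the resulting matrix for values near +1 or -1 is a fast way to spot redundant variables and promising predictors before any modeling.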

Brief Guide On Running RStudio Server On Amazon Web Services

RStudio Server on AWS

Steps you through installing pre-configured AMIs with RStudio Server on AWS EC2, interacting with the web interface, and uploading and downloading files to/from your instance.