Data Exploration & Machine Learning, Hands-on




H2O & RStudio Server on Amazon Web Services (AWS), the Easy Way!



In this article, I will show you the easy way to install H2O and RStudio Server on Amazon Web Services (AWS) from scratch. No customized AMIs or third-party tools are needed - no training wheels here. And the best part is that we can do everything from the AWS launch wizard; we won’t need a terminal or PuTTY anywhere! The key is passing all the additional install commands for R, RStudio Server, and Curl in the configuration window in step 3 under ‘Advanced Details’. We’ll even have it create our RStudio user account.

For those who don’t yet know, H2O is open-source software for machine learning and big-data analysis. It offers various models such as GLM, GBM and Random Forest and, more importantly, a deep learning neural network and large-scale clustering!

For a great introduction to many of its features, check out DeepLearning_Vignette.pdf.

Setting up an AWS Instance

Some important points first: AWS isn’t free! If you follow along with the instance from this walk-through, it won’t cost you more than a few cents an hour. Just don’t forget to stop or terminate your instance once you’re done!

Create a VPC

Create an EC2 Instance

Step 1: Choose an Amazon Machine Image (AMI)

Step 2: General Purpose Machine

Step 3: Configure Instance Details

Customize your Build - Advanced Details

Check the latest RStudio Server URL

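The version number used in the commands below (0.99.489) will go out of date, so check the RStudio site for the latest RStudio Server release and substitute its download link and file name.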

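Here are the commands to enter in the Advanced Details text box (adapted in part from an AWS blog post). Swap in your own user name and password on the last two lines:

#!/bin/bash
# EC2 only runs this text as a shell script if it starts with a #! line
# install R
yum install -y R
# download and install RStudio Server
# (URL assumed from the RStudio download site; paste the current link instead)
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-0.99.489-x86_64.rpm
# curl headers, needed by R packages such as RCurl
yum install -y curl-devel
# add the user that will log in to RStudio Server
useradd manuel
echo manuel:testing | chpasswd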
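
The RStudio version, user name and password are the only parts of these commands that change between launches, so a small generator can keep them in one place. The sketch below is one possible way to do that and is not part of the original walkthrough: `make_user_data` and `DOWNLOAD_BASE` are hypothetical names, and the download host is an assumption to verify against the RStudio site. The generated text starts with a `#!/bin/bash` line and includes a `wget` step, which the instance needs in order to run the text as a script and to have the RPM on disk before installing it.

```shell
# Sketch: print a script for the Advanced Details box.
# DOWNLOAD_BASE is an assumption; confirm the current link on the RStudio site.
DOWNLOAD_BASE="https://download2.rstudio.org"

make_user_data() {
  local version="$1"
  local user="$2"
  local password="$3"
  local rpm="rstudio-server-rhel-${version}-x86_64.rpm"
  # The unquoted heredoc expands the variables above into the printed script.
  cat <<EOF
#!/bin/bash
# install R
yum install -y R
# download and install RStudio Server
wget ${DOWNLOAD_BASE}/${rpm}
yum install -y --nogpgcheck ${rpm}
yum install -y curl-devel
# add the RStudio login user
useradd ${user}
echo ${user}:${password} | chpasswd
EOF
}

# Example: make_user_data 0.99.489 manuel testing
```

Running the example line prints text that can be pasted into the Advanced Details box as-is. Passwords containing shell special characters would need quoting, which this sketch does not attempt.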

Step 6: Configure Security Groups

Here, add a Custom TCP rule for port 8787, the port RStudio Server listens on. If you have a static IP, enter it in Source for added security.
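
The Source field expects CIDR notation, so a single static address is written with a /32 suffix. As a small illustration, the sketch below checks that a string looks like an IPv4 address and appends the suffix; `to_source_cidr` is a hypothetical helper name, not an AWS tool.

```shell
# Sketch: turn one IPv4 address into the CIDR form used in the Source field.
to_source_cidr() {
  local ip="$1"
  local old_ifs octet
  # Reject empty input, stray characters, and leading, trailing or doubled dots.
  case "$ip" in
    ""|*[!0-9.]*|*..*|.*|*.)
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac
  # Split on dots and require exactly four octets, each at most 255.
  old_ifs="$IFS"
  IFS=.
  set -- $ip
  IFS="$old_ifs"
  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  for octet in "$@"; do
    if ! [ "$octet" -le 255 ] 2>/dev/null; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  echo "${ip}/32"
}

# Example with a documentation-range address: to_source_cidr 203.0.113.5
```

The value it prints is what goes in Source next to the port 8787 rule.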

Key Pair

Create a new key pair or choose an existing one. Check the acknowledgement and click the Launch Instances button.


Hit the launch button and, once the instance state is green and the status checks have passed, hit the Connect button to see the instance's public DNS name.

RStudio Server

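That’s it for the AWS side! Open http://<your instance's public DNS>:8787 in a browser and log in to RStudio Server with the user name and password created in the Advanced Details commands. We’re over half-way there! From the RStudio console, install the H2O package, initialize it and run some demos (check out the output of both demos to get familiar with some of the modeling commands).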
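
The address to log in at is just the instance's public DNS name plus port 8787, and the install commands can take a few minutes to finish after the status checks pass. Nothing in this walkthrough requires a terminal, but if a shell with curl happens to be available on your own machine, an optional sketch like the following builds that address and waits for the login page to answer. `rstudio_url` and `wait_for_rstudio` are hypothetical helper names, and the retry count and delay are arbitrary choices.

```shell
# Sketch: build the RStudio Server address and wait for it to answer.
RSTUDIO_PORT=8787   # the port opened in the security group step

rstudio_url() {
  local host="$1"
  echo "http://${host}:${RSTUDIO_PORT}"
}

# Poll with curl until the server responds or the attempts run out.
wait_for_rstudio() {
  local url
  local attempts=30
  url="$(rstudio_url "$1")"
  while [ "$attempts" -gt 0 ]; do
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "RStudio Server is answering at $url"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 10
  done
  echo "no answer from $url" >&2
  return 1
}

# Example (placeholder host name):
# wait_for_rstudio ec2-203-0-113-5.compute-1.amazonaws.com
```

If the wait times out, the usual suspects are the port 8787 rule in the security group or a typo in the Advanced Details commands.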

Install H2O

In the RStudio console, install the package from CRAN with install.packages("h2o").

Run Built-in Demos

Load and initialize H2O, then run a few built-in demos (demo(package = "h2o") lists the ones shipped with your installed version):

library(h2o)
localH2O = h2o.init()



A special thanks to Lucas A. for the H2O & Amazonian theme!

Manuel Amunategui - Follow me on Twitter: @amunategui