H2O & RStudio Server on Amazon Web Services (AWS), the Easy Way!

Practical walkthroughs on machine learning, data exploration and finding insight.


In this article, I will show you the easy way to install H2O and RStudio Server on Amazon Web Services (AWS) from scratch. No need for customized AMIs or third-party tools; no training wheels here. And the best part is that we can do everything from the AWS launch wizard: we won’t need a terminal or PuTTY anywhere! The key is passing all the additional install commands for R, RStudio Server, and curl in the configuration window in Step 3 under ‘Advanced Details’. We’ll even have it create our RStudio user account.

For those who don’t yet know, H2O is open-source software for machine learning and big-data analysis. It offers various models such as GLM, GBM and Random Forest and, more importantly, deep learning neural networks and large-scale clustering!

For a great introduction to numerous features check out: DeepLearning_Vignette.pdf

Setting up an AWS Instance

One important point first: AWS isn’t free! If you follow along with the instance from this walkthrough, it won’t cost you more than a few cents an hour. Just don’t forget to stop or terminate your instance once you’re done!

Create a VPC




Create an EC2 Instance

Step 1: Choose an Amazon Machine Image (AMI)





Step 2: General Purpose Machine





Step 3: Configure Instance Details





Customize your Build - Advanced Details


Check the latest RStudio Server URL

Get the latest and greatest RStudio version - check the RStudio site.




Here are the commands to enter in the Advanced Details text box (partly from the AWS blog):

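#!/bin/bash
# install R
yum install -y R
# download and install RStudio Server (update the version to the latest release)
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-0.99.489-x86_64.rpm
# install the curl development headers, needed to build curl-based R packages
yum install -y curl-devel
# add the RStudio login user (change the user name and password to your own)
useradd manuel
echo manuel:testing | chpasswd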
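
Since the RStudio Server version changes often, it can help to keep the version number in one place instead of editing two lines by hand. Here is a minimal sketch of that idea. The function names `rstudio_rpm_name` and `rstudio_rpm_url` are my own, and it assumes other releases follow the same URL pattern as the one in the script above, which may not hold for newer versions, so always check the RStudio site first. The sketch only prints the commands so you can inspect them before pasting anything into Advanced Details.

```shell
#!/bin/bash
# Sketch: keep the RStudio Server version in a single variable.
# Assumption: other releases use the same URL pattern as 0.99.489.
RSTUDIO_VERSION="0.99.489"

# Build the RPM file name for a given version (hypothetical helper).
rstudio_rpm_name() {
  printf 'rstudio-server-rhel-%s-x86_64.rpm\n' "$1"
}

# Build the full download URL for a given version (hypothetical helper).
rstudio_rpm_url() {
  printf 'https://download2.rstudio.org/%s\n' "$(rstudio_rpm_name "$1")"
}

# Print the two install commands instead of running them,
# so the result can be checked before it goes into the wizard.
echo "wget $(rstudio_rpm_url "$RSTUDIO_VERSION")"
echo "yum install -y --nogpgcheck $(rstudio_rpm_name "$RSTUDIO_VERSION")"
```

To move to a newer release, only `RSTUDIO_VERSION` would need to change, provided the download link on the RStudio site still matches the printed URL.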



Step 6: Configure Security Groups


Here, add a Custom TCP rule for port 8787, the port RStudio Server listens on by default. If you have a static IP, enter it in Source for added security.
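
The Source field takes an address range in CIDR notation, so a single static IP is written with a /32 suffix. Here is a small sketch of that conversion. The helper name `to_source_cidr` is my own, and the address is a placeholder from the documentation range, not a real one.

```shell
#!/bin/bash
# Turn a single IPv4 address into the CIDR form used in the Source field.
# A value that already carries a /prefix is passed through unchanged.
to_source_cidr() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   printf '%s/32\n' "$1" ;;
  esac
}

MY_IP="203.0.113.10"   # placeholder; replace with your own static IP
echo "Custom TCP rule, port 8787, source $(to_source_cidr "$MY_IP")"
```

A /32 range matches exactly one address, so only your machine would be able to reach the RStudio login page.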




Key Pair


Create a new key pair, or choose an existing one. Check the acknowledgement and click the Launch Instances button:




Connect


Hit the Launch button and, once the light is green and the status checks have passed, hit the Connect button:
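
RStudio Server is reached in the browser on the port opened in the security group, so the address is the instance's public DNS name followed by :8787. Here is a minimal sketch that builds that address. The helper name `rstudio_url` and the host name are hypothetical; use the public DNS shown for your own instance. The commented-out curl line is an optional check you could run from any machine that has curl.

```shell
#!/bin/bash
# Port opened in the security group step.
RSTUDIO_PORT=8787

# Build the RStudio Server address for a given host (hypothetical helper).
rstudio_url() {
  printf 'http://%s:%s\n' "$1" "$RSTUDIO_PORT"
}

# Hypothetical host name; copy the real one from the EC2 console.
PUBLIC_DNS="ec2-203-0-113-10.compute-1.amazonaws.com"

echo "Open $(rstudio_url "$PUBLIC_DNS") and log in as the user created at launch"

# Optional reachability check (prints the HTTP status code):
# curl -sS -o /dev/null -w '%{http_code}\n' "$(rstudio_url "$PUBLIC_DNS")"
```

The login is the user name and password set by the `useradd` and `chpasswd` lines in the Advanced Details script.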




RStudio Server


That’s it! We’re over halfway there! Install the H2O package, initialize it, and run some demos (check out the output of both demos to get familiar with some of the modeling commands).


Install H2O

install.packages("h2o")



Run Built-in Demos


Load and initialize H2O, then run a few built-in demos:

# load the package and start a local H2O instance
library(h2o)
localH2O <- h2o.init()

# run the built-in GLM and GBM demos
demo(h2o.glm)
demo(h2o.gbm)



Enjoy!!




A special thanks to Lucas A. for the H2O & Amazonian theme!