H2O & RStudio Server on Amazon Web Services (AWS), the Easy Way!
Steps
- Setting up an AWS Instance
- Create a VPC
- Create an EC2 Instance
- Step 1: Choose an Amazon Machine Image (AMI)
- Step 2: General Purpose Machine
- Step 3: Configure Instance Details
- Customize your Build - Advanced Details
- Step 6: Configure Security Groups
- Key Pair
- Connect
- RStudio
In this article, I will show you the easy way to install H2O and RStudio Server on Amazon Web Services (AWS) from scratch. No need for customized AMIs or third-party tools - no training wheels here. And the best part is that we can do everything from the AWS launch wizard; we won’t need a terminal or PuTTY anywhere! The key is passing all the additional install commands for R, RStudio Server, and curl in the configuration window in Step 3 under ‘Advanced Details’. We’ll even have it create our RStudio user account.
For those who don’t yet know, H2O is open-source software for machine learning and big-data analysis. It offers various models such as GLM, GBM and Random Forest but, more importantly, it also offers deep learning neural networks and large-scale clustering!
For a great introduction to numerous features check out: DeepLearning_Vignette.pdf
Setting up an AWS Instance
One important point first: AWS isn’t free! If you follow along with the instance from this walkthrough, it won’t cost you more than a few cents an hour. Just don’t forget to stop or terminate your instance once you’re done!
Create a VPC
Create an EC2 Instance
Step 1: Choose an Amazon Machine Image (AMI)
Step 2: General Purpose Machine
Step 3: Configure Instance Details
Customize your Build - Advanced Details
Check the latest RStudio Server URL
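Get the latest and greatest RStudio Server version - check the RStudio site for the current download URL and use it in place of the one in the wget and yum lines below.
Here are the commands to enter in the Advanced Details text box (partly from the AWS blog):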
#!/bin/bash
# install R
yum install -y R
# install RStudio-Server
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-0.99.489-x86_64.rpm
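# install curl development headers (needed to build R packages that use libcurl, such as RCurl)
yum install -y curl-devel
# add the user you will log in to RStudio Server with
useradd manuel
# set that user's password (pick your own user name and a stronger password)
echo manuel:testing | chpasswd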
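Because the RStudio Server version appears in two lines of the script, it can be handy to generate the script from a version number instead of editing it by hand. The following is only a sketch: `make_user_data` and its three arguments are names made up for this example, and it assumes the download URL keeps the same pattern as the one above, which may not hold for newer releases. It simply prints the script so you can paste it into the Advanced Details box.

```shell
#!/bin/bash
# Sketch: print a user-data script for a given RStudio Server version and login.
# make_user_data and its arguments are illustrative names, not part of any tool.
make_user_data() {
  local version="$1" user="$2" password="$3"
  # Assumes the RPM naming pattern used in this article's script.
  local rpm="rstudio-server-rhel-${version}-x86_64.rpm"
  cat <<EOF
#!/bin/bash
# install R
yum install -y R
# install RStudio-Server
wget https://download2.rstudio.org/${rpm}
yum install -y --nogpgcheck ${rpm}
yum install -y curl-devel
# add user
useradd ${user}
echo ${user}:${password} | chpasswd
EOF
}

# Example: reproduce the script from this article.
make_user_data "0.99.489" "manuel" "testing"
```

Swap in the version number you found on the RStudio site, and your own user name and password, before pasting the output.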
Step 6: Configure Security Groups
Here, add a custom TCP rule for port 8787, the port RStudio Server listens on by default. If you have a static IP, enter it in Source for added security.
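The Source field takes an address range in CIDR notation, so a single static IP is written with a /32 suffix. As a small illustration (the helper name `single_ip_cidr` and the address 203.0.113.7 are placeholders chosen for this example), the conversion looks like this:

```shell
#!/bin/bash
# Sketch: turn one IPv4 address into the /32 CIDR form used in the Source field.
# single_ip_cidr is an illustrative helper name.
single_ip_cidr() {
  local ip="$1"
  # Rough shape check only (four dot-separated groups of 1-3 digits).
  if [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
    echo "${ip}/32"
  else
    echo "not an IPv4 address: ${ip}" >&2
    return 1
  fi
}

# 203.0.113.7 is a documentation-only placeholder address.
single_ip_cidr "203.0.113.7"   # prints 203.0.113.7/32
```

With that value in Source, only your address can reach the RStudio login page on port 8787.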
Key Pair
Create a new key pair, or choose an existing one. Check the acknowledgement and click the Launch Instances button.
Connect
Hit the launch button and, once the light is green and the status checks are successful, hit the Connect button.
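The install commands keep running for a few minutes after the instance itself is up, so RStudio Server may not answer right away. If you do have a terminal handy, a rough way to check is to poll port 8787 with curl. This is an optional sketch: `rstudio_url` and `wait_for_rstudio` are names invented here, and the host name is a placeholder for your instance's public DNS.

```shell
#!/bin/bash
# Sketch: build the RStudio Server URL and poll it until the login page answers.
# rstudio_url and wait_for_rstudio are illustrative helper names.
rstudio_url() {
  echo "http://$1:8787"
}

wait_for_rstudio() {
  local url="$1" tries="${2:-30}"
  for _ in $(seq "$tries"); do
    # --fail makes curl return non-zero on HTTP errors (status 400 and above).
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "RStudio Server is answering at $url"
      return 0
    fi
    sleep 10
  done
  echo "no answer from $url" >&2
  return 1
}

# Placeholder host name; use your own instance's public DNS instead.
rstudio_url "ec2-203-0-113-7.compute-1.amazonaws.com"
# To poll it: wait_for_rstudio "$(rstudio_url your-public-dns)"
```

If the check never succeeds, the usual suspects are the port 8787 rule in the security group or a stale RStudio Server download URL in the Advanced Details script.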
RStudio Server
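That’s it! We’re over half-way there! Point your browser at port 8787 of your instance’s public DNS address and log in with the user name and password created in the Advanced Details script (manuel and testing in this example). Then install the H2O package, initialize it and run some demos (check out the output of both demos to get familiar with some of the modeling commands).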
Install H2O
install.packages("h2o")
Run Built-in Demos
Load and initialize H2O, then run a few built-in demos:
library(h2o)
localH2O = h2o.init()
demo(h2o.glm)
demo(h2o.gbm)
Enjoy!!
A special thanks to Lucas A. for the H2O & Amazonian theme!
Manuel Amunategui - Follow me on Twitter: @amunategui