Practical walkthroughs on machine learning, data exploration and finding insight.
In this article, I will show you the easy way to install H2O and RStudio Server on Amazon Web Services (AWS) from scratch. No need for customized AMIs or third-party tools, and no training wheels here. Best of all, we can do everything from the AWS launch wizard; we won’t need a terminal or PuTTY at any point! The key is passing all the additional install commands for R, RStudio Server, and curl in the configuration window of step 3, under ‘Advanced Details’. We’ll even have it create our RStudio user account.
For those who don’t yet know, H2O is open-source software for machine learning and big-data analysis. It offers various models such as GLM, GBM and Random Forest but, more importantly, it also offers deep learning neural networks and large-scale clustering!
For a great introduction to its many features, check out DeepLearning_Vignette.pdf.
One important point first: AWS isn’t free! If you follow along with the instance from this walk-through, it won’t cost you more than a few cents an hour. Just don’t forget to stop or terminate your instance once you’re done!
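A few cents an hour adds up if an instance is forgotten, since billing continues until you stop or terminate it. The small helper below is only a back-of-the-envelope sketch: `estimate_cost` is a hypothetical name, and the hourly rate you pass in is a placeholder; look up the actual on-demand price for your instance type and region.

```shell
#!/bin/bash
# Rough cost of leaving an instance running: rate (USD/hour) x hours.
# The rate is a placeholder supplied by the caller, not a real AWS price.
estimate_cost() {
  local rate_per_hour="$1"   # e.g. 0.05 (hypothetical)
  local hours="$2"           # how long the instance stays running
  awk -v r="$rate_per_hour" -v h="$hours" 'BEGIN { printf "%.2f\n", r * h }'
}
```

For example, `estimate_cost 0.05 3` prints 0.15 for a three-hour session, while `estimate_cost 0.05 720` prints 36.00 for an instance left running for 30 days (720 hours).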
Check the latest RStudio Server URL
Get the latest and greatest RStudio Server version from the RStudio site, and update the download URL and file name in the commands below to match.
Here are the commands to enter in the Advanced Details text box (partly adapted from the AWS blog):
#!/bin/bash
# install R
yum install -y R
# install RStudio Server
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-0.99.489-x86_64.rpm
# curl headers, needed to compile R packages such as RCurl
yum install -y curl-devel
# add the RStudio login user (choose a stronger password than this one)
useradd manuel
echo manuel:testing | chpasswd
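Because the RStudio Server version string appears twice in the script, one possible variation keeps it in a single variable so only one line changes when a new release comes out. This is a sketch under an explicit assumption: that newer releases keep the same `https://download2.rstudio.org/rstudio-server-rhel-VERSION-x86_64.rpm` naming as 0.99.489, which may not hold, so always confirm the URL on the RStudio site. `RSTUDIO_VERSION`, `rstudio_rpm_name` and `rstudio_rpm_url` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers: keep the RStudio Server version in one place.
# Assumes later releases follow the same file-naming pattern as 0.99.489.
RSTUDIO_VERSION="${RSTUDIO_VERSION:-0.99.489}"

# Print the RPM file name for a given version string.
rstudio_rpm_name() {
  printf 'rstudio-server-rhel-%s-x86_64.rpm\n' "$1"
}

# Print the full download URL for a given version string.
rstudio_rpm_url() {
  printf 'https://download2.rstudio.org/%s\n' "$(rstudio_rpm_name "$1")"
}
```

With these defined at the top of the user-data script, the two RStudio lines would become `wget "$(rstudio_rpm_url "$RSTUDIO_VERSION")"` and `yum install -y --nogpgcheck "$(rstudio_rpm_name "$RSTUDIO_VERSION")"`.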
On the security group step, add a custom TCP rule for port 8787, RStudio Server’s default port. If you have a static IP, enter it in Source for added security.
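The Source field of a security group rule takes a CIDR block, and a single static address is written with a `/32` suffix. The sketch below (the `to_cidr` name is hypothetical) checks that the input looks like a dotted-quad IPv4 address and prints it in that form; `203.0.113.7` is a documentation placeholder, not a real client address.

```shell
#!/bin/bash
# Convert one IPv4 address into the /32 CIDR block used in the Source field.
to_cidr() {
  local ip="$1" octet i
  # Require exactly four dot-separated groups of 1-3 digits.
  if [[ ! $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Each octet must be between 0 and 255 (10# avoids octal parsing of "08").
  for i in 1 2 3 4; do
    octet="${BASH_REMATCH[$i]}"
    if (( 10#$octet > 255 )); then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  printf '%s/32\n' "$ip"
}
```

Running `to_cidr 203.0.113.7` prints `203.0.113.7/32`, while a malformed address such as `300.1.1.1` is rejected with a message on stderr.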
Create a new key pair or choose an existing one, check the acknowledgement, and click the Launch Instances button.
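Once the instance state shows green and its status checks pass, select the instance and click the Connect button to find its public DNS. Then open http://YOUR-PUBLIC-DNS:8787 in your browser and sign in with the user created in the script (manuel / testing in this example).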
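Reaching RStudio Server means pointing a browser at the instance’s public DNS name on port 8787, the port opened in the security group. As an optional check from your own machine, the sketch below builds that address and asks curl for the HTTP status code. The function names and the example host name are hypothetical; the open-source RStudio Server listens on plain HTTP by default, which is why the URL uses `http://`.

```shell
#!/bin/bash
# Port opened in the security group for RStudio Server.
RSTUDIO_PORT=8787

# Build the RStudio Server URL from the instance's public DNS name.
rstudio_url() {
  printf 'http://%s:%s/\n' "$1" "$RSTUDIO_PORT"
}

# Print the HTTP status code returned by RStudio Server (000 if no response).
check_rstudio() {
  curl --silent --output /dev/null --max-time 10 \
       --write-out '%{http_code}\n' "$(rstudio_url "$1")"
}
```

For example, `check_rstudio ec2-203-0-113-7.compute-1.amazonaws.com` (a placeholder host) should report a 2xx or 3xx code once RStudio Server is up, and 000 if the port is blocked or the install has not finished yet.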
That’s it, we’re over halfway there! In RStudio, install the H2O package, load and initialize it, then run a couple of built-in demos (the output of both demos is a good way to get familiar with some of the modeling commands):
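# install the h2o package from CRAN (only needed once)
install.packages("h2o")
# load the package and start a local H2O instance
library(h2o)
localH2O <- h2o.init()
# built-in GLM and GBM demos
demo(h2o.glm)
demo(h2o.gbm)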
A special thanks to Lucas A. for the H2O & Amazonian theme!