Data Exploration & Machine Learning, Hands-on


"

Recommended free walkthrough, check it out and boost your career:


Boost your data science career

"







Brief Guide On Running RStudio Server On Amazon Web Services

Practical walkthroughs on machine learning, data exploration and finding insight.

Resources



There are a lot of great resources on the web, but I didn’t find one that covered my needs from end to end, and figured others could benefit from a walkthrough. Here, I’ll show how to select, install and run RStudio Server, customize security settings to use the RStudio’s web interface, and upload and download data between your local machine and the server.

If you like Kaggle competitions, like I do, then this is a great way of quickly adding all sorts of computing configurations to satiate your needs. Our first stop is at Louis Aslet’s web page. Louis curates a series of Amazon Machine Images (referred as AMIs):



These are pre-configured images that will install RStudio and common R packages onto a computing instance. This is a huge time and money saver as it automatically installs a whole slew of software under two minutes - and when you’re charged by the minute, every minute counts.

Louis also has a video, albeit short, on how to setup RStudio and lots of additional resources. So explore them if you have unanswered questions. We’re interested in the upper right-hand box where you need to select the AMI closest to your location and click on it. This will take you to the Amazon Web Services page. If you do not have an AWS account, it will prompt you to set one up:



Otherwise it will take you to Step 2. This is the fun part. Its like going to the store and picking up a brand new computer. Here you get to choose how much computing muscle you want. The AMI image you selected earlier will get applied to whatever setup you choose. You can go for more GPU, memory, storage, etc. Unfortunately, throwing more memory at a problem is not a guarantee to make it go away - and I’m talking from personal experience here.



I recommend starting small as it is easy to upgrade an existing instance to something bigger.


Hi there, this is Manuel Amunategui- if you're enjoying the content, find more at ViralML.com



Security

You need to have two open ports and a key-pair to communicate with your instance. Port 22 should be opened by default and you’ll need to add port 80.

Port 22 is used to connect a command line terminal tool using SSH. I will not be showing that today. Instead, we’ll be using port 80 which allows access to the web interface of RStudio. So add another rule, choose HTTP, enter the value 80, and leave the rest as is:



After you hit Launch, a key-pair pop-up box will appear. This is what authenticates your computer’s identity and allows you to communicate securely to your AWS instance. If this is your first time using EC2 you’ll want to create and download a new key pair:



View Instance

After launching your instance, once the instance state goes from initializing to running, the public DNS string is the official link to your instance’s RStudio web interface:

  • Remember where you save this key-pair as you cannot communicate to your instance without it.
  • If security is a big concern, there are plenty of additional settings and customizations available, please refer to the online RStudio server and Amazon EC2 help files regarding your options.
  • Once your instance is running (green light), click on it, copy the Public DNS URL and paste it in your browser:



    Using RStudio Server

    You will be prompted for your credentials. By default, the initial account and password for these AMIs is rstudio, rstudio, all lower case:



    Once in RStudio, the first thing you need to do is run the loaded script and change the password (minimum length required is 8 characters). Replace the mypassword with your new password and hit the run script button. Then log out and back in with the new password:



    Uploading and Downloading Files

    The last thing I want to cover is how to upload and download files to and from your EC2 instance.

    Uploading Files (i.e. copying files from your local machine to your EC2 Instance):

    Note: you can upload several files or even an entire folder at once, you just need to compress everything into a zip file and upload it (when RStudio receives an uploaded zip file it will automatically uncompresses it).



    Downloading Files (i.e. copying files from your EC2 instance to your local machine):



    Important Don’t forget to shut down the server or terminate it completely - otherwise the meter will keep running and you will keep being charged!

    Additional Resources (PDFs)

    RStudio Server Administrator’s Guide

    Using RStudio on Amazon EC2 under the Free Usage Tier

    Manuel Amunategui - Follow me on Twitter: @amunategui