Practical walkthroughs on machine learning, data exploration and finding insight.
There are a lot of great resources on the web, but I didn’t find one that covered my needs from end to end, and figured others could benefit from a walkthrough. Here, I’ll show how to select, install and run RStudio Server, customize security settings to use the RStudio’s web interface, and upload and download data between your local machine and the server.
If you like Kaggle competitions, like I do, then this is a great way of quickly adding all sorts of computing configurations to satiate your needs. Our first stop is at Louis Aslet’s web page. Louis curates a series of Amazon Machine Images (referred as AMIs):
These are pre-configured images that will install RStudio and common R packages onto a computing instance. This is a huge time and money saver as it automatically installs a whole slew of software under two minutes - and when you’re charged by the minute, every minute counts.
Louis also has a video, albeit short, on how to setup RStudio and lots of additional resources. So explore them if you have unanswered questions. We’re interested in the upper right-hand box where you need to select the AMI closest to your location and click on it. This will take you to the Amazon Web Services page. If you do not have an AWS account, it will prompt you to set one up:
Otherwise it will take you to Step 2. This is the fun part. Its like going to the store and picking up a brand new computer. Here you get to choose how much computing muscle you want. The AMI image you selected earlier will get applied to whatever setup you choose. You can go for more GPU, memory, storage, etc. Unfortunately, throwing more memory at a problem is not a guarantee to make it go away - and I’m talking from personal experience here.
I recommend starting small as it is easy to upgrade an existing instance to something bigger.
You need to have two open ports and a key-pair to communicate with your instance.
Port 22 should be opened by default and you’ll need to add
Port 22 is used to connect a command line terminal tool using SSH. I will not be showing that today. Instead, we’ll be using
port 80 which allows access to the web interface of RStudio. So add another rule, choose
HTTP, enter the value 80, and leave the rest as is:
After you hit Launch, a key-pair pop-up box will appear. This is what authenticates your computer’s identity and allows you to communicate securely to your AWS instance. If this is your first time using EC2 you’ll want to create and download a new key pair:
After launching your instance, once the instance state goes from
running, the public DNS string is the official link to your instance’s RStudio web interface:
Once your instance is running (green light), click on it, copy the Public DNS URL and paste it in your browser:
Using RStudio Server
You will be prompted for your credentials. By default, the initial account and password for these AMIs is
rstudio, all lower case:
Once in RStudio, the first thing you need to do is run the loaded script and change the password (minimum length required is 8 characters). Replace the mypassword with your new password and hit the
run script button. Then log out and back in with the new password:
Uploading and Downloading Files
The last thing I want to cover is how to upload and download files to and from your EC2 instance.
Uploading Files (i.e. copying files from your local machine to your EC2 Instance):
Note: you can upload several files or even an entire folder at once, you just need to compress everything into a zip file and upload it (when RStudio receives an uploaded zip file it will automatically uncompresses it).
Downloading Files (i.e. copying files from your EC2 instance to your local machine):
Important Don’t forget to shut down the server or terminate it completely - otherwise the meter will keep running and you will keep being charged!
Additional Resources (PDFs)