Contents

  • About the Manual
  • Pre-requisites
  • Install R Base on Hadoop
  • Install R Studio on Hadoop
  • Install RHadoop packages

RHadoop is a collection of four R packages that allow users to manage and analyze data with Hadoop.

  1. plyrmr: higher-level, plyr-like data processing for structured data, powered by rmr
  2. rmr: functions providing Hadoop MapReduce functionality in R
  3. rhdfs: functions providing file management of HDFS from within R
  4. rhbase: functions providing database management for the HBase distributed database from within R

This manual covers integrating R with Hadoop 2.4.0 on Ubuntu 14.04.

Pre-requisites:

We assume the following two are up and running before you start the R and Hadoop integration:

Ubuntu 14.04

Hadoop 2.x+

To learn how, see my blog post on setting up a single-node Hadoop cluster.

Once Hadoop is installed, make sure that all of its processes are running. Run the jps command in your terminal; the result should look similar to the screenshot below:

[Screenshot: jps output]
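The check above can also be scripted. The following is a minimal sketch, not part of the original setup: it assumes the usual daemon names of a single-node Hadoop 2.x installation (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager), and the function name missing_daemons is made up for illustration.

```shell
#!/bin/sh
# Daemons expected on a single-node Hadoop 2.x setup (assumed list; adjust
# it if your cluster layout differs).
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# missing_daemons JPS_OUTPUT
# Prints the expected daemons that do not appear in the given jps output.
missing_daemons() {
    jps_output=$1
    missing=""
    for daemon in $EXPECTED_DAEMONS; do
        # jps prints "<pid> <name>" per line; compare the name as a whole field
        # so that SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${missing# }"
}

# Run the live check only when jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
    result=$(missing_daemons "$(jps)")
    if [ -z "$result" ]; then
        echo "All expected Hadoop daemons are running."
    else
        echo "Not running: $result"
    fi
fi
```

If the script reports a daemon as not running, start Hadoop again before continuing with the R installation.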

R installation

Step 1: Open the Ubuntu Software Center.

1.png

Step 2: Open the Ubuntu Software Center in full-screen mode; if the window is too small, the search box is not visible. Search for r-base, click the first result, and click Install.

2.png

Step 3: Once the installation is done, open your terminal and type the command R to open the R console.
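If you prefer the terminal to the Software Center, the same result can probably be reached with apt-get. The sketch below assumes the stock Ubuntu repositories, which ship R as the r-base package; the helper name have_cmd is made up for illustration. The install commands are left as comments so that the script itself only checks whether R is already present.

```shell
#!/bin/sh
# Command-line alternative to the Software Center steps (assumes the stock
# Ubuntu repositories):
#   sudo apt-get update
#   sudo apt-get install r-base

# have_cmd NAME: succeeds when NAME can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd R; then
    # Print only the first line of the version banner.
    R --version | head -n 1
else
    echo "R is not installed yet; run the apt-get commands above."
fi
```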

 

You can perform any operation in this R console. For example, to plot a sequence of values:

plot(seq(1,1000,2.3))

The screenshot below shows the graph produced by this plot call:

3.png

Step 4: To exit the R console, run the command

q()

Type y to save the workspace or n to discard it.

Type c to cancel and stay in the current workspace.

Step 5: Now install RStudio on Ubuntu.

  • Open your browser and download RStudio. I downloaded RStudio 0.98.953 for Debian 6+/Ubuntu 10.04+ (32-bit); the file is rstudio-0.98.953-i386.deb

4.png

Go to the Downloads folder, right-click the downloaded file, open it with Ubuntu Software Center, and click Install.

5.png

6.png

Go to the terminal and type R; you can now use both the R console and RStudio.

7.png

Install RHadoop packages

Step 1: Install Thrift

sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev

$ cd /tmp

If the command below does not work, download the Thrift tarball manually.

$ wget -O - https://dist.apache.org/repos/dist/release/thrift/0.9.0/thrift-0.9.0.tar.gz | tar zx

$ cd thrift-0.9.0/

$ ./configure

$ make

$ sudo make install

$ thrift --help
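To confirm that the compiler on the PATH is the one just built, its reported version can be compared with the expected one. This is a small sketch: the helper name thrift_version_ok is made up for illustration, and it relies on the Thrift compiler printing a line of the form "Thrift version X.Y.Z" for the -version flag.

```shell
#!/bin/sh
# thrift_version_ok OUTPUT EXPECTED
# Succeeds when OUTPUT (the text printed by "thrift -version") reports
# exactly the EXPECTED version number.
thrift_version_ok() {
    [ "$1" = "Thrift version $2" ]
}

# Run the live check only when thrift is installed on this machine.
if command -v thrift >/dev/null 2>&1; then
    if thrift_version_ok "$(thrift -version)" 0.9.0; then
        echo "Thrift 0.9.0 is installed."
    else
        echo "Unexpected Thrift version: $(thrift -version)"
    fi
fi
```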

 

Step 2: Install supporting R packages:

install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "dplyr", "R.methodsS3", "caTools", "Hmisc"), lib="/usr/local/R/library")

Step 3: Download the packages below from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads

rmr2

rhdfs

rhbase

plyrmr

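Run the install commands further below in the R console, replacing each file name with the path to your downloaded file. Before that, open R's site-wide environment file from a shell to set the Hadoop-related environment variables:

sudo gedit /etc/R/Renviron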
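The post does not say what to put in this file. rhdfs and rmr2 look for the HADOOP_CMD and HADOOP_STREAMING environment variables, so the entries would likely resemble the output of the sketch below. The function name renviron_lines and the /usr/local/hadoop location are assumptions for illustration, and the streaming jar path follows the layout of the Apache Hadoop 2.x binary tarball; check that both files exist on your system.

```shell
#!/bin/sh
# renviron_lines HADOOP_HOME HADOOP_VERSION
# Prints the two lines to add to /etc/R/Renviron. The streaming jar path
# follows the Apache Hadoop 2.x tarball layout (an assumption; verify that
# the file exists on your system).
renviron_lines() {
    hadoop_home=$1
    hadoop_version=$2
    printf 'HADOOP_CMD=%s/bin/hadoop\n' "$hadoop_home"
    printf 'HADOOP_STREAMING=%s/share/hadoop/tools/lib/hadoop-streaming-%s.jar\n' \
        "$hadoop_home" "$hadoop_version"
}

# Example: /usr/local/hadoop is a hypothetical install location.
renviron_lines /usr/local/hadoop 2.4.0
```

The printed lines can be pasted into the editor, or appended directly by piping the output to sudo tee -a /etc/R/Renviron.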

Install RHadoop (rhdfs, rhbase, rmr2 and plyrmr)

Install relevant packages:

install.packages("rhdfs_1.0.8.tar.gz", repos=NULL, type="source")

install.packages("rmr2_3.1.2.tar.gz", repos=NULL, type="source")

install.packages("plyrmr_0.3.0.tar.gz", repos=NULL, type="source")

install.packages("rhbase_1.2.1.tar.gz", repos=NULL, type="source")
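The same installs can also be driven from the shell with R CMD INSTALL. The sketch below is a dry run under stated assumptions: the function name install_commands and the ~/Downloads location are made up for illustration, and the script only prints the commands for tarballs it actually finds, so nothing is installed until you run them yourself.

```shell
#!/bin/sh
# Tarballs in the order used above; plyrmr comes after rmr2, which powers it.
PACKAGES="rhdfs_1.0.8.tar.gz rmr2_3.1.2.tar.gz plyrmr_0.3.0.tar.gz rhbase_1.2.1.tar.gz"

# install_commands DIR
# Prints one "R CMD INSTALL" line per tarball found in DIR and a warning on
# stderr for each one that is missing. Nothing is installed.
install_commands() {
    dir=$1
    for pkg in $PACKAGES; do
        if [ -f "$dir/$pkg" ]; then
            printf 'R CMD INSTALL %s/%s\n' "$dir" "$pkg"
        else
            printf 'missing: %s/%s\n' "$dir" "$pkg" >&2
        fi
    done
}

# Example: ~/Downloads is a hypothetical download location.
install_commands "$HOME/Downloads"
```

Once the printed list looks right, the commands can be run one by one with sudo. Paths containing spaces would need quoting first.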

References

The following links provide a YouTube video and step-by-step instructions for installing R on Hadoop.

RDataMining: R on Hadoop, step-by-step instructions

URL: http://www.rdatamining.com/tutorials/rhadoop

YouTube: Word count MapReduce program in R

URL: http://www.youtube.com/watch?v=hSrW0Iwghtw

Revolution Analytics: RHadoop packages

URL: https://github.com/RevolutionAnalytics/RHadoop/wiki

Install R-base Guide

URL: http://www.sysads.co.uk/2014/06/install-r-base-3-1-0-ubuntu-14-04/

 

In the next blog post, I'll show a sample sentiment analysis using MapReduce in R with the rmr package.

 
