I know there are many R users who would like to try out SparkR without all the configuration hassle. These six lines are all you need to start SparkR from both RStudio and the command line.
One line for Spark and SparkR
Apache Spark is a fast, general-purpose cluster computing system.
SparkR is an R package that provides a lightweight frontend for using Apache Spark from R.
Six lines to start SparkR
The first three lines should be run on your command line.
brew update # If you don't have homebrew, get it from here (http://brew.sh/)
brew install hadoop # Install Hadoop
brew install apache-spark # Install Spark
At this point you can already start the SparkR shell from your command line.
If you would rather call it from RStudio, execute the remaining three lines in R:
spark_path <- strsplit(system("brew info apache-spark", intern = TRUE)[4], " ")[[1]][1] # Get your Spark install path from the fourth line of the brew output
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths())) # Add the SparkR folder to the library search path
library(SparkR) # Load the library
That’s all.
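The path lookup above depends on the install path sitting on the fourth line of the `brew info` output, which may not hold for every Homebrew version. A minimal sketch of a less position-dependent lookup follows. `find_spark_path` is a hypothetical helper, not part of SparkR or Homebrew, and the sample lines are illustrative rather than real output.

```r
# Hypothetical helper (not part of SparkR or Homebrew): pick the install
# path out of `brew info` output without relying on a fixed line number.
find_spark_path <- function(info_lines) {
  # Keep only lines that start with an absolute path containing "apache-spark"
  hits <- grep("^/.*apache-spark", info_lines, value = TRUE)
  if (length(hits) == 0) {
    stop("No apache-spark install path found in the brew output")
  }
  # The path is the first space-separated token on the first matching line
  strsplit(hits[1], " ", fixed = TRUE)[[1]][1]
}

# Illustrative input only; real `brew info` output differs by version.
sample_info <- c(
  "apache-spark: stable 1.5.0",
  "Engine for large-scale data processing",
  "https://spark.apache.org/",
  "/usr/local/Cellar/apache-spark/1.5.0 (649 files, 302M) *"
)
spark_path <- find_spark_path(sample_info)
print(spark_path)
```

On a real machine you would pass `system("brew info apache-spark", intern = TRUE)` instead of `sample_info`. If several versions are installed, this sketch simply takes the first path it finds.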
The following should now run in RStudio:
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, iris)
head(df)
#   Sepal_Length Sepal_Width Petal_Length Petal_Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
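Note that the column names in the output above use underscores (Sepal_Length), while the local iris data frame uses dots (Sepal.Length). SparkR swaps the dots for underscores when it builds the distributed data frame. The snippet below is only a plain-R sketch of that renaming, assuming a simple character substitution. It is not SparkR's own code, and it runs without Spark.

```r
# iris ships with base R, so this runs without Spark.
local_names <- names(iris)

# Assumption: a plain dot-to-underscore substitution mirrors the renaming
# seen in the SparkR output.
spark_style_names <- gsub(".", "_", local_names, fixed = TRUE)
print(spark_style_names)
```

This is worth keeping in mind when you refer to columns later: a name that works on the local data frame may need the underscore form on the Spark side.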
The full code is available here.
Six lines to install and start SparkR on Mac OS X Yosemite was originally published by Kirill Pomogajko at Opiate for the masses on September 21, 2015.