Here at LTS Computing LLC, we’ve been using Apache Spark mostly for ETL work, and we like it a lot. The 1.4 release makes it even more attractive.
Here are some of the key new features:
- SparkR – an R binding for Spark. SparkR gives R users access to Spark’s scale-out parallel runtime, along with all of Spark’s input and output formats.
- Mathematical functions in DataFrames
- Window functions in Spark SQL and DataFrames
- Rollup and cube functions
- Summary and descriptive statistics
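
For readers who haven’t met rollup and cube before, here is a minimal plain-Python sketch of what the two operations compute. It is not Spark’s API: the sample rows and the helper names (`rollup_sets`, `cube_sets`, `aggregate`) are made up for illustration. Rollup aggregates over each prefix of the grouping columns, while cube aggregates over every subset of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows: (department, region, sales)
ROWS = [
    ("eng", "east", 10),
    ("eng", "west", 20),
    ("ops", "east", 5),
]


def rollup_sets(n):
    """Grouping sets for rollup: each prefix of the key columns, longest first."""
    return [tuple(range(k)) for k in range(n, -1, -1)]


def cube_sets(n):
    """Grouping sets for cube: every subset of the key columns."""
    return [c for k in range(n, -1, -1) for c in combinations(range(n), k)]


def aggregate(rows, n_keys, grouping_sets):
    """Sum the last field of each row once per grouping set.

    Key columns left out of a grouping set are reported as None,
    much like the NULL placeholders in SQL rollup/cube output.
    """
    totals = defaultdict(int)
    for row in rows:
        for cols in grouping_sets:
            key = tuple(row[i] if i in cols else None for i in range(n_keys))
            totals[key] += row[-1]
    return dict(totals)


rollup_totals = aggregate(ROWS, 2, rollup_sets(2))
cube_totals = aggregate(ROWS, 2, cube_sets(2))

for key, total in sorted(cube_totals.items(), key=repr):
    print(key, total)
```

With two grouping columns, the rollup result holds per-(department, region) totals, per-department subtotals, and a grand total. The cube result adds per-region subtotals on top of those.
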
The full 1.4 release notes are here.