Spark 2 – What’s new?

I’ve prepared a lecture titled “What’s new in Spark 2?” The slides can be found here.

The Spark 2.x line started in July 2016, and its first release (version 2.0.0) had over 1,000 JIRA issues associated with it. For comparison, version 1.6.0, from the last 1.x line, had just over 600.

Most of the changes were made to the Spark SQL library, which Databricks seems to be positioning as the foundation for the future of Spark.

SparkSession is introduced as the new entry point and replaces SQLContext and HiveContext, which are kept for backwards compatibility.
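
A minimal sketch of the new entry point in Scala, assuming the spark-sql 2.x artifact is on the classpath and Spark runs in local mode. The object name, helper function and application name are hypothetical. One builder call replaces the separate SparkContext plus SQLContext/HiveContext setup from 1.x, and the older objects are still reachable from the session:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object and helper names, for illustration only.
object SparkSessionExample {
  // Builds (or reuses) a session; assumption: local mode using all cores.
  def createSession(): SparkSession =
    SparkSession.builder()
      .appName("spark2-session-demo") // hypothetical application name
      .master("local[*]")
      // .enableHiveSupport()         // HiveContext-style features; needs Hive classes on the classpath
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = createSession()

    // The 1.x-style objects remain available for backwards compatibility.
    val sc = spark.sparkContext
    val legacySqlContext = spark.sqlContext
    println(s"Running Spark ${sc.version}")

    // SQL can be issued directly on the session...
    spark.sql("SELECT 1 AS one").show()
    // ...and the legacy SQLContext still works too.
    legacySqlContext.sql("SELECT 2 AS two").show()

    spark.stop()
  }
}
```

Because getOrCreate() reuses an existing session when one is active, the same helper can be called from several places in an application without creating duplicate contexts.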

SparkSession creates Datasets (DataFrame is now just a type alias for Dataset[Row]), and Datasets can now also be used for streaming. This streaming support is one of the coolest new features; it is still tagged as ‘Preview’, but it should become interesting in the future.
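
To illustrate the alias, here is a sketch under the same assumptions (spark-sql 2.x, local mode). The Person case class, its sample rows and the object name are hypothetical. Since DataFrame is defined as Dataset[Row], a DataFrame can be assigned to a Dataset[Row] without any conversion, and .as[T] recovers a typed view:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type; defined at top level so Spark can derive an Encoder for it.
case class Person(name: String, age: Long)

object DatasetExample {
  // Returns (total rows, adults) for a small in-memory Dataset.
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._ // encoders plus .toDS() and .as[T] support

    // Typed Dataset built from hypothetical local sample data.
    val people: Dataset[Person] = Seq(Person("Ann", 34L), Person("Bob", 17L)).toDS()

    // DataFrame is a type alias for Dataset[Row], so no conversion is needed here.
    val df: DataFrame = people.toDF()
    val rows: Dataset[Row] = df

    // .as[Person] turns the untyped rows back into a typed Dataset.
    val typed: Dataset[Person] = rows.as[Person]
    val adults = typed.filter(p => p.age >= 18).count()

    (df.count(), adults)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataset-demo").master("local[*]").getOrCreate()
    val (total, adults) = run(spark)
    println(s"total=$total adults=$adults")
    spark.stop()
  }
}
```

With the two sample rows above, main should print total=2 adults=1. The typed filter is checked by the Scala compiler, whereas a string expression such as "age >= 18" on the DataFrame would only fail at runtime if the column name were wrong.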
