Spark 2 – What’s new?

I’ve prepared a lecture titled “What’s new in Spark 2?”. The slides can be found here.

The Spark 2.x line started in July 2016 with version 2.0.0, which had over 1,000 JIRA issues associated with it. For comparison, version 1.6.0, the first release of the last 1.x line, had just over 600.

Most of the changes were made to the Spark SQL library, as Databricks seems to be betting on it as the future of Spark.

SparkSession is introduced as the new entry point. It replaces SQLContext and HiveContext, both of which are kept for backwards compatibility.
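To make the new entry point concrete, here is a minimal sketch in Scala. It assumes Spark 2.x is on the classpath and runs in local mode; the application name, the view name "numbers" and the column name "n" are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionSketch {
  def main(args: Array[String]): Unit = {
    // One builder call replaces the separate SQLContext / HiveContext setup.
    val spark = SparkSession.builder()
      .appName("spark2-session-sketch") // hypothetical name
      .master("local[*]")               // assumption: local run, not a cluster
      // .enableHiveSupport()           // takes over HiveContext's role; needs Hive classes on the classpath
      .getOrCreate()

    // Build a small DataFrame and query it with SQL through the same session.
    val numbers = spark.range(0, 5).toDF("n")
    numbers.createOrReplaceTempView("numbers")
    spark.sql("SELECT n, n * 2 AS doubled FROM numbers").show()

    spark.stop()
  }
}
```

The underlying SparkContext is still reachable as spark.sparkContext, so existing RDD code does not have to change.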

SparkSession creates Datasets (DataFrame is now just an alias for a Dataset of Rows), and Datasets can now also be used for streaming. This is one of the coolest new features, although it is still tagged as ‘Preview’, and it should get more interesting in the future.
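A rough sketch of both points, again in Scala and again under assumptions: Spark 2.x on the classpath, local mode, a made-up Person case class, and, for the streaming part, a text server listening on localhost:9999 (for example one started with `nc -lk 9999`). The streaming half follows the usual word-count pattern and is meant as an illustration, not as a production setup.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DatasetSketch {
  // Hypothetical record type, used only for this example.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark2-dataset-sketch") // hypothetical name
      .master("local[*]")               // assumption: local run
      .getOrCreate()
    import spark.implicits._

    // A typed Dataset: the lambda in filter is checked at compile time.
    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 19)).toDS()
    people.filter(_.age >= 21).show()

    // DataFrame is a type alias for Dataset[Row], so this assignment compiles.
    val untyped: DataFrame = people.toDF()
    val sameThing: Dataset[Row] = untyped
    sameThing.printSchema()

    // Streaming uses the same API: readStream instead of read.
    // Assumption: something is serving lines of text on localhost:9999.
    val lines: DataFrame = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    // The socket source yields a single string column named "value".
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // Print the full, continuously updated counts to the console.
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Note that the transformation code (flatMap, groupBy, count) is the same as it would be for a batch Dataset; only the read and write ends differ.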

