I’ve prepared a lecture titled “What’s new in Spark 2?”. The slides can be found here.
The Spark 2.x line started in July 2016 with version 2.0.0, which had over 1,000 JIRA issues associated with it. For comparison, version 1.6.0, from the last 1.x line, had just over 600 issues.
Most changes were made to the Spark SQL library, as Databricks seems to be betting on it as the future of Spark.
SparkSession is introduced as the new entry point. It replaces SQLContext and HiveContext, both of which are kept for backwards compatibility.
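As a rough illustration of the new entry point, here is a minimal sketch in Scala. The app name, the `local[*]` master, the `numbers` view and the helper names `buildSession` and `doubled` are assumptions made up for this example; only the `SparkSession` builder calls themselves come from the Spark 2.x API.

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionSketch {
  // Builds (or reuses) the single entry point that takes over from
  // SQLContext and HiveContext in Spark 2.x.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("spark2-entry-point-sketch") // hypothetical app name
      .master("local[*]")                   // assumption: a local run, not a cluster
      // .enableHiveSupport()               // roughly what HiveContext used to give you;
      //                                    // needs Hive classes on the classpath
      .getOrCreate()

  // Registers a small temp view and queries it with SQL through the session.
  def doubled(spark: SparkSession): Array[Long] = {
    spark.range(5).toDF("n").createOrReplaceTempView("numbers")
    spark.sql("SELECT n * 2 AS doubled FROM numbers ORDER BY doubled")
      .collect()
      .map(_.getLong(0))
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    println(doubled(spark).mkString(", "))
    spark.stop()
  }
}
```

Code that still needs the old API can reach a `SQLContext` through `spark.sqlContext`, which is part of how backwards compatibility is preserved.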
SparkSession creates Datasets (DataFrame is now just an alias for a Dataset of Rows), which can now also be used for streaming. Streaming on Datasets is one of the coolest new features, although it is still tagged as ‘Preview’, and it should be interesting to follow in the future.
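To make the alias and the streaming capability a bit more concrete, here is a hedged sketch in Scala, not a production setup. The temporary input directory, the file name, the `word_counts` query name and the word-count logic are all assumptions invented for the example. It uses the file-based text source and the in-memory sink, which I believe are available from 2.0 onwards.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DatasetStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-streaming-sketch") // hypothetical app name
      .master("local[2]")                  // assumption: local run
      .getOrCreate()
    import spark.implicits._

    // DataFrame is a type alias for Dataset[Row], so this assignment compiles as-is.
    val batch: DataFrame = Seq("a b", "b c").toDF("value")
    val sameThing: Dataset[Row] = batch

    // Hypothetical input: a temporary directory holding one small text file.
    val dir = Files.createTempDirectory("spark2-stream-input")
    Files.write(
      dir.resolve("part-0.txt"),
      "spark two\nspark streaming\n".getBytes(StandardCharsets.UTF_8))

    // The same Dataset/DataFrame type, but backed by a streaming source.
    val lines: DataFrame = spark.readStream.text(dir.toString)
    println(s"batch: ${sameThing.isStreaming}, stream: ${lines.isStreaming}")

    // Ordinary Dataset operations applied to the streaming input.
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    // Write the running counts to an in-memory table named by queryName.
    val query = counts.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("word_counts")
      .start()

    query.processAllAvailable() // block until the existing file has been processed
    spark.sql("SELECT value, `count` FROM word_counts ORDER BY value").show()

    query.stop()
    spark.stop()
  }
}
```

The point of the sketch is that the batch and streaming inputs share one type and one set of operations. Only `readStream`/`writeStream` versus `read`/`write` distinguishes them.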