EDUCBA

Spark Versions

By Arpit Anand

Overview of Spark Versions

Apache Spark is a cluster computing framework for large-scale data processing. The project started in 2009 and was open sourced in 2010; it was later donated to the Apache Software Foundation. Since then the framework has evolved steadily through a series of releases: v0.5, v0.6, v0.7, v0.8, v0.9, v1.0, v1.1, v1.2, v1.3, v1.4, v1.5, v1.6, v2.0, v2.1, v2.2, v2.3 and v2.4. Each release added features and enhancements, such as the Spark SQL APIs, Kubernetes support, a built-in Avro data source and optimized processing of Resilient Distributed Datasets (RDDs).

Various Versions of Spark

Let us look at the Spark versions released so far:

1. Version 0.5

Spark 0.5 was released in June 2012 and is the earliest release covered here. It runs on Mesos 0.9 and brought usability and stability improvements. Job logs were retained, which made it easier to inspect old jobs. New operators such as sortByKey and takeSample were added, along with support for the new Hadoop API.

The latest version available is 0.5.1.

2. Version 0.6

Spark 0.6 followed a few months after 0.5, in October 2012, and brought several new features, architectural changes and performance enhancements. A standalone deploy mode was introduced, which made it easy to launch a cluster without installing an external cluster manager. The persist() method was added to RDDs, and more join operators were introduced. Spark was also published to Maven Central, so it can be used as a dependency in Maven projects.

The latest version available is 0.6.2.

3. Version 0.7

Version 0.7 was released in early 2013. It was a major release because it introduced the Python API, known as PySpark, which makes it possible to use Spark from Python, including native Python libraries such as NumPy and SciPy. Spark also began reading S3 credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which made S3 easier to access. Shuffle operations and overall performance were improved as well.

The latest version available is 0.7.3.

4. Version 0.8

This version of Spark was released in September 2013. A monitoring UI with dashboards was introduced, served by default on port 4040; it shows information about running, completed and failed jobs. The machine learning library (MLlib) was introduced, and support for YARN was added, so Spark jobs can now run on a YARN cluster. Deploying applications also became easier thanks to extended EC2 capabilities.

The latest version available is 0.8.1.

5. Version 0.9

Spark 0.9 was a major release in early 2014. It moved Spark to Scala 2.10 and added and improved several libraries. It includes the first version of GraphX, a powerful tool for graph processing: the library can build graphs from RDDs, transform them and extract subgraphs. The SparkConf class became the preferred way to configure advanced settings on a SparkContext.

In Spark Streaming, window operators were sped up by up to 50% and streaming listeners were introduced.

The latest version available is 0.9.2.

6. Version 1.0

Spark 1.0 was the start of the 1.X line. Released in 2014, it was a major release that added an important new component, Spark SQL, for loading and working with structured data in Spark. Spark SQL made it easier to query large datasets and run operations on them. Java and Python support was also extended, including the new lambda syntax in the Java bindings.

The latest version available is 1.0.2.

7. Version 1.1

It was the first minor release on the 1.X line. Building on Spark SQL, this release added a JDBC/ODBC server so that many different applications can connect to Spark SQL. Support for JSON was introduced. There were performance and usability improvements, and accumulators are now displayed in the Spark UI.

The latest version available is 1.1.1.

8. Version 1.2

Released in 2014, this version brought performance and usability improvements to the Spark core engine. The communication manager used during bulk transfers was improved and the shuffle mechanism was upgraded.

The latest version available is 1.2.2.

9. Version 1.3

Spark 1.3 was the fourth release on the 1.X line. It introduced the DataFrame API and improved the Spark SQL API. Multi-level aggregation trees were introduced to help speed up reduce operations in Spark core. SSL encryption was added, and documentation for Kafka integration was introduced.

The latest version available is 1.3.1.
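The idea behind multi-level aggregation trees can be sketched in plain Python. This is a conceptual illustration only, not Spark's implementation: `tree_reduce`, `fan_in` and the sample `partition_sums` are hypothetical names chosen for this example. In Spark itself, the corresponding RDD operations are `treeReduce` and `treeAggregate`.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(partials: List[T], combine: Callable[[T, T], T], fan_in: int = 2) -> T:
    """Combine per-partition results level by level instead of all at once."""
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        # Merge each group of `fan_in` neighbours into one value, so no
        # single step has to combine every partial result by itself.
        level = [
            reduce(combine, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Hypothetical per-partition sums, as if each partition had already
# reduced its own data locally.
partition_sums = [10, 20, 30, 40, 50]
total = tree_reduce(partition_sums, lambda a, b: a + b)
print(total)  # 150
```

With a flat reduce, every partial result is merged in one place; with a tree, each level merges only small groups, which spreads the work when there are many partitions.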

10. Version 1.4

Released in 2015, this version introduced the SparkR package and expanded MLlib and Spark Streaming. Visualization of Spark DAGs was added, as was Docker support on Mesos.

The latest version available is 1.4.1.

11. Version 1.5

This version mainly improved existing APIs such as RDD and DataFrame. Join execution on DataFrames was improved, and memory management was improved as well.

The latest version available is 1.5.2.

12. Version 1.6

Version 1.6 was released in 2016 and was the last update on the Spark 1.X line.

Datasets, a new Spark API that helps when working with custom objects, were introduced. Reading of non-standard JSON files was added. Null-safe joins were added, along with improved handling of Parquet files, which store data in a columnar format.

The latest version available is 1.6.3.
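To make "null-safe join" concrete, here is a small plain-Python sketch of the difference between ordinary SQL equality and null-safe equality (written `<=>` in Spark SQL). It only models the comparison semantics; the function names and sample rows are hypothetical and this is not how Spark executes joins.

```python
from typing import Optional


def sql_equals(a, b) -> Optional[bool]:
    """Ordinary SQL '=': the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equals(a, b) -> bool:
    """Null-safe equality: NULL matches NULL, and NULL never matches a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def inner_join(left, right, key_equal):
    """Nested-loop inner join that keeps pairs whose keys compare as true."""
    return [(l, r) for l in left for r in right if key_equal(l[0], r[0]) is True]


# Hypothetical rows of the form (key, value); None stands in for SQL NULL.
left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]

plain = inner_join(left, right, sql_equals)
safe = inner_join(left, right, null_safe_equals)
print(len(plain), len(safe))  # 1 2
```

The ordinary join drops the rows with a NULL key, while the null-safe join pairs them with each other.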

13. Version 2.0

This was the first release on the 2.X line, released in mid-2016. Hive-style bucketing, performance improvements and SQL improvements were added in this version. A native SQL parser was introduced. SparkR gained many new functions, such as dapply, gapply and lapply.

The latest version available is 2.0.2.

14. Version 2.1

This was the second release in the 2.X family, with a focus on improving streaming with Kafka support. The API was updated so that the data type APIs became stable, JSON parsing was introduced, and PageRank was introduced in R.

Apart from this, performance and memory management were improved, and machine learning features such as random forest and faster regression were introduced.

The latest version available is 2.1.3.

15. Version 2.2

The third release in the 2.X family arrived in 2017, with support for creating Hive tables through the DataFrame writer and the catalog. Broadcast join (map join) hints were introduced for SQL queries. Parsing of multi-line JSON and CSV files was introduced.

The latest version available is 2.2.3.

16. Version 2.3

It was the fourth release in the 2.X family. Spark on Kubernetes was introduced, which supports submitting jobs to a cluster managed by Kubernetes. The history server and PySpark performance were improved, and Hive partitioning was improved, including support for dynamic partitions.

The latest version available is 2.3.3.

17. Version 2.4

At the time of writing, this was the latest stable release line of Spark.

It adds experimental support for Scala 2.12, allowing application owners to write their programs in Scala 2.12. A built-in Avro data source was introduced for better performance and usability. SQL syntax for pivot was introduced, and coalesce and repartition hints were introduced for SQL queries. It is a stable release that is widely used to build Spark applications.

The latest version available is 2.4.4.
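The feature introductions listed above can be collected into a small lookup table, which is handy when checking whether a cluster's Spark version is new enough for a given feature. The following is a minimal sketch in plain Python; the feature keys and the `supports` helper are hypothetical names for this example, and the version numbers are simply those given in this article. In a running application, the version string would typically come from `spark.version` on the session.

```python
from typing import Tuple

# Minimum Spark (major, minor) version per feature, as listed in this article.
# The keys are illustrative labels, not official identifiers.
FEATURE_INTRODUCED_IN = {
    "pyspark": (0, 7),
    "mllib": (0, 8),
    "yarn": (0, 8),
    "graphx": (0, 9),
    "spark_sql": (1, 0),
    "dataframe_api": (1, 3),
    "sparkr": (1, 4),
    "dataset_api": (1, 6),
    "kubernetes": (2, 3),
    "avro_source": (2, 4),
}


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as '2.4.4' into (2, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports(version: str, feature: str) -> bool:
    """Return True if the given Spark version includes the feature."""
    return parse_version(version) >= FEATURE_INTRODUCED_IN[feature]


print(supports("2.4.4", "kubernetes"))     # True
print(supports("1.2.2", "dataframe_api"))  # False
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "0.10" would sort before "0.9".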

Conclusion

This article covered the Spark versions released to date and some of the changes that came with each release. As data patterns and volumes keep changing, new releases can be expected over time to further improve Spark.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
