Introduction to Cassandra Versions
Apache Cassandra versions trace the database’s evolution from the time it was conceived at Facebook to today, when it is used by some of the biggest companies and organizations, such as CERN and Apple. In this article we follow the evolution of Apache Cassandra through its releases over the years. We’ll see how Cassandra went from an in-house project at Facebook to a widely adopted NoSQL database management system used by Fortune 500 companies.
Apache Cassandra was developed by Avinash Lakshman and Prashant Malik at Facebook to power its inbox search feature. Facebook released Cassandra as an open-source project on Google Code in 2008, and in 2009 it became an Apache Incubator project. In 2010 it graduated to a top-level project of the Apache Software Foundation.
Top 11 Versions of Cassandra
The major Cassandra versions are described below:
1. Version 0.6 (April ’10)
This was the first release after Cassandra finished its development phase in the Apache Incubator and became a top-level project with the full support of the Apache Software Foundation. It added support for Hadoop and caching, and improved performance. The Hadoop support allowed the MapReduce framework to run analyses on data stored in Cassandra, which broadened the range of use cases.
2. Version 0.7 (Jan ’11)
The important additions in this version were secondary indexes and online schema changes. A secondary index is an index on column values: it allows querying by value and can be built in the background without blocking reads and writes. Before this update, keyspaces and column families had to be described in Cassandra’s configuration file, so adding, removing, or updating one meant a rolling restart of the cluster. This was no longer required in 0.7, because new methods were added to the API and schemas could be changed without restarting the cluster.
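The idea behind a secondary index can be sketched in a few lines of Python. This is only an illustrative in-memory model, not Cassandra’s implementation: the `ToyTable` class and its method names are hypothetical, and the index here is built synchronously rather than in the background.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table: rows by primary key, one optional index."""

    def __init__(self):
        self.rows = {}                  # primary key -> row (dict of columns)
        self.indexed_column = None      # name of the indexed column, if any
        self.index = defaultdict(set)   # column value -> set of primary keys

    def create_index(self, column):
        """Build an index over the rows that already exist."""
        self.indexed_column = column
        self.index = defaultdict(set)
        for key, row in self.rows.items():
            if column in row:
                self.index[row[column]].add(key)

    def insert(self, key, row):
        """Insert or overwrite a row and keep the index in step."""
        old = self.rows.get(key)
        col = self.indexed_column
        if col is not None and old is not None and col in old:
            self.index[old[col]].discard(key)   # drop the stale index entry
        self.rows[key] = dict(row)
        if col is not None and col in row:
            self.index[row[col]].add(key)

    def keys_where(self, value):
        """Return the primary keys whose indexed column equals value."""
        if self.indexed_column is None:
            raise RuntimeError("no index has been created")
        return sorted(self.index.get(value, set()))


table = ToyTable()
table.insert("alice", {"city": "Paris"})
table.insert("bob", {"city": "Oslo"})
table.create_index("city")                # indexes the two existing rows
table.insert("carol", {"city": "Paris"})  # indexed at write time
print(table.keys_where("Paris"))          # ['alice', 'carol']
```

Without the index, answering "which rows have city = Paris" would mean scanning every row; with it, the lookup goes straight from the value to the matching keys.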
3. Version 0.8 (June ’11)
This was a major update that introduced CQL (Cassandra Query Language), along with other changes such as self-tuning memtables and support for zero-downtime upgrades. CQL is closely modeled on SQL: data is presented as tables containing rows and columns.
4. Version 1.0 (Oct ’11)
This version greatly improved the speed of reads and writes: compared with previous versions, write performance increased by 40% and read performance by a spectacular 400%. Along with these performance gains, other changes included integrated compression and leveled compaction.
5. Version 1.1 (April ’12)
One of the major changes in this version was row-level isolation. The log-structured nature of Cassandra’s storage engine makes row-level isolation easier to provide: a row mutation (a ‘RowMutation’) is applied to the current memtable in isolation from other reads and writes, and that is enough to achieve complete isolation at the row level. This was the major feature of the update, apart from support for SSD/spinning-disk deployments.
6. Version 1.2 (Jan ’13)
This version added virtual nodes (vnodes), which improve the granularity of capacity increases and dramatically improve repair and rebuild times in larger clusters. It also brought CQL3 improvements, notably collection types, queryable system information, and the CQL native protocol, as well as atomic batches, which address the possibility of the coordinator failing mid-batch, among other performance improvements.
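To illustrate why virtual nodes give finer-grained capacity changes, here is a minimal Python sketch of a token ring on which each physical node owns several positions. It is a toy model under stated assumptions, not Cassandra’s partitioner: the `ToyRing` class, the SHA-256 based `_token` helper, and the choice of 8 vnodes per node are all hypothetical.

```python
import bisect
import hashlib


def _token(text):
    """Map a string to a ring position (hash function chosen for illustration)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical ring where every node owns several token positions."""

    def __init__(self, vnodes_per_node=8):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []   # sorted token positions
        self._owner = {}    # token position -> node name

    def add_node(self, name):
        # One token per virtual node, spread around the ring by the hash.
        for i in range(self.vnodes_per_node):
            token = _token(f"{name}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = name

    def owner_of(self, key):
        if not self._tokens:
            raise LookupError("ring has no nodes")
        # The first token at or after the key's position owns it (wrapping).
        idx = bisect.bisect_left(self._tokens, _token(key))
        return self._owner[self._tokens[idx % len(self._tokens)]]


ring = ToyRing()
for node in ("node-a", "node-b", "node-c"):
    ring.add_node(node)

keys = [f"row-{i}" for i in range(1000)]
before = {k: ring.owner_of(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner_of(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
donors = sorted({before[k] for k in moved})
print(len(moved), "keys moved to node-d, taken from", donors)
```

Because the new node’s tokens are scattered around the ring, the keys it takes over typically come from several existing nodes rather than from a single neighbor, which is the effect that spreads the load of adding capacity or rebuilding a node.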
7. Version 2.0 (September ’13)
Cassandra 2.0 focused on making it easier for developers to migrate from a relational database and become more productive. Some of its notable improvements were:
- Lightweight transactions, based on the Paxos consensus protocol, provide linearizable consistency, which is similar to the serializable isolation level offered by relational databases and prevents conflicts between concurrent requests.
- Triggers enable pushing performance-critical code close to the data it deals with and simplify integration with event-driven frameworks such as Apache Storm.
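The compare-and-set behavior that lightweight transactions expose can be sketched as follows. This is a single-process Python illustration of the semantics only: it does not implement Paxos, a plain lock stands in for the consensus round, and the `ToyStore` class and its method names are hypothetical.

```python
import threading


class ToyStore:
    """Hypothetical key-value store with conditional (compare-and-set) writes."""

    def __init__(self):
        self._data = {}
        # Assumption: a local lock replaces the distributed consensus step.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Write only if the key is absent; return (applied, current value)."""
        with self._lock:
            if key in self._data:
                return False, self._data[key]
            self._data[key] = value
            return True, value

    def update_if(self, key, expected, new_value):
        """Write only if the stored value equals expected."""
        with self._lock:
            if key not in self._data or self._data[key] != expected:
                return False, self._data.get(key)
            self._data[key] = new_value
            return True, new_value


store = ToyStore()
print(store.insert_if_not_exists("user:1", "alice"))  # (True, 'alice')
print(store.insert_if_not_exists("user:1", "bob"))    # (False, 'alice')
print(store.update_if("user:1", "alice", "carol"))    # (True, 'carol')
print(store.update_if("user:1", "alice", "dave"))     # (False, 'carol')
```

The point of the pattern is that the check and the write happen as one step, so two concurrent requests cannot both believe they created or updated the same row.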
8. Versions 2.1 and 2.2 (September ’14 and July ’15)
This was an incremental update. Apart from performance upgrades, it included improved Hadoop support, better post-compaction read performance, incremental node repair, and an improved row cache.
Some new features were also added: user-defined types, which make it easier to handle multiple fields of related information in a table; collection indexes, which let you index collections and query the database for a collection containing a particular value; and a better implementation of counters that makes them safer, simpler, and faster. Version 2.2, apart from incremental updates, added JSON support to CQL and role-based access control.
9. Version 3.0 (November ’15)
This version added materialized views, a great advantage for developers and users alike looking to lower the burden of denormalization. Using materialized views, one can create multiple views on the same table, each with a different primary key combination. The storage engine was also refactored, and native protocol versions 1 and 2 were dropped in Cassandra 3.0.
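What a materialized view saves the developer from doing by hand can be sketched in Python. The model below is a simplified, hypothetical illustration (the `ToyBaseTable` class and its methods are invented for the example and say nothing about how Cassandra maintains views internally): every write to the base table also updates a copy of the row stored under a different key.

```python
class ToyBaseTable:
    """Hypothetical base table with one automatically maintained view."""

    def __init__(self, view_column):
        self.view_column = view_column
        self.rows = {}   # primary key -> row
        self.view = {}   # (view column value, primary key) -> copy of the row

    def upsert(self, key, row):
        """Write the base row and keep the view copy consistent with it."""
        old = self.rows.get(key)
        if old is not None:
            # Remove the copy stored under the old view key, if any.
            self.view.pop((old[self.view_column], key), None)
        self.rows[key] = dict(row)
        self.view[(row[self.view_column], key)] = dict(row)

    def read_view(self, value):
        """Return rows whose view column equals value, ordered by primary key."""
        matches = [item for item in self.view.items() if item[0][0] == value]
        return [row for _key, row in sorted(matches, key=lambda item: item[0])]


users = ToyBaseTable(view_column="team")
users.upsert(1, {"name": "ann", "team": "red"})
users.upsert(2, {"name": "bob", "team": "blue"})
users.upsert(3, {"name": "cy", "team": "red"})
print([r["name"] for r in users.read_view("red")])   # ['ann', 'cy']

users.upsert(1, {"name": "ann", "team": "blue"})     # ann changes team
print([r["name"] for r in users.read_view("red")])   # ['cy']
print([r["name"] for r in users.read_view("blue")])  # ['ann', 'bob']
```

Before materialized views, application code had to perform the equivalent of `upsert` against several hand-maintained tables; the feature moves that bookkeeping into the database.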
10. Version 3.1 – 3.10 (2016-17)
Releases were monthly and followed a tick-tock release model: even-numbered releases provided both new features and bug fixes, while odd-numbered releases contained only bug fixes.
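The even/odd rule is simple enough to express as a small Python helper. The function name `tick_tock_kind` is hypothetical, and the sketch only covers the 3.1 to 3.10 range described in this section.

```python
def tick_tock_kind(version):
    """Classify a 3.1-3.10 release under the tick-tock rule described above."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != 3 or not 1 <= minor <= 10:
        raise ValueError(f"{version} is outside the 3.1-3.10 tick-tock range")
    # Even minor number: features plus fixes. Odd minor number: fixes only.
    return "features and bug fixes" if minor % 2 == 0 else "bug fixes only"


for v in ("3.1", "3.2", "3.9", "3.10"):
    print(v, "->", tick_tock_kind(v))
```

Under this rule, an operator who wanted stability over new features would pick an odd-numbered release such as 3.9.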
11. Version 3.11 (June ’17)
At the time of writing, this is the latest stable release of Cassandra. It brought some minor bug fixes and overall performance improvements. Version 4 of Apache Cassandra is expected to be released this year with some significant changes.
Cassandra has come a long way through countless iterations over the years and is used today by organizations like CERN, Apple, Netflix, and various governmental institutions. Anyone interested in the field of big data should be aware of NoSQL databases like Apache Cassandra, along with technologies such as Hadoop and Spark that are often used with them, to gain a holistic understanding of data science in general and big data in particular.