What is Cassandra?
Cassandra is a NoSQL database with a peer-to-peer, distributed architecture. It runs on a cluster of homogeneous nodes and is designed to handle large volumes of data while remaining highly available. Cassandra delivers high throughput for both read and write operations. A Cassandra cluster has no masters, slaves, or designated leaders, so there is no single point of failure. Let us look at the architecture in detail.
The Cassandra architecture consists mainly of nodes, data centers, and clusters, along with several other components. Cassandra is a partitioned row store. Authorized users can connect to any node in any data center using CQL (Cassandra Query Language).
Key Structures in Cassandra
The key structures in Cassandra are:
- Node – This is where the data is stored. It is the most basic component of Cassandra and can be thought of as a single server in a rack. Because every node plays the same role, the failure of any one node does not bring down the cluster.
- Data Center – A data center is a collection of related nodes. It can be either physical or virtual. Different workloads are usually assigned to separate data centers. The replication factor is set per data center, so data can be written to several data centers, with a different number of copies in each.
- Cluster – A cluster contains one or more data centers and can span multiple physical locations.
Several other components also play a part in Cassandra:
1. Commit Log
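Every write is appended to the commit log to make the data durable; if a node crashes, the log can be replayed to recover writes that had not yet reached disk. The same data is also written to an in-memory memtable, which is later flushed to a sorted string table (explained next). Once that flush is done, the corresponding commit log segments can be archived, deleted, or recycled.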
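The commit log's role can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical `CommitLog` class that keeps entries in a list rather than in segment files on disk, with integer positions standing in for log offsets. Writes are appended first, the log can be replayed after a restart, and entries already covered by a flush are dropped, which is roughly what archiving, deleting, or recycling a segment achieves.

```python
class CommitLog:
    """Hypothetical in-memory stand-in for Cassandra's on-disk commit log."""

    def __init__(self):
        self.entries = []   # append-only list of (position, key, value)
        self.next_pos = 0

    def append(self, key, value):
        # Record the write first so it can be replayed after a crash.
        pos = self.next_pos
        self.entries.append((pos, key, value))
        self.next_pos += 1
        return pos

    def discard_up_to(self, flushed_pos):
        # Entries at or below flushed_pos already live in an SSTable,
        # so they are no longer needed for recovery.
        self.entries = [e for e in self.entries if e[0] > flushed_pos]

    def replay(self):
        # After a restart, rebuild the unflushed data from the log.
        return {key: value for _, key, value in self.entries}


log = CommitLog()
log.append("user:1", "alice")
p = log.append("user:2", "bob")
log.discard_up_to(p - 1)   # pretend the write for user:1 was flushed
print(log.replay())        # {'user:2': 'bob'}
```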
2. SSTable
An SSTable (sorted string table) is an immutable data file to which Cassandra periodically writes the contents of memtables. SSTables are written sequentially and are append-only: an existing file is never modified. Cassandra maintains separate SSTables for each table.
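The key properties of an SSTable (sorted, immutable, read newest-first alongside older files) can be illustrated with a small sketch. It assumes a hypothetical in-memory `SSTable` class and `read` helper; real SSTables are on-disk files with index and summary components, and Cassandra reconciles versions by write timestamp, which the newest-first search order here only stands in for.

```python
import bisect


class SSTable:
    """Hypothetical in-memory stand-in for an immutable, sorted data file."""

    def __init__(self, items):
        # Sort once when the table is written; it is never modified afterwards.
        pairs = sorted(items.items())
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        # Because keys are sorted, a binary search locates a key in O(log n).
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self):
        # Entries come back sequentially, in key order.
        return list(zip(self._keys, self._values))


def read(key, sstables):
    # Updates go into newer SSTables instead of rewriting old ones,
    # so search from the newest table to the oldest.
    for table in reversed(sstables):
        value = table.get(key)
        if value is not None:
            return value
    return None


old = SSTable({"b": "2", "a": "1"})
new = SSTable({"b": "20"})
print(old.scan())              # [('a', '1'), ('b', '2')]
print(read("b", [old, new]))   # 20
print(read("a", [old, new]))   # 1
```

Because no SSTable is ever rewritten in place, an update simply lands in a newer file, and old versions are cleaned up later by compaction.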
3. CQL Table
A CQL table is a collection of ordered columns fetched by row. Every table has a primary key, and rows are looked up by that key. The first part of the primary key, the partition key, decides which partition (and therefore which nodes) holds the row.
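To show how a primary key organizes a CQL table, the sketch below models a hypothetical table `readings` with partition key `sensor_id` and clustering column `ts`. The table, its columns, and the `CqlTableSketch` class are illustrative assumptions: rows are grouped by partition key and kept sorted by the clustering column, and inserting a row with an existing primary key replaces it, mirroring the upsert behaviour of a CQL `INSERT`.

```python
import bisect
from collections import defaultdict


class CqlTableSketch:
    """Models a hypothetical table shaped like
    CREATE TABLE readings (sensor_id text, ts int, value double,
                           PRIMARY KEY (sensor_id, ts));
    sensor_id is the partition key and ts is the clustering column."""

    def __init__(self):
        # partition key -> list of (clustering key, row), sorted by clustering key
        self._partitions = defaultdict(list)

    def insert(self, sensor_id, ts, value):
        rows = self._partitions[sensor_id]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, ts)
        row = {"sensor_id": sensor_id, "ts": ts, "value": value}
        if i < len(rows) and rows[i][0] == ts:
            rows[i] = (ts, row)        # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (ts, row))  # keep rows ordered by ts

    def select(self, sensor_id, ts=None):
        rows = self._partitions.get(sensor_id, [])
        if ts is None:
            return [row for _, row in rows]   # whole partition, in ts order
        return [row for k, row in rows if k == ts]


t = CqlTableSketch()
t.insert("s1", 20, 1.5)
t.insert("s1", 10, 0.5)
t.insert("s1", 20, 2.5)   # same primary key ("s1", 20): replaces the earlier row
print([row["ts"] for row in t.select("s1")])   # [10, 20]
print(t.select("s1", 20)[0]["value"])           # 2.5
```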
4. Bloom Filter
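A Bloom filter is a fast, probabilistic data structure for testing whether an element is a member of a set. It can return false positives but never false negatives, which makes it a special kind of cache. Cassandra keeps a Bloom filter for each SSTable and consults it on every read, skipping SSTables that definitely do not contain the requested partition.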
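A Bloom filter fits in a few lines. In this minimal sketch, the bit-array size, the number of hash functions, and the salted SHA-256 hashing are all assumptions chosen for illustration, not Cassandra's own parameters or hash function. An answer of `False` means the key is definitely absent, so a read could skip that SSTable; `True` only means the key might be present.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ["user:1", "user:2"]:
    bf.add(key)
print(bf.might_contain("user:1"))   # True
```

A larger bit array or a better-tuned number of hash functions lowers the false-positive rate at the cost of memory.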
Key Components to Configure Cassandra
The following components are configured in Cassandra:
1. Gossip Protocol
- Gossip is the peer-to-peer communication protocol that nodes use to discover and share location and state information about all the nodes in the cluster.
- Each node persists this information locally, so a node that restarts can use it immediately.
- The gossip process runs every second, and in each round a node exchanges state with at most three other nodes rather than with every node in the cluster or data center. Because those nodes pass the information on in turn, the state information eventually propagates throughout the cluster.
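The peer-to-peer exchange described above, known in Cassandra as gossip, can be simulated to show why contacting only a few peers per round is enough. This is a rough sketch: the function name `simulate_gossip`, the fixed random seed, and the single version number per node are assumptions, and real gossip messages carry much richer state (heartbeats, load, schema version, and so on). Each round, every node exchanges state with up to three random peers, and both sides keep the highest version they have seen for each node.

```python
import random


def simulate_gossip(num_nodes, fanout=3, seed=42):
    """Return the number of rounds until every node knows every node's state."""
    rng = random.Random(seed)
    # Each node starts out knowing only its own state (version 1).
    known = [{n: 1} for n in range(num_nodes)]
    rounds = 0
    while any(len(k) < num_nodes for k in known):
        rounds += 1
        for node in range(num_nodes):
            others = [p for p in range(num_nodes) if p != node]
            for peer in rng.sample(others, min(fanout, num_nodes - 1)):
                # Exchange state both ways; keep the higher version per node.
                merged = dict(known[node])
                for n, version in known[peer].items():
                    merged[n] = max(merged.get(n, 0), version)
                known[node] = merged
                known[peer] = dict(merged)
    return rounds


print(simulate_gossip(100))
```

The number of rounds grows roughly logarithmically with cluster size, which is why the state spreads through the whole cluster quickly even though no node talks to everyone.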
2. Partitioner
- The partitioner decides which node receives the first replica of a piece of data and how data is distributed across the nodes in the cluster.
- Every row must be uniquely identifiable by its primary key. The partition key, which is the first part of the primary key, determines where the row is stored.
- The partitioner is a hash function that derives a token from the partition key of each row. Each node is assigned a number of tokens, set with num_tokens in cassandra.yaml, which determines the ranges of the token ring that the node owns.
- The token generated from a row's partition key determines which node receives a replica of that row.
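The partitioner's job of turning a partition key into a token and then into a node can be sketched as a token ring. This sketch makes several assumptions: it hashes with MD5 truncated to 64 bits purely because MD5 is in the Python standard library (Cassandra's default is the Murmur3Partitioner), and it gives each node `num_tokens` random tokens as virtual nodes. The names `token` and `TokenRing` are hypothetical. A key belongs to the first token at or after its own token, wrapping around the ring.

```python
import bisect
import hashlib
import random


def token(partition_key):
    # Hash the partition key to a 64-bit token (MD5 here for portability;
    # Cassandra's default Murmur3Partitioner uses a different hash).
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, nodes, num_tokens=4, seed=7):
        rng = random.Random(seed)
        # Each node owns num_tokens positions (virtual nodes) on the ring.
        self._ring = sorted((rng.getrandbits(64), node)
                            for node in nodes for _ in range(num_tokens))
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key):
        # The first token at or after the key's token owns the key;
        # past the largest token, wrap around to the start of the ring.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["n1", "n2", "n3"], num_tokens=4)
print(ring.owner("user:42"))
```

Giving each node several tokens spreads its ownership across many small ranges, so adding or removing a node moves data from many peers instead of just one neighbour.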
3. Replication Factor
- The replication factor determines the total number of replicas of each row across the cluster. A replication factor of 1 means there is only one copy of each row, on one node.
- A replication factor of 2 means that two copies are kept, each on a different node. Because Cassandra has no master-slave architecture, every copy is equally important; there is no primary replica.
- The replication factor is defined per data center. It should generally be greater than one, so that data survives the loss of a node, but it should not exceed the number of nodes in that data center.
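Replica placement for a given replication factor can be sketched in the style of Cassandra's SimpleStrategy: the first replica goes to the node that owns the key's token, and further replicas go to the next distinct nodes clockwise around the ring. The function `place_replicas`, the MD5-based token, and the hand-picked ring below are assumptions for illustration; NetworkTopologyStrategy additionally takes data centers and racks into account.

```python
import bisect
import hashlib


def place_replicas(partition_key, ring, replication_factor):
    """ring: list of (token, node) pairs. Returns replication_factor distinct nodes."""
    nodes = {node for _, node in ring}
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # Token of the key (MD5 truncated to 64 bits, an assumption for this sketch).
    key_token = int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")
    i = bisect.bisect_left(tokens, key_token)
    replicas = []
    # Walk the ring clockwise, skipping nodes that already hold a replica.
    while len(replicas) < replication_factor:
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


ring = [(0, "n1"), (2**62, "n2"), (2**63, "n3"), (3 * 2**62, "n4")]
print(place_replicas("user:42", ring, 3))
```

The check at the top enforces the rule from the text: asking for more replicas than there are nodes is rejected, because two copies on the same node would add no fault tolerance.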
4. Snitch
- A snitch determines which data center and rack each node belongs to. It informs Cassandra about the network topology, so that requests are routed efficiently and the replication strategy can place replicas across groups of machines in different data centers and racks.
- All snitches use a dynamic snitch layer, which monitors performance and chooses the best replica to read from. The snitch must be configured when the cluster is created.
- The dynamic snitch is enabled by default and is recommended for most deployments. Its thresholds can be configured for each node in the cassandra.yaml file.
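The two jobs of a snitch, reporting topology and helping pick a replica to read from, can be illustrated with a loosely modelled sketch. The topology map, node names, 10-sample latency window, and the "local data center first, then lowest average latency" rule are all assumptions for illustration; Cassandra's actual dynamic snitch uses its own scoring and configurable thresholds.

```python
from statistics import mean

# Hypothetical topology, the kind of information a snitch reports.
TOPOLOGY = {
    "n1": ("dc1", "rack1"),
    "n2": ("dc1", "rack2"),
    "n3": ("dc2", "rack1"),
}


class DynamicSnitchSketch:
    def __init__(self, topology, local_dc):
        self.topology = topology
        self.local_dc = local_dc
        self.latencies = {node: [] for node in topology}  # recent read latencies (ms)

    def record(self, node, latency_ms):
        samples = self.latencies[node]
        samples.append(latency_ms)
        del samples[:-10]   # keep only the 10 most recent samples

    def score(self, node):
        samples = self.latencies[node]
        return mean(samples) if samples else 0.0

    def sort_replicas(self, replicas):
        # Prefer replicas in the local data center, then the lowest recent latency.
        return sorted(replicas, key=lambda n: (self.topology[n][0] != self.local_dc,
                                               self.score(n)))


snitch = DynamicSnitchSketch(TOPOLOGY, local_dc="dc1")
snitch.record("n1", 12.0)
snitch.record("n2", 3.0)
snitch.record("n3", 1.0)
print(snitch.sort_replicas(["n3", "n1", "n2"]))   # ['n2', 'n1', 'n3']
```

Here n3 is the fastest replica but sits in a remote data center, so the sketch still prefers the two local replicas, ordered by their recent latency.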
5. Merkle Tree
- Replicas of the same data can drift apart. A Merkle tree is a hash tree that makes these differences easy to find.
- The leaf nodes contain hashes of individual data blocks, and each parent node stores a hash of its children's hashes.
- Cassandra uses Merkle trees during repair to compare replicas on different nodes. When two subtrees have the same hash, their data is identical and can be skipped, so only the mismatched ranges need to be exchanged.
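The comparison technique can be sketched with a generic binary Merkle tree. SHA-256 as the hash and duplicating the last hash on odd-sized levels are choices made for this sketch, not necessarily what Cassandra does internally (it builds its trees over token ranges). `diff_leaves` descends only into subtrees whose hashes differ, so identical regions are skipped without being examined block by block.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_merkle(blocks):
    """Return tree levels: levels[0] holds leaf hashes, levels[-1] holds [root]."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last hash on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a, b):
    """Indices of leaf blocks whose hashes differ; both trees must have the same shape."""
    def walk(depth, index):
        level_a, level_b = a[depth], b[depth]
        if index >= len(level_a) or level_a[index] == level_b[index]:
            return []          # identical subtree (or padding): skip it entirely
        if depth == 0:
            return [index]
        return walk(depth - 1, 2 * index) + walk(depth - 1, 2 * index + 1)
    return walk(len(a) - 1, 0)


replica_a = build_merkle([b"a", b"b", b"c", b"d", b"e"])
replica_b = build_merkle([b"a", b"B", b"c", b"d", b"E"])
print(diff_leaves(replica_a, replica_b))   # [1, 4]
```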
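6. Memtable
- A memtable is an in-memory structure that holds data that has been written but not yet flushed to disk. Cassandra keeps one memtable per table and flushes it to an SSTable when it reaches a configurable size threshold.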
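The memtable's place in the write path can be sketched as follows. This is a minimal sketch, assuming a hypothetical `MemtableSketch` class whose flush threshold counts keys; real Cassandra flushes based on memory use, commit log space, and other conditions. Writes land in memory, a full memtable is written out in key order as an immutable run (standing in for an SSTable), and reads check the memtable before the flushed runs.

```python
class MemtableSketch:
    """Hypothetical write buffer: writes land in memory, then flush to sorted runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed limit, counted in keys
        self.memtable = {}      # unflushed data, latest value per key
        self.sstables = []      # flushed, immutable sorted runs (newest last)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered data out in key order as an immutable tuple,
        # then start again with an empty memtable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Unflushed data is the newest, so check the memtable first,
        # then the flushed runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None


m = MemtableSketch()
m.write("c", 3)
m.write("a", 1)
m.write("b", 2)          # third key reaches the threshold and triggers a flush
m.write("a", 10)         # newer value, still only in memory
print(m.sstables)        # [(('a', 1), ('b', 2), ('c', 3))]
print(m.read("a"), m.read("c"))   # 10 3
```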
Cassandra is a NoSQL database that is well suited to processing huge amounts of data. It does not use a master-slave architecture, so all nodes are equally important. Data is replicated across the cluster according to the replication factor, which keeps it available and durable even when individual nodes fail. These features make Cassandra a strong fit for big data: it is durable, reliable, and fast, because reads and writes are spread across many nodes.
This is a guide to Cassandra Architecture. Here we discussed the introduction, the Cassandra architecture, its key structures, and its key components.