Apache Cassandra is an open-source distributed database. It handles large volumes of data distributed across many servers in many locations. Cassandra was originally created at Facebook and is currently in use at many large companies such as Netflix, eBay, Twitter, and Ooyala. Its design draws on Google's BigTable and Amazon's Dynamo. Cassandra provides many features; linear scalability, fault tolerance, and data replication are among the most important.
Cassandra follows a key-value pair structure built around four basic concepts:
A Cluster is the set of nodes in logical space. Each cluster can contain one or more keyspaces.
A Keyspace is the namespace for column families. Each application typically has a dedicated keyspace.
Column families contain multiple columns. Each column holds a name, a value, and a timestamp, and each column is referenced and accessed through its key.
SuperColumns are columns that can themselves contain subcolumns.
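The hierarchy above can be sketched with plain Python dictionaries. This is only an illustrative toy model; the keyspace, column family, and row names are invented, and real Cassandra stores columns far more compactly.

```python
# A toy model of Cassandra's hierarchy: a cluster holds keyspaces,
# a keyspace holds column families, a column family maps row keys
# to columns, and each column is a (name, value, timestamp) triple.
# All names here (app_keyspace, users, user123) are made up.
cluster = {
    "app_keyspace": {                       # keyspace: one per application
        "users": {                          # column family
            "user123": {                    # row key
                "email": ("email", "a@example.com", 1700000000),
                "name":  ("name",  "Alice",         1700000000),
            }
        }
    }
}

def get_column(cluster, keyspace, cf, row_key, col_name):
    """Look up a single column's value by its chain of keys."""
    name, value, ts = cluster[keyspace][cf][row_key][col_name]
    return value

print(get_column(cluster, "app_keyspace", "users", "user123", "email"))
```

Note how every lookup is driven by keys at each level, which is what "each column is referenced and accessed through its key" means in practice.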
Cassandra has the following features:
Synchronous and Asynchronous Data replication
Performance and Durability
For the scope of this research paper, we will focus on "Synchronous and Asynchronous Data replication" and "Data Consistency" of Apache Cassandra. 
Cassandra takes a single database and distributes its data among the nodes in the cluster. This partitions the data across the given nodes, so each node holds a part of the actual database. The data on each node is then replicated to one or more other nodes as required. Replicating data across multiple nodes improves reliability and increases fault tolerance.
How it works: the first step is to partition the data.
When data is inserted, a row key is generated. Based on the generated column family row key, the data is placed on a specific node. Data partitioning can be done in one of the following ways.
Random Partitioning: For every column family row key, an MD5 hash is generated, and the data is placed on a particular node based on the range defined for that hash. This is the recommended method of partitioning.
Ordered Partitioning: This method stores the column family row keys in sorted order across the given nodes. Although the resulting order is predictable, this method is not recommended because it can lead to uneven data distribution among the nodes of the cluster.
The Partitioner option in the configuration file controls how data is partitioned. Note that once data has been partitioned, changing the partitioning scheme later requires reloading the entire dataset.
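The difference between the two partitioning methods can be sketched as follows. This is a simplification, not Cassandra's internal token assignment: the node count, boundary letters, and row keys are all invented for illustration.

```python
import hashlib

NUM_NODES = 4  # toy cluster size, purely illustrative

def md5_token(row_key: str) -> int:
    """Random partitioning hashes the row key with MD5 to get a token."""
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16)

def random_partition(row_key: str) -> int:
    # Each node owns an equal slice of the MD5 token space, so even
    # similar keys scatter evenly across nodes.
    return md5_token(row_key) % NUM_NODES

def ordered_partition(row_key: str, boundaries=("g", "n", "t")) -> int:
    # Ordered partitioning places keys by sort order, so skewed key
    # distributions (e.g. most keys starting with "a") pile up on one node.
    for node, bound in enumerate(boundaries):
        if row_key < bound:
            return node
    return len(boundaries)

keys = ["alice", "bob", "amy", "ann"]
print([random_partition(k) for k in keys])   # spread by hash
print([ordered_partition(k) for k in keys])  # all land on node 0
```

The ordered example shows exactly the uneven-distribution hazard the text warns about: every key sorts below "g", so a single node receives the entire workload.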
Next Step: Replicate the data
Figure 1: Data Replication in Cassandra
To make the system more fault tolerant and robust, data needs to be replicated. A replication factor of 1 means there is no redundancy: only a single copy of each row exists in the cluster. A replication factor of 3 means three copies of the data are kept, including the original. In general, the replication factor should not exceed the total number of nodes. It is permitted, but when that happens, write requests are rejected, while read requests are still served as long as the desired consistency level can be met.
Two major factors play an important role in determining how many copies of a particular piece of data should be kept:
The importance of satisfying read requests locally, without incurring cross-data-center latency.
The ability to handle failure scenarios.
The most common arrangements include:
Keeping 2 copies of data per data center: this arrangement tolerates the failure of one node per replica group and supports read operations at consistency level ONE.
Keeping 3 copies of data per data center: this arrangement tolerates multiple node failures at consistency level ONE. Alternatively, it allows the failure of one node in a group while maintaining strong consistency at QUORUM.
The following are the different replication strategies.
Simple Replication: the original copy of the data is placed on the node determined by the partitioner, and the remaining copies are placed on neighboring nodes in clockwise order around the ring. This strategy does not consider rack or data center location.
Network Topology Strategy: this method gives more control over data placement. It allows replicating data across different racks within the same data center as well as across nodes in different data centers, protecting the system from the effects of power failures, network problems, or cooling issues. If no node on a different rack is available to hold a replica, a node on the same rack is selected in clockwise order. To replicate data between data centers, a "replica group" is used. The following is an example in CQL (Cassandra Query Language):
CQL Query Example:
CREATE KEYSPACE examplekeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 4};
Here DC1 is the replica group (the data center name) and 4 is the number of replicas to keep in it.
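The clockwise placement used by simple replication can be sketched on a toy ring. The ring layout and node names here are invented; real Cassandra places replicas by token ranges, but the neighbor-walking idea is the same.

```python
# Sketch of SimpleStrategy replica placement: the partitioner picks the
# primary node, and the remaining replicas go to the next nodes clockwise
# around the ring, ignoring racks and data centers.
ring = ["node_a", "node_b", "node_c", "node_d"]  # illustrative ring order

def simple_replicas(primary_index: int, replication_factor: int):
    """Primary node plus the next RF-1 neighbors clockwise on the ring."""
    return [ring[(primary_index + i) % len(ring)]
            for i in range(replication_factor)]

# With RF=3 and the primary at node_c, replicas wrap around the ring.
print(simple_replicas(2, 3))
```

NetworkTopologyStrategy replaces the plain clockwise walk with rack- and data-center-aware selection, which is why it is preferred for multi-rack deployments.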
Replication Mechanics Details:
Cassandra uses a snitch to determine the grouping of nodes within the cluster as well as the network topology of racks and data centers. The snitch maps IP addresses to the racks and data centers of the system, which makes it possible to route inter-node requests in the most efficient way. The snitch does not, however, control which node a client connects to in order to have its request processed. There are five different types of snitch, as shown below:
Simple Snitch: used for simple replication; it is the default setting.
Rack Inferring Snitch: uses the IP address to infer the network topology. It assumes the second octet identifies the data center and the third octet identifies the rack.
Dynamic Snitch: wraps another snitch and reorders nodes based on their observed performance, directing traffic in the most efficient way.
Property File Snitch: uses a user-defined property file to identify network details and determine the location of nodes.
EC2 Snitch: used only for Amazon EC2 deployments. It uses the AWS API to find the availability zone and region details.
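The octet convention used by the Rack Inferring Snitch can be shown in a few lines. The labels and the example address are illustrative only.

```python
# Sketch of the Rack Inferring Snitch convention described above: for an
# address w.x.y.z, the second octet (x) names the data center and the
# third octet (y) names the rack. The DC/RACK label prefixes are invented.
def infer_location(ip: str) -> dict:
    octets = ip.split(".")
    return {"datacenter": "DC" + octets[1], "rack": "RACK" + octets[2]}

print(infer_location("10.20.30.40"))
```

So two nodes at 10.20.30.x would be treated as sharing a rack, while 10.20.31.x would be the same data center but a different rack.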
Cassandra's peer-to-peer architecture has no read-write restrictions, so a user can read and write anywhere by accessing any node; Cassandra automatically takes care of partitioning and replication from that point onward. When a client request arrives, the client can connect to any node. The node it connects to acts as the coordinator, serving as a proxy between the request and the nodes that own the requested data.
Strategies: Writing the Data
Any new data is first written to the commit log, which provides durability and safety. The data is then written to a memtable, and when the memtable fills up, it is flushed to an SSTable on disk. A key property of Cassandra's write path is that writes are atomic per row: either the entire write (all columns) succeeds, or nothing is written. Cassandra has been reported to deliver roughly 12 times better throughput on write operations and 4 times better throughput on mixed operations.
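The commit-log-then-memtable-then-SSTable flow can be sketched as a toy class. This is a simplification under stated assumptions: the flush threshold, class name, and data layout are invented, and real SSTables are immutable on-disk files with indexes, not Python dicts.

```python
# Toy sketch of the write path described above: append to the commit log
# first (durability), then update the in-memory memtable, and flush to an
# "SSTable" when the memtable fills. All thresholds and names are invented.
class ToyWritePath:
    def __init__(self, memtable_limit=2):
        self.commit_log = []       # append-only log for durability
        self.memtable = {}         # in-memory writes, sorted on flush
        self.sstables = []         # stand-ins for immutable on-disk tables
        self.memtable_limit = memtable_limit

    def write(self, row_key, columns):
        # Atomic per row: the commit-log entry records all columns together.
        self.commit_log.append((row_key, columns))
        merged = {**self.memtable.get(row_key, {}), **columns}
        self.memtable[row_key] = merged
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Flush writes the memtable out in sorted row-key order, then clears it.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

db = ToyWritePath()
db.write("user1", {"name": "Alice"})
db.write("user2", {"name": "Bob"})      # second write triggers a flush
print(len(db.sstables), db.memtable)
```

Writing to the commit log before the memtable is what lets a node replay unflushed writes after a crash.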
Writing: Control over tuning of Data Consistency
Cassandra lets the user choose how strongly to tune the database for consistency according to its needs. This consistency setting is associated with each operation rather than with an entire node, rack, or data center, which gives the user a great deal of flexibility. There are different consistency levels for write operations.
Figure 2: Write Requests
Any: a write must succeed on at least one node, counting a stored hint as success.
One: a write must succeed on at least one replica node responsible for that data.
Quorum: a write must succeed on a quorum of replica nodes, where the quorum size is (replication factor / 2) + 1, using integer division.
Local_Quorum: a write must succeed on a quorum of replica nodes in the same data center as the coordinating node.
Each_Quorum: a write must succeed on a quorum of replica nodes in each data center.
All: a write must succeed on all replica nodes for the row key.
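The quorum formula is easy to verify for small replication factors. The helper name below is our own; only the arithmetic comes from the definition above.

```python
# Quorum size per the formula above: (replication factor / 2) + 1,
# using integer division.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

# A quorum is always a strict majority of the replicas.
for rf in (1, 2, 3, 4, 5):
    print(f"RF={rf} -> quorum={quorum(rf)}")
```

For example, with a replication factor of 3, a QUORUM write needs 2 acknowledgments; with a replication factor of 4, it needs 3.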
When a write request arrives and a target node is either known to be unavailable ahead of time or turns out to be unavailable at the time of the request, a hint is stored locally by the coordinator in the system.hints table.
Once the unavailable node comes back online, the hinted data is replayed to that target node. The node also checks for hints covering writes that occurred during outages too short for the failure detector to notice. If too few replica target nodes are alive, an exception is thrown. This can be understood better with the following example:
Suppose two nodes, X and Y, hold data with a replication factor of 1, meaning each data row is stored on exactly one node.
If we try to write data owned by X while X is down, the write request should fail. The reason is the consistency-level contract: a read must reflect the most recent write when W + R > RF, where W is the number of replicas that must acknowledge a write, R is the number of replicas consulted on a read, and RF is the replication factor.
It is not acceptable to write a hint on Y and call the write successful, because data hinted on Y cannot be read until X comes back and Y hands the data back to X.
Hinted handoff allows Cassandra to sustain the same rate of write operations even when the system is running at reduced capacity. However, pushing the cluster to maximum capacity without accounting for failures is bad design. The main purpose of hinted handoff is to reduce excessive load on the cluster.
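The W + R > RF contract from the example above can be checked mechanically. The function name is ours; the inequality is exactly the one stated in the text.

```python
# The consistency contract: a read is guaranteed to see the most recent
# write only when W + R > RF, because the write set and the read set of
# replicas must then overlap in at least one node.
def is_strongly_consistent(w: int, r: int, rf: int) -> bool:
    return w + r > rf

print(is_strongly_consistent(1, 1, 1))  # RF=1: one real write + one read overlap
print(is_strongly_consistent(2, 2, 3))  # quorum writes + quorum reads with RF=3
print(is_strongly_consistent(1, 1, 3))  # ONE writes + ONE reads with RF=3 may miss
```

This is why a hint on Y cannot count toward W in the two-node example: the hinted copy is not readable, so the real write set stays empty and the inequality fails.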
Strategies: Reading the Data
Figure 3: Read Requests
Similar to writes, there are several consistency levels for reading data from Cassandra.
One: read the data from the closest available replica.
Quorum: return the result with the latest timestamp once a quorum of replicas has responded.
Local_Quorum: the same as Quorum, except the quorum must come from the same data center as the coordinating node.
Each_Quorum: checks a quorum in every data center and returns the result with the latest timestamp.
All: returns the result after all replica nodes for the given row key have responded.
When reading a piece of data, Cassandra can send a repair request in the background. On this request from the coordinator node, the replica that was just read is compared against the other replicas, and stale replicas are updated with the data carrying the latest timestamp. This helps keep the data across all replica nodes consistent and up to date. Read repair is enabled by default and is configured per column family.
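The reconciliation rule behind both quorum reads and read repair, latest timestamp wins, can be sketched in one function. The replica responses below are invented sample data.

```python
# Sketch of read reconciliation: among replica responses for a column,
# the value with the latest timestamp wins; stale replicas would then
# be rewritten with that winning value during read repair.
def reconcile(replica_values):
    """replica_values: list of (value, timestamp) pairs from replicas."""
    return max(replica_values, key=lambda vt: vt[1])[0]

replicas = [
    ("old@example.com", 100),   # stale replica
    ("new@example.com", 200),   # most recent write
    ("old@example.com", 100),   # another stale replica
]
print(reconcile(replicas))
```

After reconciliation, read repair pushes the winning value back to the two stale replicas so all three converge.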
Anti-Entropy Node Repair:
Anti-entropy repair is mainly used to update data that is not accessed frequently, or to bring a node back up to date after it has been down for a long time.
Cassandra also has certain limitations that should be taken into consideration when selecting it for data management.
Cassandra limits the size of row keys and column names to 64 KB.
It also limits the number of columns per row to 2 billion.
All data associated with a single row must fit on a single machine in the cluster, because row keys alone determine which nodes are responsible for replicating that data.
Cassandra is becoming a popular NoSQL solution for handling large amounts of distributed data. It is in use at many software companies, including Facebook, though it is still under heavy development. It is a useful solution for dealing with distributed data, and a good product that provides the best of both worlds (Dynamo and BigTable).