Applications of Using NoSQL Databases

23/09/19 Information Technology

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.

CA1 for advanced databases

Research Report

Individual Assignment CA 1

Introduction:

In this report I will outline the motivation behind using NoSQL databases and the main characteristics of key-value store databases, column-oriented databases and document store databases. I will give an example of each of these three types of NoSQL database and describe how it works, along with its key features.

I will conclude the report with a detailed analysis and comparison of NoSQL databases and relational databases, including an evaluation of the strengths and weaknesses of each. I will also outline the future prospects for both types of database.

What is a NoSQL database

NoSQL databases emerged because of the scalability and performance challenges that relational databases were facing. Having recognised these scalability problems, companies such as Google, Amazon and Facebook created NoSQL databases such as BigTable, DynamoDB and Cassandra to solve them.

NoSQL databases can be broken down into four types: key-value store databases, column-oriented databases, document store databases and graph-based databases. These database management systems (DBMSs) focus mainly on performance, scalability and availability. Existing index structures were reused and improved with the aim of making read and write operations faster. A big difference between the classic relational database and NoSQL databases is that NoSQL databases use unstructured storage. This means that NoSQL databases do not have a fixed table structure like the one found in relational databases.

Main Characteristics & Motivation

1. Key-value store databases:

In a key-value store, hash tables are used in which a unique key points to an item. Keys can be organised into logical groups, and a key only needs to be unique within its own group, which allows identical keys to be used in several logical groups. Some implementations of the key-value store provide caching mechanisms, which greatly enhance performance. All that is needed to work with an item stored in the database is its key. Data is stored in the form of a string, JSON or a binary large object (BLOB). The biggest flaw in this type of NoSQL database is the lack of consistency at the database level. Developers can add consistency with their own code, but this brings its own problems, such as more effort, complexity and time for the developers. One of the best-known key-value store NoSQL databases is Amazon’s DynamoDB.

Here is a table to represent how a key-value store database stores data:

Key          Value

“Belfast”    {“University of Ulster, Belfast campus, York Street, Belfast, BT15 1ED”}
“Coleraine”  {“University of Ulster, Coleraine campus, faker Road, Co. Londonderry, BT52 1SA”}
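The idea of keys organised into logical groups can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular product is implemented. The class name `KeyValueStore`, its methods and the group names are hypothetical, and the values are taken from the campus example above.

```python
class KeyValueStore:
    """Toy in-memory key-value store with logical groups (hypothetical names)."""

    def __init__(self):
        # One hash table (dict) per logical group, so the same key can
        # appear in several groups without clashing.
        self._groups = {}

    def put(self, group, key, value):
        """Store a value under a key inside the given group."""
        self._groups.setdefault(group, {})[key] = value

    def get(self, group, key, default=None):
        """Look a value up; only the group and the key are needed."""
        return self._groups.get(group, {}).get(key, default)


store = KeyValueStore()
store.put(
    "campus_addresses",
    "Belfast",
    "University of Ulster, Belfast campus, York Street, Belfast, BT15 1ED",
)
# The identical key "Belfast" is reused in a second, separate group.
store.put("campus_postcodes", "Belfast", "BT15 1ED")

print(store.get("campus_addresses", "Belfast"))
print(store.get("campus_postcodes", "Belfast"))
```

Note that the store knows nothing about what is inside a value. Any checking of the value's contents would have to be written by the developer, which is the extra effort described above.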

2. Column-oriented databases:

In column store databases, data is stored in columns instead of in rows as it would be in a classic relational database. A column store database must have at least one column family, which groups related columns together logically. Keys are used to point to one column or to several columns, depending on what is needed within the database. The keyspace attribute defines the scope of the database. Column families also contain tuples of names and values, which may be ordered or comma separated.

In column databases, read and write access is given to the data stored in the rows, and the values belonging to a single column are stored together in one area of disk. Popular NoSQL column-oriented store databases include Google’s BigTable, HBase and Cassandra.
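A rough way to picture the column-oriented layout is a nested mapping of column family, then column, then row key. The sketch below is a simplified in-memory model under that assumption. The class `ColumnFamilyStore`, its methods and the sample data are hypothetical and are not the storage format of BigTable, HBase or Cassandra.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy column-oriented store: family -> column -> {row key: value}."""

    def __init__(self):
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, row_key, column, value):
        """Write one cell; each column keeps its own values together."""
        self._families[family][column][row_key] = value

    def read_column(self, family, column):
        """Return a whole column without touching any other column."""
        return dict(self._families[family][column])

    def read_row(self, family, row_key):
        """Reassemble a row by collecting its cell from every column."""
        return {
            column: cells[row_key]
            for column, cells in self._families[family].items()
            if row_key in cells
        }


store = ColumnFamilyStore()
store.put("campuses", "belfast", "street", "York Street")
store.put("campuses", "belfast", "postcode", "BT15 1ED")
store.put("campuses", "coleraine", "postcode", "BT52 1SA")

postcodes = store.read_column("campuses", "postcode")
belfast_row = store.read_row("campuses", "belfast")
print(postcodes)
print(belfast_row)
```

Reading one column is a single lookup here, while rebuilding a row means visiting every column. That is the trade-off a column layout makes compared with a row layout.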

3. Document-store databases:

Document store databases share a lot of similarities with key-value store databases: both are schema free and both are based on the key-value model. Both therefore naturally share many of the same advantages and disadvantages. Both lack consistency at the database level, which has the knock-on effect that applications must provide reliability and consistency themselves. However, the two types also differ in several ways. One key difference is that in a document store the stored documents give an encoding to the data held within them. Those encodings can be XML, JSON or BSON, which stands for Binary JSON. The most popular document store database is MongoDB.
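The difference from a plain key-value store can be illustrated with a small sketch in which every value is a document encoded as JSON. This is a minimal model using Python's standard `json` module. The class `DocumentStore` and the sample documents are hypothetical.

```python
import json


class DocumentStore:
    """Toy document store: document id -> JSON-encoded document."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # The document is stored in an encoded form (JSON text here).
        self._docs[doc_id] = json.dumps(document)

    def find(self, doc_id):
        """Decode and return the document, or None if it does not exist."""
        encoded = self._docs.get(doc_id)
        return None if encoded is None else json.loads(encoded)


store = DocumentStore()
# Schema free: the two documents do not share the same fields.
store.insert(
    "belfast",
    {"campus": "Belfast", "address": {"street": "York Street", "postcode": "BT15 1ED"}},
)
store.insert("coleraine", {"campus": "Coleraine", "postcode": "BT52 1SA"})

print(store.find("belfast")["address"]["postcode"])
```

Because the encoding is known to the store, it could in principle look inside a document, which a plain key-value store cannot do with an opaque value.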

Examples of each database

1. Amazon’s DynamoDB

Amazon’s DynamoDB is a key-value store database. Its core components, which the user works with, are tables, items and attributes. A table is a collection of items, and each item is a collection of attributes. DynamoDB also uses primary keys to uniquely identify each item in a table, and secondary indexes to provide more flexible querying.

The key components of Amazon’s DynamoDB are:

Tables: A table is a collection of data populated by items.

Items: Tables are filled with items. An item is a group of attributes that is uniquely identifiable among all the other items. In DynamoDB there is no limit to the number of items you can store in a table. Items in DynamoDB are like rows, records or tuples in other database systems.

Attributes: Attributes are fundamental data elements, and every item must have at least one attribute. Attributes are the characteristics of the items. In DynamoDB, attributes are similar in many ways to fields or columns in other databases.

Primary Keys: Primary keys are a vital part of every table. A primary key uniquely identifies each item in the table, so that no two items can have the same key. DynamoDB supports two different types of primary key.

Partition Key: A simple primary key, composed of one attribute known as the partition key. DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition in which the item will be stored. In a table that has only a partition key, no two items can have the same partition key value.

Partition and Sort key: A composite primary key, composed of two attributes. The first is the partition key and the second is the sort key. The partition key is used as input to the internal hash function, and the output of the hash function determines where the item will be stored. All items with the same partition key are stored together, in sorted order by sort key. In a table that has a partition key and a sort key, it’s possible for two items to have the same partition key value. However, those two items must have different sort key values.
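The behaviour of a partition key combined with a sort key can be modelled roughly as follows. This is a sketch only and does not use the DynamoDB API. DynamoDB's real hash function is internal, so MD5 stands in for it here as an assumption, and `NUM_PARTITIONS`, `partition_for`, `Table` and the sample items are all hypothetical.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4  # assumed number of partitions, for illustration only


def partition_for(partition_key):
    """Hash the partition key to choose a partition (MD5 is a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class Table:
    """Toy table keyed by (partition key, sort key)."""

    def __init__(self):
        # Each partition maps a partition key to a list of
        # (sort key, item) pairs kept in sort-key order.
        self._partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put_item(self, partition_key, sort_key, item):
        bucket = self._partitions[partition_for(partition_key)]
        rows = bucket.setdefault(partition_key, [])
        position = bisect.bisect_left([key for key, _ in rows], sort_key)
        if position < len(rows) and rows[position][0] == sort_key:
            # Same partition key AND same sort key: the item is replaced.
            rows[position] = (sort_key, item)
        else:
            rows.insert(position, (sort_key, item))

    def query(self, partition_key):
        """Return all items that share a partition key, in sort-key order."""
        bucket = self._partitions[partition_for(partition_key)]
        return [item for _, item in bucket.get(partition_key, [])]


table = Table()
table.put_item("Belfast", "2019-03", {"note": "second"})
table.put_item("Belfast", "2019-01", {"note": "first"})
table.put_item("Belfast", "2019-03", {"note": "second, replaced"})

print(table.query("Belfast"))
```

Two items may share the partition key "Belfast" because their sort keys differ, while writing the same pair of keys twice overwrites the earlier item.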

2. Cassandra

Cassandra is a database management system originally built by Facebook, and it is an example of a column-oriented database. The goal for Cassandra was to build a DBMS that has no single point of failure and provides maximum availability. Some studies describe Cassandra as a hybrid system based on two other databases, Google’s BigTable and Amazon’s DynamoDB, which are a column store database and a key-value database respectively. Cassandra uses a key-value system in which the keys point to a set of column families. It draws on the data model of Google’s BigTable and on DynamoDB’s availability features. It is designed to manage huge amounts of data spread across many nodes on several servers while providing a highly available service, which is essential for a business like Facebook.

The main features of Cassandra are:

No single point of failure: This is achieved by spreading the data across a cluster of nodes. This does not mean that every node holds exactly the same data as all the rest, but every node runs exactly the same management system. When a node goes down, that node becomes inaccessible, but the other nodes and their data remain accessible.

Distributed Hashing: This is a scheme that allows slots to be added and removed without changing the mapping of other keys to their slots. This makes it possible to share the load over the servers or nodes according to their capacity, thereby minimizing downtime.

Easy to use Client interface: Cassandra’s client interface is Apache Thrift, which provides a cross-language RPC client. However, many developers prefer higher-level open-source clients such as Hector as an alternative to using Thrift directly.

Other available features: Cassandra uses data replication and a partitioning policy to maximise data protection. Data replication can be done randomly or intentionally. Cassandra’s partitioning policy is used to pick which node each key will go to, and this can also be done randomly or selectively. Cassandra tries to strike a balance between load balancing and query performance optimization.

Consistency: When data is being replicated, it is hard to stay consistent. Cassandra maintains, as best it can, a balance between replication actions and read/write actions. It does this by letting the developer fully customise that balance.

Read/Write Actions: For a write, the client sends a request to the database and, depending on the replication policy, the data is stored on nodes in the cluster. The read operation is very similar to the write operation: a request is sent to a single node, and that node determines which node holds the data according to the partition/placement policy.
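The distributed hashing and replication ideas in the list above can be sketched with a simple hash ring. This is a generic consistent-hashing illustration, not Cassandra's actual partitioner. The class `HashRing`, the node names and the use of MD5 are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (MD5 is an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy hash ring: each key belongs to the next node clockwise."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        bisect.insort(self._ring, (_hash(node), node))

    def remove_node(self, node):
        self._ring.remove((_hash(node), node))

    def nodes_for(self, key, replicas=1):
        """Return the nodes that hold the key: the owner, then its successors."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._ring, (_hash(key), ""))
        count = min(replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = ["key-%d" % i for i in range(100)]
before = {key: ring.nodes_for(key)[0] for key in keys}

ring.remove_node("node-b")  # simulate one node going down
after = {key: ring.nodes_for(key)[0] for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys changed owner")
```

Only the keys that belonged to the removed node change owner, and every other key keeps its mapping. Asking for more than one replica returns the following nodes on the ring, which is one simple placement policy among several possible ones.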

3. MongoDB

MongoDB is a schema-free, document-oriented database written in C++. The database stores documents in the form of encoded data. The encoding format in MongoDB is JSON, stored internally in its binary form, BSON. This makes the database very powerful, because the data is always queryable and indexable, even when it is nested within the JSON.
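To show what querying nested data means, the sketch below matches documents against a query document, using dotted paths to reach nested fields. It is a plain Python model of the idea and does not use the MongoDB driver. The helper names `get_path` and `find` and the sample collection are hypothetical.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "address.postcode" into nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, query):
    """Return the documents whose fields equal every value in the query."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == wanted for path, wanted in query.items())
    ]


collection = [
    {"campus": "Belfast", "address": {"street": "York Street", "postcode": "BT15 1ED"}},
    {"campus": "Coleraine", "address": {"postcode": "BT52 1SA"}},
]

matches = find(collection, {"address.postcode": "BT15 1ED"})
print([doc["campus"] for doc in matches])
```

The two documents do not have identical fields, yet both can be searched with the same query, which is the practical benefit of a schema-free document model.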

The main features in MongoDB are:

Shards: Sharding is the partitioning and distribution of data across multiple nodes. Unlike Cassandra, where nodes are symmetrically distributed, in MongoDB a shard is a collection of nodes. Using shards opens the database up to horizontal scaling across many nodes. If an application is using a single database server, it can be converted easily to a sharded cluster with practically no changes to the original application’s code. This is possible because the MongoDB software decouples the sharding from the public APIs exposed to the client side.

Mongo Query Language: MongoDB provides an API for retrieving specific documents from a database collection. A query document is created containing the fields that the desired documents should match.

Actions: In MongoDB, one group of servers are called routers, and each router acts as a server for at least one client. The cluster also contains a group of servers called configuration servers, which hold the metadata, while the shards hold the data itself. Read and write operations are sent by the clients to the routers, which automatically route them to the appropriate shard with the help of the configuration metadata. As in Cassandra, data is replicated, and the shards themselves can hold replicated data. There are two different types of data replication: master-slave replication and replica sets. Replica sets allow for better handling of failures and more automation, whereas master-slave replication needs administrative intervention from time to time. At any one time, a replica-set shard has only one primary; all the others are secondaries. Read and write operations go through the primary first and are then distributed to the necessary secondaries.

Here is an image representing how MongoDB sharding works.
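The routing and replication flow described above can also be sketched in code. This is a heavily simplified model under stated assumptions: replication is shown as an immediate copy to every secondary, and the router's range table stands in for the metadata held by the configuration servers. The classes `ReplicaSetShard` and `Router` and the range boundary "C" are hypothetical.

```python
class ReplicaSetShard:
    """Toy shard made of one primary and several secondaries."""

    def __init__(self, name, secondaries=2):
        self.name = name
        self.primary = {}
        self.secondaries = [{} for _ in range(secondaries)]

    def write(self, key, document):
        # The write goes to the primary first, then to each secondary.
        self.primary[key] = document
        for secondary in self.secondaries:
            secondary[key] = document

    def read(self, key):
        return self.primary.get(key)


class Router:
    """Toy router that picks a shard from range metadata."""

    def __init__(self, chunk_map):
        # chunk_map: ordered list of (exclusive upper bound or None, shard).
        self._chunk_map = chunk_map

    def _shard_for(self, key):
        for upper_bound, shard in self._chunk_map:
            if upper_bound is None or key < upper_bound:
                return shard
        raise KeyError(key)

    def write(self, key, document):
        self._shard_for(key).write(key, document)

    def read(self, key):
        return self._shard_for(key).read(key)


shard_a = ReplicaSetShard("shard-a")
shard_b = ReplicaSetShard("shard-b")
router = Router([("C", shard_a), (None, shard_b)])

# The client only talks to the router and never picks a shard itself.
router.write("Belfast", {"postcode": "BT15 1ED"})
router.write("Coleraine", {"postcode": "BT52 1SA"})
print(router.read("Coleraine"))
```

The client code would look the same with one shard or with many, which is the sense in which sharding is hidden from the application.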

NoSQL vs. Relational Databases

Here is a table that is a simple breakdown of how NoSQL stacks up against Relational Databases.

Feature        NoSQL Databases          Relational Databases

Performance    High                     Low
Reliability    Poor                     Good
Availability   Good                     Good
Consistency    Poor                     Good
Data Storage   Optimized for huge data  Medium sized to large
Scalability    High                     High (more expensive)

In this small, basic table, the comparison is made only at the level of the database model, not of the database management systems that implement the models. The systems themselves come up with techniques to overcome some of their shortcomings, and some even improve on the model in the areas of performance and reliability.

Advantages and Disadvantages of NoSQL Databases

Advantages

The advantages of NoSQL databases over traditional relational databases include being schema free and having a flexible, simple structure based on key-value pairs. The store types of NoSQL databases include column store, document store, key-value store, graph store, object store, XML store and other data store modes. I have outlined three of these database types in previous sections. Each value in the database has a key. Most NoSQL databases allow developers to store serialized objects in the database. Most open-source NoSQL databases do not need expensive licences and can be run on cheaper hardware, which makes their deployment cost-effective. Also, expanding a NoSQL database is very easy and cheaper than expanding a relational database. This is because it is done by scaling horizontally and distributing the load across all nodes, rather than by the vertical scaling usually done with relational database systems, which means replacing the main host with a more powerful one.

Disadvantages

Nothing is ever perfect, and NoSQL is no exception. Depending on the type or size of your business, NoSQL is sometimes not the right choice. Most NoSQL databases do not support the reliability features that relational databases provide, such as atomicity, consistency, isolation and durability. This amounts to trading consistency for performance and scalability. Developers can work around these gaps by implementing their own code, but doing so adds complexity to the system. This might limit the number of applications that can rely on NoSQL databases for secure and reliable transactions, such as banking systems. Other forms of complexity found in most NoSQL databases include incompatibility with SQL queries. This means that a manual or proprietary query language is needed, adding even more time and complexity.

Conclusion

In my opinion NoSQL databases are superior to normal SQL databases: they offer better performance at lower cost. They handle more data and are geared towards making life easier for the developer, which is why they are used by big businesses like Facebook, Amazon and Google. Seeing as these companies, which are among the most valuable in the world, are creating and using NoSQL databases, I can safely say that new big businesses will be looking to use them in the future. I would say that nothing is ever perfect: NoSQL databases need to work on their reliability and consistency, and then they would truly be the superior type of database management system.
