NoSQL Tutorial

In this tutorial you will learn:


1. What is NoSQL?
2. Types of NoSQL Databases
3. RDBMS vs NoSQL
4. CAP Theorem

1. What is NoSQL?

  • NoSQL is usually read as “non-SQL” or “non-relational” (and sometimes as “not only SQL”).
  • A NoSQL database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases; in other words, it does not store data in relational tables.
  • NoSQL databases are widely used in big-data and real-time web applications. Example: Facebook storing terabytes of user data every day.
  • Advantages of NoSQL databases: horizontal scalability, fast performance, and, in many systems, support for SQL-like query languages.
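Horizontal scalability means handling more data by adding machines rather than buying a bigger one. One common way to do this (assumed here; not a technique named in this tutorial) is hash partitioning, or sharding: each key is hashed and routed to one of N nodes. The Python sketch below is illustrative only; `shard_for` and `ShardedStore` are hypothetical names, and each "node" is just an in-memory dict.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, num_nodes: int) -> int:
    """Pick a node index for a key.

    md5 is used because it is stable across processes; Python's built-in
    hash() of a str is randomized per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Hypothetical key-value store spread over several nodes (dicts here)."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]

    def put(self, key: str, value: Any) -> None:
        # Only the node that owns the key stores it.
        self.nodes[shard_for(key, len(self.nodes))][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Reads go straight to the owning node; no other node is consulted.
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=3)
for i in range(9):
    store.put(f"user:{i}", {"id": i})
print([len(node) for node in store.nodes])  # how the 9 keys are spread out
print(store.get("user:5"))                  # {'id': 5}
```

Plain modulo hashing is the simplest choice, but it moves most keys to a different node whenever N changes; Dynamo and Cassandra use consistent hashing to limit that movement.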

2. Types of NoSQL Databases

1. Key-Value: Dynamo, Redis, MemcacheDB
2. Document: MongoDB, Cosmos DB
3. Wide Column: Cassandra, HBase
4. Graph: Neo4j

A. Key-Value Database

  • Data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. Every entity (record) is a set of key-value pairs.
  • Key-value stores can use consistency models ranging from eventual consistency to serializability.
  • Examples: Redis, Oracle NoSQL Database.

Figure: illustration of a key-value store (image: https://upload.wikimedia.org/wikipedia/commons/5/5b/KeyValue.PNG)
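The key-value model can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a description of how Redis or Oracle NoSQL Database work internally; the class `KeyValueStore` and its `put`, `get` and `delete` methods are hypothetical names, and a plain dict stands in for the storage engine. Storing each record as a dict of fields also mirrors the idea that every entity is a set of key-value pairs.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a Python dict."""

    def __init__(self) -> None:
        # Dict keys are unique, so each key appears at most once.
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writing an existing key replaces its value instead of adding a duplicate.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Look up by key only; the store treats values as opaque.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Return True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False

    def __len__(self) -> int:
        return len(self._data)


store = KeyValueStore()
store.put("user:42", {"name": "Alice"})
store.put("user:42", {"name": "Alice", "city": "Paris"})  # same key: value replaced
print(store.get("user:42"))  # {'name': 'Alice', 'city': 'Paris'}
print(len(store))            # 1
```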

B. Document-Oriented or Document Store Database

C. Column Based Database

D. Graph Database

3. RDBMS vs NoSQL

| RDBMS | NoSQL |
| ----- | ----- |
| Relational Database Management System | Non-relational or distributed database |
| Table-based: data is stored in tables | Key-value, document, wide-column, or graph based |
| Predefined (fixed) schema | Dynamic schema |
| Vertically scalable | Horizontally scalable |
| Suitable for complex queries | Not suitable for complex queries |
| Not suitable for hierarchical data storage | Suitable for hierarchical data storage |
| Emphasizes ACID properties | Follows the CAP theorem and BASE (Basically Available, Soft state, Eventually consistent), so best suited where data availability matters most |
| Should be used when data consistency is important | Used when getting data fast matters more than getting consistent data |
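Two rows of the table, fixed versus dynamic schema and hierarchical data storage, can be illustrated with plain Python objects. This is a sketch under assumptions rather than any database's API: the `UserRow` dataclass stands in for a row of an RDBMS table, the dicts stand in for documents in a document store such as MongoDB, and `row_accepts`, `get_field` and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class UserRow:
    """A row in a fixed-schema table: exactly these three columns."""
    user_id: int
    name: str
    city: str


def row_accepts(**fields: Any) -> bool:
    """Return True if the fields match UserRow's column names.

    Only the set of column names is enforced here, not the value types.
    """
    try:
        UserRow(**fields)
        return True
    except TypeError:  # missing or unexpected column
        return False


# Documents in a collection: each may have different fields (dynamic schema)
# and may nest sub-documents and lists (hierarchical data).
documents: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Alice", "city": "Paris"},
    {
        "_id": 2,
        "name": "Bob",
        "address": {"city": "Berlin", "zip": "10115"},
        "orders": [{"item": "book", "qty": 2}],
    },
]


def get_field(doc: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as 'address.city'; None if any step is missing."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


print(row_accepts(user_id=3, name="Carol", city="Rome"))  # True
print(row_accepts(user_id=3, name="Carol", phone="555"))  # False
documents.append({"_id": 3, "name": "Carol", "phone": "555"})  # no schema change needed
print(get_field(documents[1], "address.city"))  # Berlin
print(get_field(documents[0], "address.city"))  # None
```

The flexibility has a cost: Alice stores `city` at the top level while Bob stores it inside `address`, so a query by city has to handle both shapes, which a fixed schema would have ruled out.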

4. CAP Theorem

The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:

  • (C) Consistency: every read receives the most recent write or an error. For example, after the last update completes, all read requests return the same, updated data.
  • (A) Availability: the system is always running (no downtime), so every request receives a response, without the guarantee that it contains the most recent write.
  • (P) Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

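Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions: CAP consistency means every read sees the latest write, whereas ACID consistency means a transaction takes the database from one valid state to another without violating its integrity constraints.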

When a network partition failure happens, we must decide whether to:

  • Cancel the operation, which decreases availability but ensures consistency
  • Proceed with the operation, which provides availability but risks inconsistency

The CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability.


For example, consider a distributed system with two servers, Server 1 and Server 2, and assume that it provides all three guarantees (C, A and P) at the same time.

Suppose the network link between the two servers fails. Since the system is partition tolerant, it must keep working.

Case 1 (Consistency): A client (or user) sends an update request to Server 1. Because the system is consistent, Server 1 must update Server 2 before sending back a response; but Server 2 is unreachable, so the request to Server 1 times out. The system therefore cannot be consistent and available at the same time.

Case 2 (Availability): A client sends an update request to Server 1. This time Server 1 sends a response back to the client instead of waiting for the value to be updated on Server 2. If the client then queries Server 2, it receives the old value, because the update never reached Server 2 during the network failure. Keeping the system available has cost us consistency.

Every distributed system has to be partition tolerant (keep running even when the network fails). So, during a partition, we have to choose between consistency and availability.
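The two cases above can be simulated in a few lines of Python. This is a toy model, not a real replication protocol: `Server`, `Cluster`, `WriteRejected`, the `mode` values `"CP"` and `"AP"`, and the `partitioned` flag are all hypothetical names. In CP mode a write that cannot reach Server 2 is rejected (Case 1); in AP mode Server 1 accepts it and Server 2 keeps serving the old value (Case 2).

```python
from typing import Dict, Optional


class WriteRejected(Exception):
    """Stands in for the timeout in Case 1: the write could not be replicated."""


class Server:
    """One replica with its own copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Cluster:
    """Two replicas; Server 1 accepts writes and replicates them to Server 2."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.server1 = Server("Server 1")
        self.server2 = Server("Server 2")
        self.partitioned = False  # True while the link between the servers is down

    def write(self, key: str, value: str) -> None:
        if self.partitioned and self.mode == "CP":
            # Case 1: Server 2 cannot be updated, so refuse the write
            # instead of letting the replicas diverge (consistent, not available).
            raise WriteRejected(f"cannot reach {self.server2.name}")
        self.server1.data[key] = value
        if not self.partitioned:
            # Normal operation: both replicas are updated before returning.
            self.server2.data[key] = value
        # Case 2 (AP during a partition): only Server 1 has the new value
        # (available, not consistent).

    def read_from_server2(self, key: str) -> Optional[str]:
        return self.server2.data.get(key)


def demo(mode: str) -> str:
    cluster = Cluster(mode)
    cluster.write("x", "old")
    cluster.partitioned = True  # the link between the servers goes down
    try:
        cluster.write("x", "new")
    except WriteRejected as err:
        return f"{mode}: write rejected ({err})"
    return f"{mode}: write accepted, Server 2 still reads {cluster.read_from_server2('x')!r}"


print(demo("CP"))  # CP: write rejected (cannot reach Server 2)
print(demo("AP"))  # AP: write accepted, Server 2 still reads 'old'
```

Real systems detect partitions through timeouts rather than a flag set by hand, and a CP system may also refuse reads; this toy only models writes.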

 

 

