NoSQL Tutorial
In this tutorial you will learn:
1. What is NoSQL?
2. Types of NoSQL Databases
3. RDBMS vs NoSQL
4. CAP Theorem
1. What is NoSQL?
- Usually read as “non-SQL” or “non-relational” database.
- A NoSQL database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases (that is, it does not store data in tables).
- Widely used in big-data and real-time web applications. Example: Facebook storing terabytes of user data every day.
- Advantages of NoSQL databases: horizontal scalability, fast performance, and support for query languages.
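To make "horizontal scalability" concrete, here is a minimal sketch of how keys might be spread across several machines by hashing. The node names and the `pick_node` and `distribute` helpers are hypothetical, and simple modulo partitioning is only one possible scheme; real NoSQL databases typically use more elaborate ones.

```python
import hashlib


def pick_node(key: str, nodes: list) -> str:
    """Map a key to one node by hashing it (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def distribute(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[pick_node(key, nodes)].append(key)
    return placement


# Hypothetical cluster of three machines and 1000 user records.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"user:{i}" for i in range(1000)]
placement = distribute(keys, nodes)

for node, stored in placement.items():
    print(node, len(stored))
```

Adding a machine means adding another entry to `nodes`, which is the idea behind scaling out rather than buying a bigger single server. With plain modulo partitioning most keys would move to a different node when the list changes, which is one reason production systems use other schemes.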
2. Types of NoSQL Databases
| No. | Type | Examples |
|---|---|---|
| 1 | Key-Value | Dynamo, Redis, MemcacheDB |
| 2 | Document | MongoDB, Cosmos DB |
| 3 | Wide Column | Cassandra, HBase |
| 4 | Graph | Neo4j |
- Key-Value Database
  - Data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. Every entity (record) is a set of key-value pairs.
  - Key-value stores can use consistency models ranging from eventual consistency to serializability.
  - Examples: Redis, Oracle NoSQL Database.
- Document-Oriented or Document Store Database
- Column-Based Database
- Graph Database
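The key-value model described above can be sketched with an in-memory dictionary. The `KeyValueStore` class and its `put`, `get`, and `delete` methods are hypothetical names chosen for illustration; they are not the API of Redis or any other product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: each key appears at most once."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value, so keys stay unique.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# An entity (record) stored as a set of key-value pairs under one key.
store.put("user:42", {"name": "Asha", "city": "Pune"})
store.put("user:42", {"name": "Asha", "city": "Mumbai"})  # overwrites the first value

print(store.get("user:42"))
print(store.get("user:99", default="not found"))
```

The store only knows how to look up a whole value by its key, which is what keeps this model simple and fast.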
3. RDBMS vs NoSQL
| RDBMS | NoSQL |
|---|---|
| Relational Database Management System | Non-relational or distributed database |
| Table-based database, i.e. data is stored in tables | Key-value, document, wide-column, or graph based |
| Pre-defined or fixed schema | Dynamic schema |
| Vertically scalable | Horizontally scalable |
| Suitable for complex queries | Not suitable for complex queries |
| Not suitable for hierarchical data storage | Suitable for hierarchical data storage |
| Emphasizes ACID properties | Follows the CAP theorem and BASE transactions, so best suited for data availability problems |
| Should be used when data consistency is important | Used when getting data fast matters more than getting consistent data |
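The schema and hierarchical-data rows of this comparison can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module for the relational side and plain dictionaries to stand in for documents; the table names, field names, and sample values are made up for illustration.

```python
import json
import sqlite3

# Relational side: the schema is declared up front, and the nested
# "phones" data needs its own table plus a join to read back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE phones (user_id INTEGER REFERENCES users(id), number TEXT NOT NULL)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO phones (user_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)
rows = conn.execute(
    "SELECT u.name, p.number FROM users u "
    "JOIN phones p ON p.user_id = u.id ORDER BY p.number"
).fetchall()

# A column that was never declared is rejected by the fixed schema.
try:
    conn.execute("INSERT INTO users (id, name, nickname) VALUES (2, 'Ravi', 'Rav')")
    schema_error = None
except sqlite3.OperationalError as exc:
    schema_error = str(exc)

# Document side: each record carries its own structure, so nested data
# and extra fields need no schema change.
documents = [
    {"id": 1, "name": "Asha", "phones": ["555-0100", "555-0101"]},
    {"id": 2, "name": "Ravi", "nickname": "Rav"},
]
serialized = json.dumps(documents[0])

print(rows)
print(schema_error)
print(serialized)
```

The relational version needs two tables and a join for one user's phone numbers, while the document version keeps them inside the record itself.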
4. CAP Theorem
This is also known as Brewer’s theorem. It states that a distributed data store cannot simultaneously guarantee more than two of the following three properties:
- (C) Consistency: Data remains consistent after an operation. For example, after the latest update, all read requests return the same data.
- (A) Availability: The system is always running (no downtime), so every request receives a response, without the guarantee that it contains the most recent write.
- (P) Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.
When a network partition failure happens, we must decide whether to:
- Cancel the operation, which decreases availability but ensures consistency, or
- Proceed with the operation, which provides availability but risks inconsistency.
The CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability.
For example, consider a distributed system with two servers, and assume that it tries to provide all three guarantees (C, A, P) at the same time.
Suppose there is a network failure between the two servers. Since the system is partition tolerant, it should keep working.
Case 1 (Consistency): The client (or user) sends an update request to Server 1. Because the system is consistent, Server 1 must update Server 2 before sending back a response. But the network failure prevents the update from reaching Server 2, so the request to Server 1 times out. The system therefore cannot be consistent and available at the same time.
Case 2 (Availability): The client (or user) sends an update request to Server 1. This time Server 1 sends back a response to the client instead of waiting for the value to be updated on Server 2. If the client now queries Server 2, it receives the old value, because the update never arrived due to the network failure. Hence we keep availability but do not achieve consistency.
All distributed systems have to be partition tolerant (up and running even in the case of network failure). In that case we have to choose between consistency and availability.
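The two cases can be played out in a small simulation. This is only a sketch under simplifying assumptions: the `Server` and `Cluster` classes, the `"CP"` and `"AP"` mode labels, and the `"timeout"` return value are hypothetical stand-ins, and a real system would involve actual network calls and replication protocols.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.value = "v1"  # both servers start with the same value


class Cluster:
    """Two replicas; `mode` picks what to sacrifice during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.s1 = Server("Server 1")
        self.s2 = Server("Server 2")
        self.partitioned = False  # True means the servers cannot talk

    def write(self, value):
        """The client sends an update request to Server 1."""
        if not self.partitioned:
            self.s1.value = value
            self.s2.value = value  # replication succeeds
            return "ok"
        if self.mode == "CP":
            # Case 1: Server 2 cannot be updated, so the request fails
            # rather than letting the replicas diverge.
            return "timeout"
        # Case 2: respond immediately; Server 2 keeps the old value.
        self.s1.value = value
        return "ok"

    def read_from_s2(self):
        return self.s2.value


cp = Cluster("CP")
cp.partitioned = True
cp_result = cp.write("v2")

ap = Cluster("AP")
ap.partitioned = True
ap_result = ap.write("v2")

print("CP write:", cp_result, "| Server 2 reads:", cp.read_from_s2())
print("AP write:", ap_result, "| Server 2 reads:", ap.read_from_s2())
```

In the `"CP"` run the write is refused and both servers still agree on the old value; in the `"AP"` run the write succeeds but a read from Server 2 returns stale data.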