Sharding is a technique to break a database into smaller parts(chunks)
Each individual partition is referred to as a database shard or simply as a shard.
This is a horizontal partition of data in a database.
For load sharing, each shard is held on a separate database server instance.
If replication factor =3 , means a shard data is replicated in 3 nodes. So if any one or two nodes fails, we have third node to recover data
Advantages of Sharding
- Improved Search Performance: Since the tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance.
Disadvantages of Sharding
References:
1. Wikipedia
Sharding Methods
1. Horizontal Partitioning
- putting different rows in different tables
- Range Based sharding , Example based on Alphabetic names
Example Names starting with A-D in database 1,
Tis range could be based on picnocdes
So according to range we store data
Problem: Many names with A-D and less with X-Z , so imbalance
Hence this horizontal par not so good, We will see how to improve
2. Vertical
Suppose we have 5 schemas in database
Now we storing User information in different database, Location in different database, Trabnsaction History in different database, Phitis in different database.
So means 1 feature in 1 database.
Also not good if one database has more data, and one has less data, again imbalance, so not good partitoning
3. Directory Based Par
Suppose we have big data with 1 million lines.
Againt each row we print, this row in which DB, Against Yeah row we define that data in DB1 or DB2..
Each row has its DB number
Sharding Criteria
Now coming on what criteria to divide data
1. Hash Based
Most important
we can choose any entity in database . Example Name/EmailADdrress
Based on entrtity value, hash code is generated, and value assigned in hashTABLE
Our HashTable contain all DB detains
SO we know which DB it would go.
Very Efficient. We can use COncistent Hashing
2. List parti
based on list we can divide
Example based on Regions example APAC (Asia Pacific) ,US
Means all APPAC Customers data put in one DB, US citizens n another DDB
Problem: imbalance in DB ize
3. RounD Robin :
One request to first DB, second to second, third to thriods
4. Composite:
Combine any above schemas
List PARTIONING
inside US, we can divide on the basis on hash
Sharding Challenges
- In Sharding we have distributed across world, Example Cassandra data across Regions APAC/ US. diffricly to maintain ACID compliance . SQL databases basically demand ACID
- Joins Inefficient
- Workaround is to denormalise the DB so that queries that required joins can be performed on single table.
- Trying to enforce data integrity constraint suh as foreign keys in a sharded DB can be extremely difficult. Most of RDMS do not support this.
- Rebalancing:When data distribution is not uniform. - When there is lot of load on a particular Shard.
Now if we have dataacross regions, we can have tradeoff
Post a Comment