What is Sharding

Sharding is a technique to break a database into smaller parts(chunks)

Each individual partition is referred to as a database shard or simply as a shard.

This is a horizontal partition of data in a database.

For load sharing, each shard is held on a separate database server instance.

If replication factor =3 , means a shard data is replicated in 3 nodes. So if any one or two nodes fails, we have third node to recover data

Advantages of Sharding

Improved Search Performance: Since the tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance.

Disadvantages of Sharding

References:

1. Wikipedia

Sharding Methods

1. Horizontal Partitioning

putting different rows in different tables
Range Based sharding , Example based on Alphabetic names

Example Names starting with A-D in database 1,

Tis range could be based on picnocdes

So according to range we store data

Problem: Many names with A-D and less with X-Z , so imbalance

Hence this horizontal par not so good, We will see how to improve

2. Vertical

Suppose we have 5 schemas in database

Now we storing User information in different database, Location in different database, Trabnsaction History in different database, Phitis in different database.

So means 1 feature in 1 database.

Also not good if one database has more data, and one has less data, again imbalance, so not good partitoning

3. Directory Based Par

Suppose we have big data with 1 million lines.

Againt each row we print, this row in which DB, Against Yeah row we define that data in DB1 or DB2..

Each row has its DB number

Sharding Criteria

Now coming on what criteria to divide data

1. Hash Based

Most important

we can choose any entity in database . Example Name/EmailADdrress

Based on entrtity value, hash code is generated, and value assigned in hashTABLE

Our HashTable contain all DB detains

SO we know which DB it would go.

Very Efficient. We can use COncistent Hashing

2. List parti

based on list we can divide

Example based on Regions example APAC (Asia Pacific) ,US

Means all APPAC Customers data put in one DB, US citizens n another DDB

Problem: imbalance in DB ize

3. RounD Robin :

One request to first DB, second to second, third to thriods

4. Composite:

Combine any above schemas

List PARTIONING

inside US, we can divide on the basis on hash

Sharding Challenges

In Sharding we have distributed across world, Example Cassandra data across Regions APAC/ US. diffricly to maintain ACID compliance . SQL databases basically demand ACID
Joins Inefficient
Workaround is to denormalise the DB so that queries that required joins can be performed on single table.
Trying to enforce data integrity constraint suh as foreign keys in a sharded DB can be extremely difficult. Most of RDMS do not support this.
Rebalancing:When data distribution is not uniform. - When there is lot of load on a particular Shard.

Now if we have dataacross regions, we can have tradeoff

CS By Saurabh Sir

Advantages of Sharding

Disadvantages of Sharding

Sharding Methods

1. Horizontal Partitioning

Sharding Criteria

Sharding Challenges

Post a Comment

Post a Comment

Contact Form

CS By Saurabh Sir

What is Sharding

Advantages of Sharding

Disadvantages of Sharding

Sharding Methods

1. Horizontal Partitioning

Sharding Criteria

Sharding Challenges

You might like

Post a Comment

Post a Comment

Contact Form