Sharding

As your data grows, problems of scale start to appear. Often only a few tables grow much faster than the rest. The indexes defined on those tables grow with them and consume more space, and searching through the tables becomes time-consuming. This is where you can benefit from database sharding.

What is sharding?

Literally, a shard is a piece of a whole. In the database context, sharding refers to horizontal partitioning of the database in order to improve performance. Rows from the same table are distributed across multiple schemas, potentially on different servers, and an algorithm applied to each row decides which shard it belongs to.
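One common choice for the "some algorithm" part is to hash a key column and take the result modulo the number of shards. The sketch below is only an illustration under stated assumptions: the shard names, the `customer_id` key field and the helper names `shard_for` and `distribute` are hypothetical, and a real system would map each shard name to actual connection settings.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database connections.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key and taking the hash modulo the shard count."""
    # A stable hash (not Python's built-in hash()) keeps placement identical across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def distribute(rows, key_field="customer_id"):
    """Group rows of one logical table by the shard each row would be stored on."""
    placement = {name: [] for name in SHARDS}
    for row in rows:
        placement[shard_for(str(row[key_field]))].append(row)
    return placement


if __name__ == "__main__":
    rows = [{"customer_id": i, "name": f"customer-{i}"} for i in range(10)]
    for shard, shard_rows in distribute(rows).items():
        print(shard, [r["customer_id"] for r in shard_rows])
```

Plain modulo hashing is easy to reason about, but changing the number of shards moves most keys to a different shard, so treat this as a starting point rather than a recommendation.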

Sharding - distribute your dataset into multiple databases

Advantages:

Since the data is distributed, the number of rows in each table in each database is reduced. This reduces the index size, which in turn improves search performance.

When deciding on a sharding strategy, observe the data access pattern first. If data is frequently accessed by the geographic location of the customer, shard so that customer data is separated by geographic location. If done wisely, every query needs to look up only one database and the data can be retrieved faster.
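As a rough sketch of the geographic case, the example below uses one in-memory SQLite database per region as a stand-in for separate shard servers. The region codes, the `customers` table layout and the helper names are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical regions; each one gets its own database (shard).
REGIONS = ("EU", "US", "APAC")


def make_shards():
    """Create one in-memory database per region, each with the same table layout."""
    shards = {}
    for region in REGIONS:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        shards[region] = conn
    return shards


def insert_customer(shards, customer_id, name, region):
    """Store the customer only on the shard that owns the customer's region."""
    conn = shards[region]
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name))
    conn.commit()


def customers_in_region(shards, region):
    """A region-scoped lookup touches exactly one database."""
    cursor = shards[region].execute("SELECT id, name FROM customers ORDER BY id")
    return cursor.fetchall()


if __name__ == "__main__":
    shards = make_shards()
    insert_customer(shards, 1, "Alice", "EU")
    insert_customer(shards, 2, "Carol", "US")
    print(customers_in_region(shards, "EU"))
```

Because the region is known up front, the lookup goes straight to one shard and works with a smaller table and index than a single combined database would have.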

Disadvantages:

As mentioned above, sharding needs to be done wisely. If the data is split in a way that does not match how it is queried, many of your queries end up touching multiple shards, and hence multiple servers, which adds latency. This can degrade performance drastically.
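The cost of a poorly matched sharding scheme can be seen in a small sketch. Here the data is sharded by region, so a lookup that knows the region visits one shard, while a lookup by name alone has to ask every shard. The in-memory `SHARDS` dictionary, its sample rows and the `find_customer` helper are hypothetical and stand in for real servers.

```python
# Hypothetical in-memory shards keyed by region; each holds (customer_id, name) rows.
SHARDS = {
    "EU": [(1, "Alice"), (2, "Bruno")],
    "US": [(3, "Carol")],
    "APAC": [(4, "Deepa")],
}


def find_customer(name, region=None, shards=SHARDS):
    """Return matching rows and the number of shards that had to be queried."""
    # With the region known, only one shard is visited; otherwise every shard is.
    targets = [region] if region is not None else list(shards)
    matches = [row for target in targets for row in shards[target] if row[1] == name]
    return matches, len(targets)


if __name__ == "__main__":
    print(find_customer("Carol", region="US"))  # targeted lookup
    print(find_customer("Carol"))               # fan-out across all shards
```

Both calls find the same row, but the second one pays for a round trip to every shard, and that cost grows with the number of shards.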

Sharding adds complexity to the architecture at multiple levels: backups, operations, failovers and SQL queries. So when deciding whether you need sharding, evaluate all of these angles. Often, you might be better off with simple horizontal partitioning in a single schema.

Support

Many well-known database implementations support sharding, including MySQL, Apache HBase and MongoDB.

Related Keywords

Partitioning, Horizontal Partitioning, Database, MySQL, NoSQL, Big Data
