Partitioning and Sharding in databases

1. Partitioning

Definition:
Partitioning is the process of dividing a single database table into smaller, more manageable pieces, called partitions, while still keeping them within the same database. Each partition can be stored separately, often based on some key (like date, region, or ID).

Purpose:

  • Improves query performance.
  • Makes management easier (backups, archiving).
  • Can reduce the amount of data scanned for a query.

Types of Partitioning:

  1. Range Partitioning – data is divided by ranges of values.
    • Example: Orders table by order date: Partition Date Range P1 2020-01-01 to 2020-06-30 P2 2020-07-01 to 2020-12-31
  2. List Partitioning – data is divided by a list of discrete values.
    • Example: Customers table by country: Partition Countries P1 USA, Canada P2 UK, Germany
  3. Hash Partitioning – data is divided using a hash function on a column (often ID).
    • Example: Users table by user_id hash → distributes evenly across partitions.
  4. Composite Partitioning – combines two or more types (e.g., range + hash).

Example (Range Partitioning):

CREATE TABLE Orders (
    order_id INT,
    order_date DATE,
    customer_id INT,
    amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022)
);

2. Sharding

Definition:
Sharding is horizontal partitioning at the database level, meaning the data is split across multiple databases or servers (called shards). Each shard contains a subset of the data.

Purpose:

  • Improves scalability.
  • Handles large datasets by distributing load across servers.
  • Reduces a single server’s resource bottleneck.

Key Difference from Partitioning:

  • Partitioning: usually within one database.
  • Sharding: spreads across multiple databases or servers.

Example (Sharding Users by Region):
Suppose you have a large user table:

User IDNameRegion
1AliceUS
2BobEU
3CarolUS

You can shard it into two databases:

  • Shard 1 (US users) User ID Name Region 1 Alice US 3 Carol US
  • Shard 2 (EU users) User ID Name Region 2 Bob EU

Sharding logic can be based on:

  • Range (e.g., ID ranges)
  • Hash (hashing user ID to choose a shard)
  • Directory (explicit mapping to shards)

Quick Comparison:

FeaturePartitioningSharding
ScopeSingle databaseMultiple databases/servers
PurposePerformance, manageabilityScalability, load distribution
Data SplitTablesWhole database tables
ComplexityLowerHigher

✅ Summary:

  • Partitioning: Split table data inside one database (e.g., Orders 2020 vs 2021).
  • Sharding: Split data across multiple databases (e.g., Users in US vs EU servers).

Leave a Reply