Sharding is a database scaling technique where you split a large database into smaller, faster, more manageable pieces called “shards.”
Each shard contains a subset of the data, and all shards together make up the full dataset.
🧠 Why Use Sharding?
- ⚡ Improves performance: Reduces load per server
- 🌍 Increases scalability: More shards = more capacity
- 🧩 Enables horizontal scaling: Add servers to scale out
🗂️ Example
Suppose you have a users table with 1 billion rows. Instead of storing all of them in one huge table on one server, you could split them by ID range:
Shard | Range |
---|---|
1 | Users with ID 1–100,000,000 |
2 | Users with ID 100,000,001–200,000,000 |
3 | Users with ID 200,000,001–300,000,000 |
… | … |
Each shard lives on a separate server or database instance.
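Here is a minimal sketch of how an application might turn that range table into a lookup. The names UPPER_BOUNDS and shard_for_user are made up for illustration, only the three ranges listed above are configured, and each boundary ID is assumed to belong to the lower shard (so ID 100M lives in shard 1).

```python
from bisect import bisect_left

# Assumed inclusive upper bound of each shard's ID range, in shard order.
# Only the three ranges spelled out in the table are listed here.
UPPER_BOUNDS = [100_000_000, 200_000_000, 300_000_000]


def shard_for_user(user_id: int) -> int:
    """Return the 1-based shard number whose ID range contains user_id."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    # bisect_left finds the first range whose upper bound is >= user_id.
    index = bisect_left(UPPER_BOUNDS, user_id)
    if index == len(UPPER_BOUNDS):
        raise ValueError(f"no shard configured for user ID {user_id}")
    return index + 1
```

Keeping the bounds in a sorted list means ranges do not have to be the same size, which helps if one range later needs to be split.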
🛠️ Types of Sharding
Type | Description | Example |
---|---|---|
Horizontal sharding | Split rows | users by region or ID range |
Vertical sharding | Split columns | users_basic vs users_private |
Directory-based sharding | Use a lookup table | A directory records which shard each user lives in |
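To make the directory-based row concrete, here is a rough sketch of a lookup table held in memory. ShardDirectory and the shard names are hypothetical. A real directory would usually live in its own small database or cache, which this sketch does not model.

```python
class ShardDirectory:
    """In-memory stand-in for a lookup table mapping user IDs to shard names."""

    def __init__(self) -> None:
        self._location: dict[int, str] = {}

    def assign(self, user_id: int, shard: str) -> None:
        # Record (or overwrite) which shard holds this user.
        self._location[user_id] = shard

    def locate(self, user_id: int) -> str:
        # Every read or write consults the directory before touching a shard.
        if user_id not in self._location:
            raise LookupError(f"user {user_id} has no shard assignment")
        return self._location[user_id]


directory = ShardDirectory()
directory.assign(42, "shard-2")
directory.assign(42, "shard-3")  # moving a user only changes the directory entry
```

The upside is that any user can be placed on any shard. The cost is one extra lookup per query and a directory that must itself stay available.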
⚙️ How It Works
- Shard key: A column (e.g. user_id, region, tenant_id) used to decide which shard data goes into
- Application logic or middleware routes queries to the right shard
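The two pieces above can be sketched as a tiny router. This is an illustration under assumptions, not a real driver: ShardRouter is a made-up name, each dict stands in for a separate database instance, and the shard key is assumed to be an integer user_id.

```python
from typing import Callable, Optional


class ShardRouter:
    """Routes reads and writes to one of several in-memory 'shards'."""

    def __init__(self, num_shards: int, pick_shard: Callable[[int], int]) -> None:
        # Each dict stands in for one database instance.
        self.shards = [{} for _ in range(num_shards)]
        # Function that maps a shard key to a shard index.
        self.pick_shard = pick_shard

    def put(self, user_id: int, row: dict) -> int:
        """Store a row on the shard chosen by its key; return that shard's index."""
        shard_id = self.pick_shard(user_id)
        self.shards[shard_id][user_id] = row
        return shard_id

    def get(self, user_id: int) -> Optional[dict]:
        """Read from the one shard that can hold this key."""
        return self.shards[self.pick_shard(user_id)].get(user_id)


router = ShardRouter(num_shards=3, pick_shard=lambda user_id: user_id % 3)
router.put(7, {"name": "Ada"})  # 7 % 3 == 1, so the row lands on shard 1
```

Because the routing rule is passed in as a function, the same router could be driven by ranges, a hash, or a directory lookup.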
📦 Benefits
- ✅ Load is spread across multiple machines
- ✅ Parallelism improves throughput
- ✅ Reduces contention and I/O bottlenecks
⚠️ Challenges
Issue | Description |
---|---|
❌ Cross-shard joins | Harder to query across shards |
❌ Resharding | Moving data when scaling or changing strategy |
❌ Complex logic | App must handle routing and aggregation |
❌ Consistency | More difficult to ensure strong consistency across shards |
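The cross-shard and "complex logic" rows are easier to see in code. Below is a hedged sketch of a scatter-gather count: the query goes to every shard, each shard returns a partial result, and the application combines them. The SHARDS data and both function names are invented for the example, and plain lists stand in for the users table on each shard.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each list stands in for the users table on one shard.
SHARDS = [
    [{"user_id": 1, "region": "eu"}, {"user_id": 4, "region": "us"}],
    [{"user_id": 2, "region": "eu"}],
    [{"user_id": 3, "region": "us"}, {"user_id": 6, "region": "eu"}],
]


def count_in_region(shard_rows, region):
    """Partial result: what a single shard would compute locally."""
    return sum(1 for row in shard_rows if row["region"] == region)


def count_across_shards(shards, region):
    """Scatter the query to every shard in parallel, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: count_in_region(rows, region), shards))
    return sum(partials)
```

A count is easy to merge because partial counts just add up. Joins, sorted pagination, and transactions that span shards need considerably more coordination.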
🔄 Example Sharding Strategy
// Pseudo-code for picking a shard:
shard_id = hash(user_id) % num_shards
Then route the query to shard[shard_id].
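A runnable take on that pseudo-code, as a sketch. It swaps in SHA-256 from hashlib because Python's built-in hash() is salted per process for strings and bytes, so it would not give stable shard assignments for string keys. The function names and the choice of the first 8 digest bytes are assumptions made for this example.

```python
import hashlib


def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(user_ids, old_count: int, new_count: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(user_ids)
```

Calling fraction_moved(range(1000), 4, 5) shows the catch with plain modulo: for evenly hashed keys, going from 4 to 5 shards is expected to remap about 80% of them, which is a large part of why resharding is costly.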
🧰 Common Use Cases
- Large-scale social networks
- Multi-tenant SaaS apps
- High-volume e-commerce platforms
- Large log/event stores
🧱 Summary
Concept | Description |
---|---|
Sharding | Splitting data across DB instances |
Shard Key | Field used to determine shard |
Goal | Scale out performance + storage |
Trade-offs | Complexity, joins, consistency |