Sharding is a database scaling technique where you split a large database into smaller, faster, more manageable pieces called “shards.”
Each shard contains a subset of the data, and all shards together make up the full dataset.
🧠 Why Use Sharding?
- ⚡ Improves performance: Reduces load per server
- 🌍 Increases scalability: More shards = more capacity
- 🧩 Enables horizontal scaling: Add servers to scale out
🗂️ Example
Suppose you have a users table with 1 billion users.
Instead of storing all of them in one huge table on one server, you could split them by ID range:
| Shard | Range |
|---|---|
| 1 | Users with ID 1–100M |
| 2 | Users with ID 100M+1–200M |
| 3 | Users with ID 200M+1–300M |
| … | … |
Each shard lives on a separate server or database instance.
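The range table above can be turned into a lookup function. This is a minimal sketch, assuming every shard holds exactly 100M consecutive IDs and that each range includes its upper bound; `RANGE_SIZE` and `shard_for_user` are illustrative names, not part of any library.

```python
RANGE_SIZE = 100_000_000  # assumption: 100M IDs per shard, as in the table


def shard_for_user(user_id: int) -> int:
    """Return the 1-based shard number for a user ID under range sharding."""
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    # IDs 1..100M map to shard 1, 100M+1..200M to shard 2, and so on.
    return (user_id - 1) // RANGE_SIZE + 1


first = shard_for_user(1)
boundary = shard_for_user(100_000_000)
next_range = shard_for_user(100_000_001)
```

Range sharding keeps neighbouring IDs together, which helps range scans, but new users all land on the last shard unless the ranges are rebalanced.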
🛠️ Types of Sharding
| Type | Description | Example |
|---|---|---|
| Horizontal sharding | Split rows | users by region or ID range |
| Vertical sharding | Split columns | users_basic vs users_private |
| Directory-based sharding | Use a lookup table | Keeps track of which user is in which shard |
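Directory-based sharding from the table above can be sketched as a small lookup structure. This is an illustration only: `ShardDirectory`, its methods, and the shard names are hypothetical, and a real directory would live in a durable, replicated store rather than in process memory.

```python
class ShardDirectory:
    """Maps each user ID to the name of the shard that stores it."""

    def __init__(self, default_shard):
        self._lookup = {}              # user_id -> shard name
        self._default = default_shard  # used for IDs with no explicit entry

    def assign(self, user_id, shard):
        """Record (or change) which shard holds this user."""
        self._lookup[user_id] = shard

    def locate(self, user_id):
        """Return the shard holding this user, or the default shard."""
        return self._lookup.get(user_id, self._default)


directory = ShardDirectory(default_shard="shard-1")
directory.assign(42, "shard-3")
located = directory.locate(42)     # explicit entry
fallback = directory.locate(7)     # no entry, so the default is returned
```

Because placement is just an entry in the lookup table, a user can be moved by copying the rows and updating one entry, at the cost of an extra lookup on every query.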
⚙️ How It Works
- Shard key: A column (e.g. user_id, region, tenant_id) used to decide which shard data goes into
- Application logic or middleware routes queries to the right shard
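The routing step might look like the sketch below, which uses region as the shard key. Everything here is assumed for illustration: the region names, the shard names, and the in-memory dictionaries that stand in for real database connections.

```python
# Hypothetical mapping from shard-key value (region) to a shard name.
REGION_TO_SHARD = {"eu": "shard-eu", "us": "shard-us", "apac": "shard-apac"}

# In-memory dictionaries stand in for real database connections.
SHARDS = {name: {} for name in REGION_TO_SHARD.values()}


def shard_for(row):
    """Read the shard key from the row and return the target shard name."""
    region = row["region"]
    if region not in REGION_TO_SHARD:
        raise ValueError(f"no shard configured for region {region!r}")
    return REGION_TO_SHARD[region]


def save_user(row):
    """Write the row to the shard chosen by its shard key."""
    shard_name = shard_for(row)
    SHARDS[shard_name][row["user_id"]] = row
    return shard_name


def find_user(user_id, region):
    """Read from a single shard; the caller must supply the shard key."""
    return SHARDS[REGION_TO_SHARD[region]].get(user_id)


written_to = save_user({"user_id": 7, "region": "eu", "name": "Ada"})
found = find_user(7, "eu")
missing = find_user(7, "us")
```

Note that `find_user` needs the region as well as the ID. A query that does not include the shard key has to be sent to every shard.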
📦 Benefits
- ✅ Load is spread across multiple machines
- ✅ Parallelism improves throughput
- ✅ Reduces contention and I/O bottlenecks
⚠️ Challenges
| Issue | Description |
|---|---|
| ❌ Cross-shard joins | Harder to query across shards |
| ❌ Resharding | Moving data when scaling or changing strategy |
| ❌ Complex logic | App must handle routing and aggregation |
| ❌ Consistency | More difficult to ensure strong consistency across shards |
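To make the cross-shard query and aggregation cost concrete, here is a minimal scatter-gather sketch: the application asks every shard for a partial result and combines the answers itself. The three in-memory lists are hypothetical stand-ins for shards, and the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for three shards of a users table.
SHARDS = [
    [{"user_id": 1, "active": True}, {"user_id": 4, "active": False}],
    [{"user_id": 2, "active": True}],
    [{"user_id": 3, "active": True}, {"user_id": 6, "active": True}],
]


def count_active(shard):
    """Per-shard query: count the active users stored on one shard."""
    return sum(1 for row in shard if row["active"])


def count_active_everywhere(shards):
    """Scatter the query to all shards in parallel, then gather and sum."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_counts = list(pool.map(count_active, shards))
    return sum(partial_counts)


total_active = count_active_everywhere(SHARDS)
```

A simple count merges easily. Joins, sorting with limits, and transactions that span shards need considerably more coordination than this.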
🔄 Example Sharding Strategy
// Pseudo-code for picking a shard:
shard_id = hash(user_id) % num_shards
Then route the query to shard[shard_id].
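A runnable version of the pseudo-code could look like this. It is a sketch under one assumption worth stating: it uses SHA-256 from hashlib instead of Python's built-in `hash()`, because the built-in is randomized per process for strings and would not give every application server the same answer. The name `shard_id_for` is illustrative.

```python
import hashlib


def shard_id_for(user_id, num_shards):
    """Map a user ID to a shard index in the range 0..num_shards-1."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Hash the key with a stable function so every process agrees.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Take the first 8 bytes as an integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


shard_a = shard_id_for(12345, 4)
shard_b = shard_id_for(12345, 4)  # same input, same shard
```

With plain modulo hashing, changing `num_shards` reassigns most keys to a different shard, which is one reason resharding is expensive.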
🧰 Common Use Cases
- Large-scale social networks
- Multi-tenant SaaS apps
- High-volume e-commerce platforms
- Large log/event stores
🧱 Summary
| Concept | Description |
|---|---|
| Sharding | Splitting data across DB instances |
| Shard Key | Field used to determine shard |
| Goal | Scale out performance + storage |
| Trade-offs | Complexity, joins, consistency |