When we have a lot of users, we segregate the data into multiple databases, because a single database can't hold everything given hardware constraints
Once the data is split across multiple databases, we also need to know which database to look into when we want to retrieve a particular user's data
1. Segregate via Id
We take the user's ID, divide it by the number of servers (databases), and use the remainder to decide which database that user's data goes into
package main

import "fmt"

func getDatabaseIndex(userID int, numDatabases int) int {
    if numDatabases <= 0 {
        panic("Number of databases must be greater than 0")
    }
    return userID % numDatabases
}

func main() {
    numDatabases := 4
    userIDs := []int{101, 202, 303, 404, 505, 606}
    for _, userID := range userIDs {
        dbIndex := getDatabaseIndex(userID, numDatabases)
        fmt.Printf("User ID %d → Database %d\n", userID, dbIndex)
    }
}

All good, but what if the user load increases even more and we need to spin up another server (database)? Then this scheme breaks down: every user ID has to be rehashed against the new count, and most of the data has to be moved into different databases
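To put a number on how disruptive that is, here is a small sketch (the 1,000 sequential IDs and the 4-to-5 growth are just illustrative assumptions) that counts how many users would have to move under the modulo scheme:

package main

import "fmt"

func main() {
    oldN, newN := 4, 5 // grow from 4 to 5 databases
    total, moved := 1000, 0
    for userID := 1; userID <= total; userID++ {
        // A user must migrate if the remainder changes under the new count
        if userID%oldN != userID%newN {
            moved++
        }
    }
    fmt.Printf("%d of %d users (%.0f%%) must move to a different database\n",
        moved, total, float64(moved)/float64(total)*100)
}

Roughly four out of five users end up on a different database, which is exactly the migration pain consistent hashing is designed to avoid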
2. Consistent Hashing
The modulo-based approach fails when adding or removing databases because most keys remap, which forces a massive data migration. Consistent hashing solves this by limiting the remapping to only about k/n keys (where k = number of keys, n = number of nodes)
How it works:
- Databases and keys are hashed onto a ring of positions (0 to 2³² − 1)
- Every database node is hashed and gets its spot on the ring
- Each key is hashed onto the ring as well; if a database node sits at that exact spot the key goes there, otherwise it goes to the next database node in the clockwise direction
- Virtual nodes (vnodes) balance the load distribution
- Adding or removing a node only affects its adjacent neighbors on the ring
Edge cases:
- If a new node is added: move the data that falls between the new node and its nearest anti-clockwise (predecessor) database into the new node's database
- If a database node fails: keys auto-remap to the next clockwise node; migrate the data into it
- Hotspots: if nodes hash too close together, one database takes a disproportionate share of the load; use vnodes (virtual nodes - extra ring positions created from the original node) to spread it out
- Node flapping: rapid join/leave cycles cause churn; batch membership changes or apply backoff (see the sketch after this list)
- Uneven node capacities: give bigger nodes more vnodes (weighted vnodes)
- Hash collisions: rare, but can be handled by secondary hashing
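For the node-flapping case, here is a hedged sketch of join/leave backoff (Debouncer, Observe, and the timings are illustrative names and values, not part of the original notes): a membership change is committed only after the node has been quiet for a full window, so a node that rapidly joins and leaves never churns the ring

package main

import (
    "fmt"
    "sync"
    "time"
)

// Debouncer buffers join/leave events and commits a change only after a node
// has produced no further events for a full quiet window.
// (A production version would also guard against a timer firing concurrently
// with Observe; this sketch keeps that out for brevity.)
type Debouncer struct {
    mu      sync.Mutex
    pending map[string]*time.Timer
    quiet   time.Duration
    apply   func(node string, join bool)
}

func NewDebouncer(quiet time.Duration, apply func(string, bool)) *Debouncer {
    return &Debouncer{pending: map[string]*time.Timer{}, quiet: quiet, apply: apply}
}

// Observe records a membership event; each new event for the same node
// resets that node's timer, so a flapping node never gets committed.
func (d *Debouncer) Observe(node string, join bool) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if t, ok := d.pending[node]; ok {
        t.Stop()
    }
    d.pending[node] = time.AfterFunc(d.quiet, func() {
        d.mu.Lock()
        delete(d.pending, node)
        d.mu.Unlock()
        d.apply(node, join)
    })
}

func main() {
    d := NewDebouncer(200*time.Millisecond, func(node string, join bool) {
        fmt.Println("commit:", node, "join =", join) // would call ring.AddNode / RemoveNode
    })
    d.Observe("db4", true)  // flap: join…
    d.Observe("db4", false) // …then immediate leave resets the timer
    d.Observe("db4", true)  // only this last event survives the quiet window
    time.Sleep(400 * time.Millisecond)
}

In a real deployment the apply callback would call AddNode / RemoveNode on the ring. Here is the full ring implementation itself: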
package main

import (
    "crypto/sha1"
    "encoding/binary"
    "fmt"
    "sort"
    "strconv"
    "sync"
)

type Ring struct {
    sync.RWMutex
    nodes    map[uint32]string   // hash → node
    vnodeMap map[string][]uint32 // node → its (collision-resolved) vnode hashes
    keys     []uint32            // sorted ring positions
}

// AddNode with configurable vnodes for capacity weighting
func (r *Ring) AddNode(node string, vnodes int) {
    r.Lock()
    defer r.Unlock()
    if r.nodes == nil {
        r.nodes = make(map[uint32]string)
        r.vnodeMap = make(map[string][]uint32)
    }
    for i := 0; i < vnodes; i++ {
        vnode := node + "#" + strconv.Itoa(i)
        h := hash(vnode)
        // Handle hash collisions by probing to a free position
        for _, exists := r.nodes[h]; exists; _, exists = r.nodes[h] {
            h = rehash(h)
        }
        r.nodes[h] = node
        r.keys = append(r.keys, h)
        r.vnodeMap[node] = append(r.vnodeMap[node], h)
    }
    sort.Slice(r.keys, func(i, j int) bool {
        return r.keys[i] < r.keys[j]
    })
}

func (r *Ring) RemoveNode(node string) {
    r.Lock()
    defer r.Unlock()
    // Use the stored (collision-resolved) vnode hashes, not recomputed ones
    for _, h := range r.vnodeMap[node] {
        delete(r.nodes, h)
        // Remove from sorted keys
        idx := sort.Search(len(r.keys), func(i int) bool {
            return r.keys[i] >= h
        })
        if idx < len(r.keys) && r.keys[idx] == h {
            r.keys = append(r.keys[:idx], r.keys[idx+1:]...)
        }
    }
    delete(r.vnodeMap, node)
}

func (r *Ring) GetNode(key string) string {
    r.RLock()
    defer r.RUnlock()
    if len(r.keys) == 0 {
        return ""
    }
    h := hash(key)
    idx := sort.Search(len(r.keys), func(i int) bool {
        return r.keys[i] >= h
    })
    // Handle ring wrap-around
    if idx == len(r.keys) {
        idx = 0
    }
    return r.nodes[r.keys[idx]]
}

// hash generates a 32-bit ring position (SHA-1 truncated to 4 bytes)
func hash(s string) uint32 {
    h := sha1.Sum([]byte(s))
    return binary.BigEndian.Uint32(h[:4])
}

// rehash for collision resolution (32-bit multiplicative step)
func rehash(prev uint32) uint32 {
    return prev*2654435761 + 1
}

func main() {
    ring := &Ring{}
    ring.AddNode("db0", 100) // Higher capacity = more vnodes
    ring.AddNode("db1", 75)
    ring.AddNode("db2", 50)
    // Keys map to the next node clockwise on the ring
    users := []string{"userA", "userB", "userC"}
    for _, u := range users {
        fmt.Println(u, "→", ring.GetNode(u))
    }
    // Add a node with minimal remapping
    ring.AddNode("db3", 50)
    fmt.Println("\nAfter adding db3:")
    for _, u := range users {
        fmt.Println(u, "→", ring.GetNode(u)) // Only a few keys (if any) remap
    }
}
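Relating back to the new-node edge case: a hypothetical helper on the same Ring type (AffectedRanges is an illustrative name, not from the original notes) can report which hash ranges a freshly added node takes over, i.e. the keys that need to migrate to it:

// Hypothetical sketch: for a node that was just added, report the (start, end]
// hash ranges whose keys should migrate to it from their previous owners.
// Assumes the Ring type above; a range may wrap around position 0.
func (r *Ring) AffectedRanges(node string) [][2]uint32 {
    r.RLock()
    defer r.RUnlock()
    var ranges [][2]uint32
    for _, h := range r.vnodeMap[node] {
        // Position of this vnode in the sorted ring
        idx := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
        // Anti-clockwise predecessor (wrapping around the ring)
        prev := idx - 1
        if prev < 0 {
            prev = len(r.keys) - 1
        }
        ranges = append(ranges, [2]uint32{r.keys[prev], h}) // keys in (prev, h] move here
    }
    return ranges
}

Each vnode claims only the arc between its anti-clockwise predecessor and itself, so just those slices of data move; everything else stays where it was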
Advantages:
- Scales horizontally: Adding a node moves only ~1/n of the data
- Fault tolerance: Failed nodes are bypassed automatically
- Load balancing: Weighted vnodes handle heterogeneous hardware
- Zero downtime migrations: Add new nodes before decommissioning old
Production considerations:
- Replication: Map keys to N clockwise nodes for redundancy (see the sketch after this list)
- Multi-region: Overlay location-aware rings
- SLOs: Monitor vnode distribution to catch hotspots
- Hash choice: Non-cryptographic hashes (xxHash) for speed
- Data versioning: Required for migration atomicity
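For the replication point, here is a minimal sketch on the same Ring type (GetNodes is an illustrative name, not from the original notes): walk clockwise from the key's position and collect N distinct physical nodes, skipping extra vnodes of nodes already chosen

// GetNodes returns up to n distinct nodes for a key by walking clockwise
// from the key's ring position, skipping vnodes of nodes already selected.
func (r *Ring) GetNodes(key string, n int) []string {
    r.RLock()
    defer r.RUnlock()
    if len(r.keys) == 0 || n <= 0 {
        return nil
    }
    h := hash(key)
    start := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
    seen := make(map[string]bool)
    var result []string
    for i := 0; i < len(r.keys) && len(result) < n; i++ {
        node := r.nodes[r.keys[(start+i)%len(r.keys)]]
        if !seen[node] {
            seen[node] = true
            result = append(result, node)
        }
    }
    return result
}

Writes then go to all returned nodes (or a quorum of them), so a single node failure doesn't lose the key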
This approach cuts re-sharding overhead from moving nearly all keys down to moving only about 1/n of them, while keeping lookups deterministic. Virtual nodes prevent uneven load from skewed hash distributions