When we have a lot of users, we segregate the data into multiple databases, because a single database can't hold everything given hardware constraints
Once the data is split across multiple databases, we also need to know which database to look into when we want to retrieve a particular user's data
1. Segregate via Id
We take the user's ID, divide it by the number of servers (databases), and use the remainder to decide which database that user's data goes into
package main

import "fmt"

func getDatabaseIndex(userID int, numDatabases int) int {
    if numDatabases <= 0 {
        panic("Number of databases must be greater than 0")
    }
    return userID % numDatabases
}

func main() {
    numDatabases := 4
    userIDs := []int{101, 202, 303, 404, 505, 606}
    for _, userID := range userIDs {
        dbIndex := getDatabaseIndex(userID, numDatabases)
        fmt.Printf("User ID %d → Database %d\n", userID, dbIndex)
    }
}

All good, but what if the user load increases even more and we need to spin up another server (database)? Then this scheme breaks down: every user ID has to be rehashed against the new count, and most of the data has to be moved into different databases
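To put a number on how disruptive that is, here is a small sketch (the 1,000 sequential IDs and the 4-to-5 growth are just illustrative assumptions) that counts how many users would have to move under the modulo scheme:

package main

import "fmt"

func main() {
    oldN, newN := 4, 5 // grow from 4 to 5 databases
    total, moved := 1000, 0
    for userID := 1; userID <= total; userID++ {
        // A user must migrate if the remainder changes under the new count
        if userID%oldN != userID%newN {
            moved++
        }
    }
    fmt.Printf("%d of %d users (%.0f%%) must move to a different database\n",
        moved, total, float64(moved)/float64(total)*100)
}

Roughly four out of five users end up on a different database, which is exactly the migration pain consistent hashing is designed to avoid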
2. Consistent Hashing
The modulo-based approach fails when adding or removing databases because most keys remap, which forces a massive data migration. Consistent hashing solves this by limiting the remapping to only about k/n keys (where k = number of keys, n = number of nodes)
How it works:
- Databases and keys are hashed onto a ring of positions (0 to 2³² − 1)
- Every database node is hashed and gets its spot on the ring
- Each key is hashed onto the ring as well; if a database node sits at that exact spot the key goes there, otherwise it goes to the next database node in the clockwise direction
- Virtual nodes (vnodes) balance the load distribution
- Adding or removing a node only affects its adjacent neighbors on the ring
Edge cases:
- If a new node is added: move the data that falls between the new node and its nearest anti-clockwise (predecessor) database into the new node's database
- If a database node fails: keys auto-remap to the next clockwise node; migrate the data into it
- Hotspots: if nodes hash too close together, one database takes a disproportionate share of the load; use vnodes (virtual nodes - extra ring positions created from the original node) to spread it out
- Node flapping: rapid join/leave cycles cause churn; batch membership changes or apply backoff (see the sketch after this list)
- Uneven node capacities: give bigger nodes more vnodes (weighted vnodes)
- Hash collisions: rare, but can be handled by secondary hashing
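For the node-flapping case, here is a hedged sketch of join/leave backoff (Debouncer, Observe, and the timings are illustrative names and values, not part of the original notes): a membership change is committed only after the node has been quiet for a full window, so a node that rapidly joins and leaves never churns the ring

package main

import (
    "fmt"
    "sync"
    "time"
)

// Debouncer buffers join/leave events and commits a change only after a node
// has produced no further events for a full quiet window.
// (A production version would also guard against a timer firing concurrently
// with Observe; this sketch keeps that out for brevity.)
type Debouncer struct {
    mu      sync.Mutex
    pending map[string]*time.Timer
    quiet   time.Duration
    apply   func(node string, join bool)
}

func NewDebouncer(quiet time.Duration, apply func(string, bool)) *Debouncer {
    return &Debouncer{pending: map[string]*time.Timer{}, quiet: quiet, apply: apply}
}

// Observe records a membership event; each new event for the same node
// resets that node's timer, so a flapping node never gets committed.
func (d *Debouncer) Observe(node string, join bool) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if t, ok := d.pending[node]; ok {
        t.Stop()
    }
    d.pending[node] = time.AfterFunc(d.quiet, func() {
        d.mu.Lock()
        delete(d.pending, node)
        d.mu.Unlock()
        d.apply(node, join)
    })
}

func main() {
    d := NewDebouncer(200*time.Millisecond, func(node string, join bool) {
        fmt.Println("commit:", node, "join =", join) // would call ring.AddNode / RemoveNode
    })
    d.Observe("db4", true)  // flap: join…
    d.Observe("db4", false) // …then immediate leave resets the timer
    d.Observe("db4", true)  // only this last event survives the quiet window
    time.Sleep(400 * time.Millisecond)
}

In a real deployment the apply callback would call AddNode / RemoveNode on the ring. Here is the full ring implementation itself: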
package main

import (
    "crypto/sha1"
    "encoding/binary"
    "fmt"
    "sort"
    "strconv"
    "sync"
)

type Ring struct {
    sync.RWMutex
    nodes    map[uint32]string   // hash → node
    vnodeMap map[string][]uint32 // node → its (collision-resolved) vnode hashes
    keys     []uint32            // sorted ring positions
}

// AddNode with configurable vnodes for capacity weighting
func (r *Ring) AddNode(node string, vnodes int) {
    r.Lock()
    defer r.Unlock()
    if r.nodes == nil {
        r.nodes = make(map[uint32]string)
        r.vnodeMap = make(map[string][]uint32)
    }
    for i := 0; i < vnodes; i++ {
        vnode := node + "#" + strconv.Itoa(i)
        h := hash(vnode)
        // Handle hash collisions by probing to a free position
        for _, exists := r.nodes[h]; exists; _, exists = r.nodes[h] {
            h = rehash(h)
        }
        r.nodes[h] = node
        r.keys = append(r.keys, h)
        r.vnodeMap[node] = append(r.vnodeMap[node], h)
    }
    sort.Slice(r.keys, func(i, j int) bool {
        return r.keys[i] < r.keys[j]
    })
}

func (r *Ring) RemoveNode(node string) {
    r.Lock()
    defer r.Unlock()
    // Use the stored (collision-resolved) vnode hashes, not recomputed ones
    for _, h := range r.vnodeMap[node] {
        delete(r.nodes, h)
        // Remove from sorted keys
        idx := sort.Search(len(r.keys), func(i int) bool {
            return r.keys[i] >= h
        })
        if idx < len(r.keys) && r.keys[idx] == h {
            r.keys = append(r.keys[:idx], r.keys[idx+1:]...)
        }
    }
    delete(r.vnodeMap, node)
}

func (r *Ring) GetNode(key string) string {
    r.RLock()
    defer r.RUnlock()
    if len(r.keys) == 0 {
        return ""
    }
    h := hash(key)
    idx := sort.Search(len(r.keys), func(i int) bool {
        return r.keys[i] >= h
    })
    // Handle ring wrap-around
    if idx == len(r.keys) {
        idx = 0
    }
    return r.nodes[r.keys[idx]]
}

// hash generates a 32-bit ring position (SHA-1 truncated to 4 bytes)
func hash(s string) uint32 {
    h := sha1.Sum([]byte(s))
    return binary.BigEndian.Uint32(h[:4])
}

// rehash for collision resolution (32-bit multiplicative step)
func rehash(prev uint32) uint32 {
    return prev*2654435761 + 1
}

func main() {
    ring := &Ring{}
    ring.AddNode("db0", 100) // Higher capacity = more vnodes
    ring.AddNode("db1", 75)
    ring.AddNode("db2", 50)
    // Keys map to the next node clockwise on the ring
    users := []string{"userA", "userB", "userC"}
    for _, u := range users {
        fmt.Println(u, "→", ring.GetNode(u))
    }
    // Add a node with minimal remapping
    ring.AddNode("db3", 50)
    fmt.Println("\nAfter adding db3:")
    for _, u := range users {
        fmt.Println(u, "→", ring.GetNode(u)) // Only a few keys (if any) remap
    }
}
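Relating back to the new-node edge case: a hypothetical helper on the same Ring type (AffectedRanges is an illustrative name, not from the original notes) can report which hash ranges a freshly added node takes over, i.e. the keys that need to migrate to it:

// Hypothetical sketch: for a node that was just added, report the (start, end]
// hash ranges whose keys should migrate to it from their previous owners.
// Assumes the Ring type above; a range may wrap around position 0.
func (r *Ring) AffectedRanges(node string) [][2]uint32 {
    r.RLock()
    defer r.RUnlock()
    var ranges [][2]uint32
    for _, h := range r.vnodeMap[node] {
        // Position of this vnode in the sorted ring
        idx := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
        // Anti-clockwise predecessor (wrapping around the ring)
        prev := idx - 1
        if prev < 0 {
            prev = len(r.keys) - 1
        }
        ranges = append(ranges, [2]uint32{r.keys[prev], h}) // keys in (prev, h] move here
    }
    return ranges
}

Each vnode claims only the arc between its anti-clockwise predecessor and itself, so just those slices of data move; everything else stays where it was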
Advantages:
- Scales horizontally: Adding a node moves only ~1/n of the data
- Fault tolerance: Failed nodes are bypassed automatically
- Load balancing: Weighted vnodes handle heterogeneous hardware
- Zero downtime migrations: Add new nodes before decommissioning old
Production considerations:
- Replication: Map keys to N clockwise nodes for redundancy (see the sketch after this list)
- Multi-region: Overlay location-aware rings
- SLOs: Monitor vnode distribution to catch hotspots
- Hash choice: Non-cryptographic hashes (xxHash) for speed
- Data versioning: Required for migration atomicity
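For the replication point, here is a minimal sketch on the same Ring type (GetNodes is an illustrative name, not from the original notes): walk clockwise from the key's position and collect N distinct physical nodes, skipping extra vnodes of nodes already chosen

// GetNodes returns up to n distinct nodes for a key by walking clockwise
// from the key's ring position, skipping vnodes of nodes already selected.
func (r *Ring) GetNodes(key string, n int) []string {
    r.RLock()
    defer r.RUnlock()
    if len(r.keys) == 0 || n <= 0 {
        return nil
    }
    h := hash(key)
    start := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
    seen := make(map[string]bool)
    var result []string
    for i := 0; i < len(r.keys) && len(result) < n; i++ {
        node := r.nodes[r.keys[(start+i)%len(r.keys)]]
        if !seen[node] {
            seen[node] = true
            result = append(result, node)
        }
    }
    return result
}

Writes then go to all returned nodes (or a quorum of them), so a single node failure doesn't lose the key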
This approach cuts re-sharding overhead from moving nearly all keys down to moving only about 1/n of them, while keeping lookups deterministic. Virtual nodes prevent uneven load from skewed hash distributions