When we have a lot of users, we can't store all the data in one database because of hardware constraints, so we segregate the data across multiple databases.
Once the data is divided across multiple databases, we also need to know which database to look into when we want to retrieve a particular user's data.

1. Segregate via Id

We take the user's ID, divide it by the number of servers (databases), and use the remainder to decide which database the user's data goes into.

package main
 
import "fmt"
 
// getDatabaseIndex maps a user ID to a database by taking the remainder
// of the ID divided by the number of databases.
func getDatabaseIndex(userID int, numDatabases int) int {
	if numDatabases <= 0 {
		panic("Number of databases must be greater than 0")
	}
	return userID % numDatabases
}
 
func main() {
	numDatabases := 4
	userIDs := []int{101, 202, 303, 404, 505, 606}
	for _, userID := range userIDs {
		dbIndex := getDatabaseIndex(userID, numDatabases)
		fmt.Printf("User ID %d → Database %d\n", userID, dbIndex)
	}
}

All good, but what if the user load increases even more and we need to spin up another server (database)? This function then maps most existing IDs to different databases, so we would have to rehash the data and move it into the new set of databases. The sketch below shows how much data that moves.
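
To make the cost concrete, here is a minimal sketch (the countRemapped helper is just for illustration) that counts how many keys change databases when a modulo-sharded cluster grows from 4 to 5 databases:

package main

import "fmt"

// countRemapped counts how many of the first n user IDs land on a
// different database index after growing the cluster from oldN to newN
// databases under modulo sharding.
func countRemapped(n, oldN, newN int) int {
	remapped := 0
	for id := 0; id < n; id++ {
		if id%oldN != id%newN {
			remapped++
		}
	}
	return remapped
}

func main() {
	total := 10000
	moved := countRemapped(total, 4, 5)
	fmt.Printf("%d of %d keys (%.0f%%) move when going from 4 to 5 databases\n",
		moved, total, float64(moved)/float64(total)*100)
}

Roughly 80% of the keys move even though only one database was added, which is exactly the migration cost consistent hashing is designed to avoid.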

2. Consistent Hashing

The modulo-based approach fails when adding or removing databases because most keys remap, requiring massive data migration. Consistent hashing solves this by remapping only about k/n keys when one of the nodes changes (where k = number of keys, n = number of nodes).

How it works:

  1. Databases and keys are hashed onto a ring covering the range 0 to 2³²−1
  2. Every database node is hashed and gets its own spot on the ring
  3. Each data key is hashed onto the same ring; if a database node sits at that exact spot, the key is stored there, otherwise it goes to the next database node found moving clockwise
  4. Virtual nodes (vnodes) balance load distribution
  5. Adding/removing nodes only affects adjacent neighbors

Edge cases:

  • New node added - only the data between the new node's ring position and its nearest anti-clockwise neighbour is moved into the new node's database
  • Database node failure - keys auto-remap to the next node clockwise and the data is migrated into it
  • Hotspots - if nodes hash too close to each other, one database takes a disproportionate load; use vnodes (virtual nodes - each physical node is hashed to many ring positions) to spread the load
  • Node Flapping - Rapid join/leave cycles can cause churn; implement join/leave batching or backoff
  • Uneven node capacities - weighted vnodes
  • Hash collisions - rare, but can be handled by secondary hashing

package main
 
import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
	"sync"
)
 
type Ring struct {
	sync.RWMutex
	nodes    map[uint32]string  // hash → node
	vnodeMap map[string]int     // node → vnode count
	keys     []uint32           // sorted ring positions
}
 
// AddNode with configurable vnodes for capacity weighting
func (r *Ring) AddNode(node string, vnodes int) {
	r.Lock()
	defer r.Unlock()
	
	if r.nodes == nil {
		r.nodes = make(map[uint32]string)
		r.vnodeMap = make(map[string]int)
	}
	
	for i := 0; i < vnodes; i++ {
		vnode := node + "#" + strconv.Itoa(i)
		h := hash(vnode)
		
		// Handle hash collisions by probing until a free position is found
		for {
			if _, exists := r.nodes[h]; !exists {
				break
			}
			h = rehash(h)
		}
		
		r.nodes[h] = node
		r.keys = append(r.keys, h)
	}
	r.vnodeMap[node] = vnodes
	sort.Slice(r.keys, func(i, j int) bool {
		return r.keys[i] < r.keys[j]
	})
}
 
func (r *Ring) RemoveNode(node string) {
	r.Lock()
	defer r.Unlock()
	
	vnodes := r.vnodeMap[node]
	for i := 0; i < vnodes; i++ {
		vnode := node + "#" + strconv.Itoa(i)
		h := hash(vnode)
		delete(r.nodes, h)
		
		// Remove this vnode's position from the sorted keys
		idx := sort.Search(len(r.keys), func(i int) bool {
			return r.keys[i] >= h
		})
		if idx < len(r.keys) && r.keys[idx] == h {
			r.keys = append(r.keys[:idx], r.keys[idx+1:]...)
		}
	}
	delete(r.vnodeMap, node)
}
 
func (r *Ring) GetNode(key string) string {
	r.RLock()
	defer r.RUnlock()
	
	if len(r.keys) == 0 {
		return ""
	}
	
	h := hash(key)
	idx := sort.Search(len(r.keys), func(i int) bool {
		return r.keys[i] >= h
	})
	
	// Handle ring wrap-around
	if idx == len(r.keys) {
		idx = 0
	}
	return r.nodes[r.keys[idx]]
}
 
// hash generates 32-bit position (SHA1 truncated)
func hash(s string) uint32 {
	h := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint32(h[:4])
}
 
// rehash resolves collisions by remixing the position with a Fibonacci
// hashing constant and truncating back to 32 bits
func rehash(prev uint32) uint32 {
	return uint32((uint64(prev) * 11400714819323198485) >> 32)
}
 
func main() {
	ring := &Ring{}
	ring.AddNode("db0", 100) // Higher capacity = more vnodes
	ring.AddNode("db1", 75)
	ring.AddNode("db2", 50)
	
	// Keys map to nearest node
	users := []string{"userA", "userB", "userC"}
	for _, u := range users {
		fmt.Println(u, "→", ring.GetNode(u))
	}
	
	// Add node with minimal remapping
	ring.AddNode("db3", 50)
	println("\nAfter adding db3:")
	for _, u := range users {
		println(u, "→", ring.GetNode(u)) // Only 1 key remapped
	}
}

Advantages:

  • Scales horizontally: Adding a node moves only ~1/n of the data
  • Fault tolerance: Failed nodes bypassed automatically
  • Load balancing: Weighted vnodes handle heterogeneous hardware
  • Zero downtime migrations: Add new nodes before decommissioning old

Production considerations:

  • Replication: Map each key to the next N clockwise nodes for redundancy (see the sketch after this list)
  • Multi-region: Overlay location-aware rings
  • SLOs: Monitor vnode distribution for hotspots
  • Hash choice: Non-cryptographic hashes (xxHash) for speed
  • Data versioning: Required for migration atomicity
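
To illustrate the replication bullet above, here is a rough sketch: a hypothetical GetNodes method (not part of the implementation shown earlier, and assuming the Ring type and hash helper defined above) that walks the ring clockwise from the key's position and collects the first N distinct physical nodes.

// GetNodes returns up to n distinct physical nodes for a key by walking
// the ring clockwise from the key's position. Sketch only; it assumes the
// Ring type, its nodes/keys fields, and hash() from the implementation above.
func (r *Ring) GetNodes(key string, n int) []string {
	r.RLock()
	defer r.RUnlock()
	
	if len(r.keys) == 0 || n <= 0 {
		return nil
	}
	
	h := hash(key)
	start := sort.Search(len(r.keys), func(i int) bool {
		return r.keys[i] >= h
	})
	
	seen := make(map[string]bool)
	var result []string
	// Walk clockwise (with wrap-around) until n distinct physical nodes are found
	for i := 0; i < len(r.keys) && len(result) < n; i++ {
		node := r.nodes[r.keys[(start+i)%len(r.keys)]]
		if !seen[node] {
			seen[node] = true
			result = append(result, node)
		}
	}
	return result
}

With this, ring.GetNodes("userA", 2) would return a primary and one replica; writes go to every returned node, and reads can fall back to the replicas if the primary is down.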

This approach cuts resharding overhead from moving nearly all keys to moving only about 1/n of them, while keeping lookups deterministic. Virtual nodes prevent uneven load from skewed hash distributions.