feat: competing consumers — leader election for projections and automations #26

Closed
opened 2026-02-19 22:24:41 +00:00 by ash · 3 comments
Owner

Problem

Running eskit on N servers means N instances of every projection and automation. This causes:

  1. Projections: N writes to the same Postgres/SQLite read model. Wasted work at best, data corruption at worst (concurrent UPSERTs racing).
  2. Automations: N executions of every side effect. N emails sent, N webhooks fired, N payments initiated.

Currently EventSubscription runs independently per instance with no coordination. There is no competing consumer pattern.

Building Blocks We Already Have

  • LockRegistry interface — Acquire(ctx, id) (release, error) / TryAcquire(id) (release, bool)
  • PgLockRegistry — Postgres advisory locks, session-scoped, auto-released on disconnect/crash
  • MemoryLockRegistry — single-process (dev/test)
  • SingleWriterMiddleware — uses LockRegistry for command serialization

Design

Core: Add LockRegistry to subscription.Config

```go
type Config[E any] struct {
    // ... existing fields ...

    // LeaderLock, when set, ensures only one instance processes events
    // for this subscription across all nodes. The subscription acquires
    // the lock before processing and holds it for its lifetime.
    // On failover, the new leader resumes from the checkpoint.
    LeaderLock LockRegistry
}
```

Subscription Behavior With LeaderLock

  1. Start() → attempt LeaderLock.Acquire(ctx, "sub:"+Name)
  2. If acquired → process events normally, checkpoint as usual
  3. If blocked → wait (another node is leader). When lock acquired (failover), resume from checkpoint.
  4. If context cancelled → release lock, stop
  5. If node crashes → Postgres detects dead connection, releases advisory lock. Another node acquires within pool health check interval.

Automation Behavior

Same pattern. Automation.Config gets LeaderLock field. Only the leader node processes the TodoList.

Two Levels of Protection

For automations with side effects, we want belt AND suspenders:

  1. Leader lock on the automation — only one node runs the automation loop (prevents duplicate triggers)
  2. Per-item idempotency lock — before executing each side effect, acquire advisory lock on "exec:"+item.Key. Even if leader election fails briefly (split-brain during Postgres failover), a specific side effect runs at most once.
```go
// In automation Execute func:
func sendPurchaseEmail(ctx context.Context, item TodoItem[string]) error {
    // Non-blocking: if another node already holds the per-item lock, skip.
    release, ok := locks.TryAcquire("exec:purchase-email:" + item.Key)
    if !ok {
        return nil // another node got it
    }
    defer release()

    // Idempotency: check if already executed
    if alreadySent(ctx, item.Key) {
        return nil
    }

    // Execute side effect
    if err := sendEmail(...); err != nil {
        return err // will retry
    }

    // Record execution (in same PG transaction as checkpoint if possible)
    markSent(ctx, item.Key)
    return nil
}
```

Failover Scenarios

| Scenario | Expected Behavior | Recovery Time |
|----------|-------------------|---------------|
| Leader node crashes | PG detects dead conn, releases lock. Standby acquires. | PG idle_in_transaction_session_timeout or TCP keepalive (~30s default) |
| Leader node graceful shutdown | Lock released on Stop(). Standby acquires. | < 1s |
| Network partition (leader can still reach PG) | Leader keeps lock, continues processing. | No downtime |
| Network partition (leader loses PG) | PG releases lock after timeout. Standby acquires. | ~30s |
| PG primary failover | All locks released. All nodes reconnect to new primary. One wins. | Depends on PG failover (typically 10-30s) |
| Split brain (two nodes both think they are leader) | Per-item idempotency lock prevents duplicate execution. | Self-healing |

What To Benchmark

  1. Advisory lock acquire/release latency — expect sub-ms, prove it
  2. Failover time — kill leader process, measure time until standby acquires
  3. Lock overhead per subscription — N subscriptions = N held advisory locks = N dedicated PG connections. Measure pool pressure.
  4. Throughput with vs without LeaderLock — should be negligible since lock is held for subscription lifetime, not per-event
  5. Hash collision rate — advisoryLockID uses FNV-1a to int64. For M subscription names, estimate the collision probability; it should be astronomically low, but verify.

Test Plan

  1. Unit: leader acquires, processes events — basic happy path
  2. Unit: standby blocks, resumes on leader release — failover simulation
  3. Unit: context cancellation releases lock — clean shutdown
  4. Integration: two processes, one PG — start two subscriptions with same name, verify only one processes
  5. Integration: kill leader, verify standby takes over — actual failover
  6. Integration: both attempt side effect — verify per-item lock prevents duplicate
  7. Stress: rapid leader cycling — repeatedly kill/restart, verify no missed or duplicated events
  8. Stress: checkpoint consistency after failover — new leader resumes from correct position, no reprocessing beyond checkpoint

Implementation Phases

  • Phase 1: Add LeaderLock to subscription.Config, acquire on Start, release on Stop
  • Phase 2: Add LeaderLock to automation.Config, same pattern
  • Phase 3: Per-item idempotency lock helper for automation Execute funcs
  • Phase 4: Benchmark suite — all measurements above
  • Phase 5: Failover integration tests with real Postgres
  • Phase 6: Documentation — clustering guide, deployment patterns

Connection Pool Implications

Each held advisory lock = 1 dedicated PG connection. With 20 projections + 5 automations = 25 connections just for locks. This is fine for most deployments but:

  • Document the formula: lock_connections = num_projections + num_automations
  • Add to PG pool sizing guidance
  • Consider pg_try_advisory_lock (non-blocking) with retry loop to avoid holding connections while waiting

NOT In Scope

  • Partition-based competing consumers (Kafka-style) — overkill for our use case
  • NATS-based leader election — we are removing NATS
  • Custom consensus (Raft etc) — Postgres IS our consensus layer

References

  • Existing: PgLockRegistry in pgstore/cluster.go
  • Existing: LockRegistry interface in singlewriter.go
  • Existing: SingleWriterMiddleware pattern
  • Related: eskit #26 (original clustering issue)
Author
Owner

Deferred to post-v1: design this when we actually need to scale beyond one instance.

Author
Owner

Re-prioritized to CRITICAL. YoYoPass requires clustering for production scalability. Not optional.

Author
Owner

Why NOT Now — Decision Record (2026-03-01)

Discussion with Axon engineers (Marc Klefter) confirmed: consistent hashing for command routing took Axon 15+ years to get right. The complexity is in rebalancing, failover, partition assignment, and split-brain handling — not the concept itself.

Current Production Baseline (stress tested 2026-03-01)

  • Single Postgres: ~5,000 appends/sec sustained (100K events, 100 goroutines)
  • Same-stream contention: advisory locks serialize cleanly, zero lost writes
  • Batch append: ~22,000 events/sec (single 10K batch)
  • Subscription throughput: ~5,000 events/sec with gap detection, zero gaps
  • Connection pool: 50 concurrent ops on 5 connections, zero deadlocks

Why Advisory Locks + Auto-Retry Is Sufficient

  1. pg_advisory_xact_lock(hashtext(streamID)) serializes same-stream writes — no clashing, no wasted retries (unlike the Ruby approach of optimistic retry loops)
  2. Different streams run fully parallel — no artificial bottleneck
  3. CommandHandler auto-retry (#156) handles rare edge cases (hash collisions, transient failures)
  4. No routing infrastructure needed — Postgres IS the coordinator

When To Build This

Trigger conditions (monitored via #146 lag monitoring):

  • Subscription lag consistently >1000 events behind
  • Append latency p99 >50ms under normal load
  • Need for >99.9% uptime (HA requirement)
  • Multi-region deployment requirement

Consistent Hashing Plan (when needed)

NATS JetStream provides consistent hashing out of the box:

  • Route commands by hashtext(streamID) % partitionCount to specific consumers
  • JetStream durable pull consumers with queue groups handle rebalancing
  • No custom rebalancing code needed — NATS manages consumer group membership
  • Partition count is fixed at deployment, not dynamic (simpler, sufficient)

This avoids building the 15 years of Axon infrastructure — we leverage NATS for the hard parts.

References

  • Stress test proof: commit c5d75a7 (pgstore/stress_test.go)
  • Advisory lock hash collision tracking: #160
  • Lag monitoring: #146
  • CommandHandler auto-retry: #156
ash changed title from Cluster scaling: partitioned commands, competing projections, rebalancing to feat: competing consumers — leader election for projections and automations 2026-03-02 21:00:30 +00:00
ash closed this issue 2026-03-02 22:16:03 +00:00