Cluster scaling: partitioned commands, competing projections, rebalancing #26

Open
opened 2026-02-19 22:24:41 +00:00 by ash · 0 comments
Owner

What

eskit must work at cluster scale: multiple instances, partitioned processing, and zero-downtime scaling.

Scaling Dimensions

1. Command Processing (Single Writer at Scale)

  • NATS queue groups per entity type
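  • MaxAckPending=1 on each subject's consumer (it is a consumer-level setting), so commands for that subject are processed one at a time
  • Node dies → NATS redelivers its unacked messages to another instance. Automatic failover.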
  • Consistent hashing for sticky routing
  • DistributedCommandBus wrapping NATS queue consumers
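
To make the sticky-routing idea concrete, here is a minimal sketch of a consistent-hash ring that maps entity IDs to nodes. It is only an illustration under assumptions: `HashRing`, the node names, and the virtual-node count are hypothetical and not part of eskit or NATS. The property worth noting is that removing a node only remaps the entities that node owned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash; the built-in hash() is randomized per process."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def owner(self, entity_id):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (_hash(entity_id),))
        return self._ring[idx % len(self._ring)][1]
```

A command bus could call `owner(entity_id)` to pick the node (or partition subject) for a command, so that all commands for one entity land on the same writer while membership is stable.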

2. Projection Processing (Competing Consumers)

  • Durable pull consumers shared by all instances (the pull-based counterpart of a queue group)
  • Each instance fetches batches, processes, acks
  • PartitionedSubscription — by tenant, hash, or subject filter
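
The fetch, process, ack loop above can be sketched without a broker. Everything here is a hypothetical stand-in: `SharedConsumer` only mimics fetch/ack/redelivery in memory and is not the NATS client API, and `CountProjection` is an example read model. The point it illustrates is that a worker can crash between processing and acking, so the projection has to tolerate redelivery.

```python
from collections import deque


class SharedConsumer:
    """In-memory stand-in for one durable consumer shared by several workers."""

    def __init__(self, events):
        self._pending = deque(enumerate(events, start=1))  # (seq, event)
        self._in_flight = {}  # seq -> event, delivered but not yet acked

    def fetch(self, batch):
        out = []
        while self._pending and len(out) < batch:
            seq, event = self._pending.popleft()
            self._in_flight[seq] = event
            out.append((seq, event))
        return out

    def ack(self, seq):
        self._in_flight.pop(seq, None)

    def redeliver_unacked(self):
        """Simulates ack-wait expiry: unacked messages return to the queue."""
        for seq in sorted(self._in_flight, reverse=True):
            self._pending.appendleft((seq, self._in_flight.pop(seq)))


class CountProjection:
    """Idempotent read model: remembers applied sequences."""

    def __init__(self):
        self.totals = {}
        self._applied = set()

    def apply(self, seq, event):
        if seq in self._applied:
            return  # redelivered message, already applied
        self._applied.add(seq)
        tenant = event["tenant"]
        self.totals[tenant] = self.totals.get(tenant, 0) + event["amount"]


def run_worker(consumer, projection, batch=2, crash_before_ack=False):
    """Fetch one batch, apply it, ack each message. Returns the batch size."""
    msgs = consumer.fetch(batch)
    for seq, event in msgs:
        projection.apply(seq, event)
        if crash_before_ack:
            return len(msgs)  # simulated crash: nothing in this batch is acked
        consumer.ack(seq)
    return len(msgs)
```

In this sketch the projection deduplicates by sequence number; a real projection would more likely persist a checkpoint alongside the read model.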

3. State Loading Optimization

  • Snapshots to avoid full replay on partition reassignment
  • Lazy loading on first command
  • Pre-warming when partition assignment known
  • NATS KV as distributed snapshot cache
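
A rough sketch of how snapshots, lazy loading, and pre-warming could fit together is shown below. All names (`Snapshot`, `AggregateLoader`, `apply_event`, the `balance` field) are hypothetical, and plain dicts stand in for the event store and for a shared snapshot cache such as a KV bucket. The idea is that a node taking over a partition replays only the events recorded after the latest snapshot.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    version: int  # number of events already folded into `state`
    state: dict


def apply_event(state, event):
    """Example fold function: a running balance."""
    new = dict(state)
    new["balance"] = new.get("balance", 0) + event["delta"]
    return new


class AggregateLoader:
    def __init__(self, events_by_stream, snapshots, snapshot_every=100):
        self._events = events_by_stream  # stream_id -> list of events
        self._snapshots = snapshots      # stream_id -> Snapshot (shared cache stand-in)
        self._cache = {}                 # stream_id -> (version, state), local to this node
        self._every = snapshot_every
        self.replayed = 0                # events replayed by this node

    def load(self, stream_id):
        """Lazy load: call on the first command for a stream."""
        if stream_id in self._cache:
            version, state = self._cache[stream_id]
        else:
            snap = self._snapshots.get(stream_id)
            version, state = (snap.version, snap.state) if snap else (0, {})
        # Replay only the events after the known version.
        for event in self._events.get(stream_id, [])[version:]:
            state = apply_event(state, event)
            version += 1
            self.replayed += 1
        self._cache[stream_id] = (version, state)
        last = self._snapshots.get(stream_id)
        if version - (last.version if last else 0) >= self._every:
            self._snapshots[stream_id] = Snapshot(version, state)
        return state

    def prewarm(self, stream_ids):
        """Load eagerly once a partition assignment is known."""
        for stream_id in stream_ids:
            self.load(stream_id)
```

Calling `prewarm` with the stream IDs of a newly assigned partition moves the replay cost ahead of the first command instead of paying it on the command path.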

4. Hot Streams

  • Aggressive snapshotting
  • Backpressure: reject/queue when overwhelmed
  • Metrics: detect and alert on hot streams
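
One possible shape for the backpressure and hot-stream detection above is a small admission gate in front of the command handler. This is a sketch under assumptions: `StreamGate`, `Overloaded`, and the thresholds are hypothetical, and it takes the "reject" option rather than queueing. It caps pending commands per stream and counts arrivals in a sliding time window to flag hot streams.

```python
import time
from collections import Counter, deque


class Overloaded(Exception):
    """Raised when a stream already has too many pending commands."""


class StreamGate:
    """Hypothetical admission control in front of a command handler."""

    def __init__(self, max_pending=100, hot_threshold=50,
                 window_seconds=1.0, clock=time.monotonic):
        self._max_pending = max_pending
        self._hot_threshold = hot_threshold
        self._window = window_seconds
        self._clock = clock
        self._pending = Counter()  # stream_id -> commands admitted, not yet done
        self._arrivals = {}        # stream_id -> deque of arrival timestamps

    def _prune(self, arrivals, now):
        # Drop arrivals that have fallen out of the sliding window.
        while arrivals and now - arrivals[0] > self._window:
            arrivals.popleft()

    def admit(self, stream_id):
        now = self._clock()
        arrivals = self._arrivals.setdefault(stream_id, deque())
        arrivals.append(now)  # rejected attempts still count as load
        self._prune(arrivals, now)
        if self._pending[stream_id] >= self._max_pending:
            raise Overloaded(stream_id)
        self._pending[stream_id] += 1

    def done(self, stream_id):
        if self._pending[stream_id] > 0:
            self._pending[stream_id] -= 1

    def hot_streams(self):
        """Streams whose arrivals in the current window reach the threshold."""
        now = self._clock()
        hot = []
        for stream_id, arrivals in self._arrivals.items():
            self._prune(arrivals, now)
            if len(arrivals) >= self._hot_threshold:
                hot.append(stream_id)
        return sorted(hot)
```

The output of `hot_streams()` could feed a metrics exporter or alert, and the same signal could trigger more frequent snapshots for those streams.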

5. Cross-Region (Future)

  • NATS superclusters
  • Region-local writes, async replication
  • Read replicas per region

Priority

Medium — design now, implement after core is solid. Core abstractions must be clustering-aware from day one.
