perf: reduce per-event allocations in sqlitestore and pgstore Load paths #144

Closed

opened 2026-03-01 11:41:46 +00:00 by ash

Problem

Benchmark shows 45 allocations per event on Load for sqlitestore:

ReadStream_1000: 6.3ms, 1.18MB, 45,433 allocs (45/event)
ReadStream_100:  0.88ms, 118KB, 4,341 allocs (43/event)
ReadStream_10:   0.13ms, 13KB, 469 allocs (47/event)

For the #1 Go event sourcing framework, this is unacceptable. Target: <10 allocs/event.

Root Causes

  1. String allocations per column — SQL driver allocates a new string for every TEXT column per row (stream_id, event_type, codec, timestamp, correlation_id, causation_id, metadata_extra). That is 7 string allocs per event minimum.
  2. string(buf.idBuf) for ID — converts byte slice to string (1 alloc)
  3. time.Parse(RFC3339Nano, ts) — parses timestamp string (1 alloc)
  4. DeserializeWithUpcastingCodec — unmarshal + type assertion (2+ allocs)
  5. json.Unmarshal for metadata.Extra — when non-empty (1 alloc)
  6. Event struct value copies — appending eskit.Event[E] to slice

Proposed Fixes

Quick wins (no schema change)

  • Use sql.RawBytes for TEXT columns instead of string — avoids driver allocation, reuses row buffer. Copy only what we need.
  • Pre-size metadata.Extra map — avoid growth allocations
  • Use unsafe.String for zero-copy string creation where safe (same goroutine, before Next() call)
  • Pool the intermediate byte buffers for deserialization
  • Avoid time.Parse — use time.Unix if we can store as integer. If not, at least cache the time.Location.

Medium effort (schema migration)

  • Store timestamp as INTEGER (unix nanos) instead of TEXT — eliminates time.Parse entirely
  • Store global ID as the event ID directly instead of converting int64 → string

Architecture

  • Benchmark the same path in pgstore — it likely has the same allocation profile
  • Profile with go test -cpuprofile and go test -memprofile to find exact hot spots
  • Set target: <10 allocs/event for Load, <5 allocs/event for Append

Requirements

  • Profile before and after with -memprofile
  • Benchmark showing allocs/event reduction
  • No API changes (internal optimization only)
  • Must pass all conformance tests
  • Both sqlitestore and pgstore
ash closed this issue 2026-03-01 12:03:49 +00:00