perf: reduce Append allocations — 76 allocs for 1 event is unacceptable #150

Closed
opened 2026-03-01 12:11:52 +00:00 by ash · 0 comments
Owner

Problem

Benchmarked on real hardware:

SQLite AppendSingle: 247µs, 76 allocs
SQLite AppendBatch100: 3.8ms, 2154 allocs (22/event)
Postgres AppendSingle: 3.1ms, 61 allocs
Postgres AppendBatch100: 26ms, 2142 allocs (21/event)

76 allocations to append one event. For the #1 ES framework in Go, target is <10.

Root Causes (sqlitestore)

Per event in the loop:

  1. writeCodec.Marshal(data) — JSON serialize (unavoidable, but could pool buffer)
  2. json.Marshal(meta.Extra) — EVERY event, even when nil. Same metadata for all events in batch.
  3. now.Format(time.RFC3339Nano) — EVERY event, same timestamp. Format once outside loop.
  4. string(jsonData) — byte→string copy for SQL
  5. string(extraJSON) — byte→string copy for SQL
  6. stmt.ExecContext per row — N separate INSERTs instead of batch
  7. result.LastInsertId() — extra call per row
  8. strconv.FormatInt — int→string for Event.ID
  9. eskit.ResolveEventType — reflect per event (could cache)
  10. eskit.CurrentSchemaVersion — lookup per event

Outside loop:
11. Tombstone check query — EVERY append, even new streams
12. Version check query — separate from tombstone
13. PrepareContext — for INSERT

Fixes

Quick wins (no API change)

  • Format timestamp ONCE before loop, reuse string
  • Marshal metadata.Extra ONCE before loop (same for all events)
  • Cache ResolveEventType result per Go type (it's the same type in a typed store)
  • Cache CurrentSchemaVersion (same event type = same version)
  • Combine tombstone + version check into ONE query: SELECT COALESCE(MAX(version),0), (SELECT reason FROM tombstones WHERE stream_id=?) FROM events WHERE stream_id=?

Medium effort

  • Batch INSERT: build one SQL statement for all events instead of N ExecContext calls
    • SQLite: INSERT INTO events VALUES (?,?,?),(?,?,?),(?,?,?)
    • Postgres: INSERT INTO events VALUES ($1,$2,$3),($4,$5,$6) or pgx CopyFrom
  • Use pgx CopyFrom for Postgres batch appends (binary protocol, zero parsing overhead)
  • Pool marshal buffers via sync.Pool

Stretch

  • Store timestamps as INTEGER (unix nanos) — eliminates Format entirely
  • Avoid string([]byte) copies — use driver that accepts []byte directly

Target

  • AppendSingle: <10 allocs
  • AppendBatch100: <8 allocs/event

Requirements

  • Benchmark BEFORE and AFTER on both SQLite and Postgres (VPS: 143.14.50.71)
  • Must pass all conformance tests
  • Fix pgstore benchmark factory to Truncate on setup (currently broken)
  • Both stores
ash closed this issue 2026-03-01 12:16:55 +00:00