ADR-009: Trace artifact format and dispatch-observer seam
Status: Accepted (capture + token report + replay + token budget + offline viewer all shipped)
Context
The server can drive an app but cannot record what it did. An agent (or a human debugging one) has no portable record of which tools ran, in what order, with what inputs and outputs, or where the token budget went. The trace capability is the first product differentiator and the first genuine consumer of the plugin model (ADR-004): it must observe the WHOLE session, not just add isolated tools. Two new decisions are needed — how a plugin observes every tool call, and what the recorded artifact looks like.
Decision
1. A dispatch-observer seam on the dispatcher
The dispatcher (ADR-008) gains a set of best-effort observers. After every dispatch()
resolves — success, validation failure, or unknown tool — it notifies each observer exactly
once, through a single internal funnel, with a DispatchRecord:
{ tool, args, result, startedAt, finishedAt }
args is the parsed input (or the raw input when validation failed); result is the exact
agent-facing envelope (so estimated_tokens is read from result._meta). Observers register
via Dispatcher.addObserver(observer): () => void (returns an unsubscribe), also surfaced to
tool handlers as ToolContext.addDispatchObserver so a plugin tool (e.g. trace_start) can
attach a sink without the core depending on the plugin.
Observers are best-effort and must not throw: a throw is caught and logged, never
propagated to the agent (same contract as the transport's console/dialog listeners). With zero
observers the funnel is a single size > 0 check — no per-call overhead.
2. The trace artifact is JSONL, schema version 1
One JSON record per line. The first line is a meta header; each subsequent line is a call:
{ "v": 1, "kind": "meta", "started_at", "core_version" }
{ "kind": "call", "tool", "ok", "code"?, "started_at", "finished_at", "elapsed_ms",
"estimated_tokens", "args", "result" }
JSONL was chosen over zstd/protobuf/OpenTelemetry: it is append-friendly, human-readable, and
parses with JSON.parse per line — no dependency, no schema compiler. The explicit v lets
replay, budget metadata, and viewer features evolve the shape compatibly.
3. Trace is a plugin, recording is opt-in
Per the lean-core thesis (ADR-007), trace ships as @electron-stagewright/plugin-trace, not
core. It records only between trace_start and trace_stop, to an operator-chosen path, and
skips its own trace_* calls. Records are buffered in memory (bounded by maxRecords) and
written on stop, so the observer stays a cheap array push on the dispatch hot path; a crash
before stop loses the buffer (streaming is a forthcoming improvement).
Rationale
- A core observer seam is the only way a plugin can see other tools' calls; making it a plugin keeps the lean-core tool surface intact while proving plugins can be session-wide.
- Recording all outcomes (including errors) makes a trace a faithful session record.
- Token reporting reads
_meta.estimated_tokens, already computed per envelope (ADR-006), so the report needs no new measurement.
Alternatives considered
- Wrap each tool handler instead of an observer — plugins only add tools; they cannot wrap core tools, and a wrapper would not see validation failures (which fail before the handler).
- Stream each record to disk — more crash-durable but blocks the dispatch path or needs an ordered async queue; deferred behind the in-memory buffer + cap.
- Binary/OTel artifact — heavier dependency and tooling than the debugging use case needs.
Consequences
ToolContextgainsaddDispatchObserver;DispatchRecord/DispatchObserverare public.- A trace captures inputs/outputs that may contain sensitive data (typed text, eval payloads).
It is opt-in and operator-pathed (same trust model as screenshots/logs); a
redactconfig drops named argument fields. Broader redaction/sandboxing is a forthcoming follow-up. - The initial plugin surface ships
trace_start,trace_stop,trace_tokens,trace_status. Replay, budget reporting, and the visual viewer extend that surface compatibly.
Related decisions
- ADR-004 (plugin model) — the contract the trace plugin is built on.
- ADR-006 (error code registry) —
estimated_tokensand the namespacedtrace.*error codes. - ADR-008 (server and tool dispatcher) — the dispatch path the observer seam funnels through.
References
packages/core/src/server/dispatcher.ts— observer set,addObserver, the#completefunnel.packages/core/src/tools/types.ts—DispatchRecord,DispatchObserver, the ctx seam.packages/plugin-trace/— the plugin, recorder (JSONL + token summary), and tests.
Status Update (2026-06-04) — the seam becomes bidirectional; trace_replay
The original seam made plugins OBSERVE every dispatch. Replay needs the inverse — to RE-DISPATCH recorded calls — so the seam gains an active half:
ToolContext.dispatch(tool, args): Promise<ToolResult>re-dispatches a tool through the same dispatcher (full Zod validation, operation-type routing, session context, observer notification), andToolContext.validate(tool, args): ErrorResponse | nullchecks a call against the current schema WITHOUT running the handler. Both are wired by the dispatcher; the core still does not depend on the plugin.- Bounded recursion. Re-dispatch depth is tracked in an
AsyncLocalStorage<number>(per async chain, so concurrent dispatches never share a counter); pastMAX_REDISPATCH_DEPTH = 1a re-dispatch returnsBAD_ARGUMENTinstead of recursing. A tool may drive other tools (replay at depth 0, its calls at depth 1); a re-dispatched tool may not re-dispatch again. trace_replayreads an artifact and re-dispatches eachcallin order. The crux is session-id remapping: the recorded run's session ids are defunct, so the engine keeps arecorded → newmap (learned from each result's_meta.session_id) and rewrites each call'ssessionIdarg before dispatch. Divergence is judged on the STABLE outcome (ok+ errorcode), not the full envelope (payloads vary run-to-run); each diverged call carries a bounded, size-capped field-level diff.dryRunre-validates against current schemas without dispatching (detects a tool whose signature drifted since recording). A trace recorded withredactcannot be faithfully replayed — redacted args diverge — which the tool reports rather than hides.- The "replay a deterministic minimal-app session" acceptance criterion lands here, proven by a gated real-Electron smoke (record launch→snapshot→click→stop, then replay with all matched).
- Still forthcoming at this point in the decision history: the visual offline viewer and token-budget enforcement.
New references: packages/plugin-trace/src/replay.ts (engine); ToolContext.dispatch / validate
and the REDISPATCH_DEPTH guard in dispatcher.ts.
Status Update (2026-06-04) — token budget + a pre-dispatch guard seam
The trace plugin gains a token budget, and the dispatch seam gains a third capability — a pre-dispatch VETO — alongside observe and re-dispatch:
ToolContext.addDispatchGuard(guard)registers aDispatchGuardthat runs for every dispatch BEFORE the handler (after validation + operation-type routing). Returning aToolResultvetoes the call (handler never runs; the envelope is returned and observers still see it via#complete); returningnullallows it. Guards run in registration order, first veto wins, and a throwing guard is caught, logged, and treated as allow (fail-open — a guard bug must never wedge the tool surface). This is the active-enforcement counterpart to the observe-only seam.- Budget.
trace_startacceptsbudgetTokens(+warnThreshold, default 0.8); the recorder tracks exactspent— counted BEFORE the record-cap drop, so an overflowed trace still reports the true total — and exposesbudgetStatus()({ budget_tokens, spent, remaining, over_budget, near_budget, warn_threshold }). The budget (and exact spent) persist in the JSONLmetaheader (additive optional fields; format version stays 1).trace_status,trace_tokens, the newtrace_budget, andtrace_stopall surface the nestedbudgetobject. - Enforcement is opt-in. By default the budget is advisory (report only; the agent polls
trace_budgetand self-limits).trace_start({ enforce: true })(or theenforceBudgetconfig) registers a guard that vetoes over-budget calls with the newtrace.BUDGET_EXCEEDED(http 429, not retryable, carriesnext_actions). The guard skipstrace_*so an over-budget agent can alwaystrace_stopto recover; the call that tips the budget over still runs (its cost is unknown beforehand), everything after it is blocked. The guard is released on stop + teardown. - Still forthcoming: the visual offline viewer.
New references: DispatchGuard / DispatchGuardCall / ToolContext.addDispatchGuard and the
#guards set in dispatcher.ts; BudgetStatus / budgetStatusOf and budget tracking in
packages/plugin-trace/src/recorder.ts.
Status Update (2026-06-06) — the offline viewer
The last forthcoming piece — a visual viewer — ships as trace_view. The format decision the
original scope left open ("viewer format chosen before implementation") is resolved as a
single self-contained HTML document: inline CSS and JS, no external assets, no CDN, no
runtime server, no build step. The operator opens it by double-clicking; it works fully offline
and is trivial to attach to a bug report. This was chosen over a hosted/served viewer (needs a
process and a port), a framework SPA (build step + bundle), or a terminal viewer (not shareable)
because a trace is a portable record — the viewer should be as portable as the artifact.
trace_view({ path, out? })reads a written artifact (the sharedloadTracehelper, so a missing file →trace.ARTIFACT_NOT_FOUND, malformed JSONL →trace.ARTIFACT_INVALID), renders it, and writes the HTML (a write failure → the existingtrace.ARTIFACT_WRITE_FAILED). With noout, the report is written next to the trace with a.htmlextension. The render is a pure function (renderTraceHtml(parsed, { generatedAt? })), so it is unit-tested without disk I/O, and the generation timestamp is injected for deterministic output.- The report shows summary cards (calls, ok/error, total estimated tokens), a token-budget bar
when the trace carries a budget, the largest-response and per-tool token tables (reusing
summarizeTrace), and an expandable timeline of every call (native<details>, so it is readable with JS disabled). A small fixed inline script adds tool-name filtering and expand/collapse-all as progressive enhancement. - SECURITY. A trace captures arbitrary tool inputs/outputs (typed text, eval payloads, app
content), so every dynamic value is HTML-escaped on render (text AND attribute values). The
inline script is a fixed constant with no trace data interpolated into it, so it adds no
injection surface. A unit test asserts a captured
<img onerror=…>/</script>payload is escaped rather than emitted as live markup.
New references: packages/plugin-trace/src/viewer.ts (renderTraceHtml / escapeHtml); the
trace_view tool in packages/plugin-trace/src/index.ts.