Electron Stagewright docs

Concepts — how Electron Stagewright works, and why

This is the explanation layer of the docs: the model behind the tools, and the reasoning behind it. If you want to do something, start with the guides; if you want the exact contract of a tool, see the generated tool reference. This page is for understanding why the server is shaped the way it is. The decisions referenced here are recorded as ADRs.

The throughline: built for an agent, not a human

Most desktop-automation tools assume a human is watching: they throw stack traces, return raw values, and expect you to read the screen between steps. Electron Stagewright assumes the caller is an LLM agent that has to decide its next move from the result alone — no screen, no prior context beyond what the tool returns. Every design choice below follows from that. The principles are recorded in ADR-007.

The response envelope

Every tool returns a JSON object discriminated by ok. On success it carries ok: true plus the tool's result fields. On failure it carries ok: false with a stable code, a human-readable error, a hint, a retryable flag, an HTTP-equivalent status (http), and often next_actions (concrete tool calls to try next) and similar_refs (candidates when a handle missed). A _meta block adds estimated_tokens and elapsed_ms. See the root README for a worked example.

Branch on code, never on the prose error. The codes are a closed registry whose values stay stable even when the error wording changes, so an agent can switch on them reliably; the error string is for a human reading a log. This registry-plus-envelope design is ADR-006. The point is that a failure is actionable: the agent learns what went wrong (code), whether retrying could help (retryable), and what to do instead (next_actions) without asking for more context.
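To make the branching rule concrete, here is a minimal sketch of a caller that switches on code and never on error. The type shapes are assumptions reconstructed from the field names above (the NextAction shape and the element type of similar_refs in particular are guesses), and decide is a hypothetical helper, not part of the server; the generated tool reference remains the source of truth.

```typescript
// Hypothetical envelope types, reconstructed from the field names in the prose.
interface EnvelopeMeta { estimated_tokens: number; elapsed_ms: number }
interface NextAction { tool: string; args: Record<string, unknown> } // assumed shape

interface SuccessEnvelope {
  ok: true;
  _meta?: EnvelopeMeta;
  // The tool's result fields sit alongside `ok`; omitted in this sketch.
}

interface FailureEnvelope {
  ok: false;
  code: string;            // stable registry code: branch on this
  error: string;           // prose for a human reading a log
  hint: string;
  retryable: boolean;
  http: number;            // HTTP-equivalent status
  next_actions?: NextAction[];
  similar_refs?: number[]; // assumed to be ref handles
  _meta?: EnvelopeMeta;
}

type Envelope = SuccessEnvelope | FailureEnvelope;

type Decision =
  | { kind: "done" }
  | { kind: "follow"; action: NextAction }
  | { kind: "retry" }
  | { kind: "give_up"; code: string };

// One illustrative policy; an agent may weigh these signals differently.
function decide(env: Envelope): Decision {
  if (env.ok) return { kind: "done" };
  switch (env.code) {
    case "TRANSPORT_UNSUPPORTED":
    case "NOT_IMPLEMENTED":
      // Capability errors: assumed here to be pointless to retry as-is.
      return { kind: "give_up", code: env.code };
    default: {
      const first = env.next_actions?.[0];
      if (first !== undefined) return { kind: "follow", action: first };
      return env.retryable ? { kind: "retry" } : { kind: "give_up", code: env.code };
    }
  }
}
```

Because the union is discriminated by ok, the compiler only allows access to code, retryable, and next_actions after the success case has been ruled out, which mirrors how an agent is expected to read the envelope.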

Addressing elements: refs vs selectors

You can target an element two ways: a ref (a small integer handle from a snapshot) or a selector (CSS or a role/name query). Prefer refs. A ref is reconciled across snapshots by the element's fingerprint; if several elements share the same fingerprint, they are paired in document order. Within one renderer session, that means the same logical button keeps its ref across ordinary DOM re-renders when its fingerprint and relative duplicate order stay stable. A renderer reload or route change is the exception — it invalidates the stored ref map, so the server forces a fresh snapshot and flags renderer_reloaded, the signal to re-read before acting. Selectors are fine for stable, well-known elements, but a positional or brittle selector breaks the moment the UI shifts. The snapshot schema, fingerprint reconciliation, and the reload signal are ADR-005.
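The pairing rule described above can be sketched as follows. This is an illustration only: fingerprints are modeled as opaque strings, SnapshotEntry and reconcileRefs are hypothetical names, and the way fresh refs are allocated (one past the highest previous ref) is an assumption, not something the docs state.

```typescript
// Hypothetical model: each element in a snapshot has a ref and a fingerprint.
interface SnapshotEntry { ref: number; fingerprint: string }

// Given the previous snapshot and the fingerprints of the new one (both in
// document order), return the ref assigned to each new element.
function reconcileRefs(prev: SnapshotEntry[], nextFingerprints: string[]): number[] {
  // Queue of old refs per fingerprint, kept in document order.
  const queues = new Map<string, number[]>();
  let maxRef = 0;
  for (const el of prev) {
    const queue = queues.get(el.fingerprint) ?? [];
    queue.push(el.ref);
    queues.set(el.fingerprint, queue);
    maxRef = Math.max(maxRef, el.ref);
  }
  let nextFresh = maxRef + 1; // assumed allocation strategy for unseen elements
  return nextFingerprints.map((fp) => {
    // Duplicates with the same fingerprint are paired first-to-first.
    const reused = queues.get(fp)?.shift();
    return reused !== undefined ? reused : nextFresh++;
  });
}

const before: SnapshotEntry[] = [
  { ref: 1, fingerprint: "button:Save" },
  { ref: 2, fingerprint: "listitem:Row" },
  { ref: 3, fingerprint: "listitem:Row" },
];
// The Save button moved below the first row and a new button appeared.
const assigned = reconcileRefs(before, [
  "listitem:Row", "button:Save", "listitem:Row", "button:New",
]);
console.log(assigned); // [2, 1, 3, 4]
```

The Save button keeps ref 1 even though it moved, and the two identical rows keep refs 2 and 3 because their relative order is unchanged. If the duplicate rows swapped places, the refs would stay with the positions, which is why the text qualifies stability with "relative duplicate order".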

Snapshots and diffs

electron_snapshot returns an accessibility-tree view of the renderer — roles, names, states, and refs — rather than raw HTML, because that is the level an agent reasons at. Because a full tree is large, the server can return a compact diff since the last snapshot (what changed) instead of the whole tree again, keeping the agent's token budget under control. electron_find narrows to the element you mean by role and name. Same decision record: ADR-005.
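As a rough picture of why a diff is cheaper than a full tree, the sketch below compares two snapshots keyed by ref and reports only what was added, removed, or changed. The AxNode fields follow the list above (role, name, states, ref), but the diff format itself is invented for illustration; the real wire format is defined by the server and ADR-005.

```typescript
// Hypothetical node and diff shapes; not the server's actual schema.
interface AxNode { ref: number; role: string; name: string; states: string[] }
interface SnapshotDiff { added: AxNode[]; removed: number[]; changed: AxNode[] }

function sameNode(a: AxNode, b: AxNode): boolean {
  return (
    a.role === b.role &&
    a.name === b.name &&
    a.states.length === b.states.length &&
    a.states.every((s, i) => s === b.states[i])
  );
}

function diffSnapshots(before: AxNode[], after: AxNode[]): SnapshotDiff {
  const old = new Map<number, AxNode>();
  for (const node of before) old.set(node.ref, node);

  const diff: SnapshotDiff = { added: [], removed: [], changed: [] };
  for (const node of after) {
    const prev = old.get(node.ref);
    if (prev === undefined) diff.added.push(node);
    else if (!sameNode(prev, node)) diff.changed.push(node);
    old.delete(node.ref); // whatever is left afterwards has disappeared
  }
  diff.removed = Array.from(old.keys());
  return diff;
}
```

A diff like this only makes sense when refs are stable between snapshots, which is what the fingerprint reconciliation in the previous section provides.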

Assertions that retry: the expect family

A naive check is a loop (read state, compare, wait, read again), and each turn of that loop costs a tool call and tokens. The electron_expect_* family collapses it into one call: you state the condition you expect (text, value, visibility, count, URL, state), and the server polls until it holds or the timeout elapses, returning either matched: true or an EXPECTATION_FAILED envelope. The expectation codes live in the same registry, ADR-006; for the how-to, see Assert UI state.
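The server-side loop that an expect call replaces might look roughly like this. It is a sketch under stated assumptions: pollExpectation, its parameters, and the 50 ms default interval are hypothetical, and the failure object is trimmed to the fields needed to show the idea rather than the full envelope.

```typescript
type ExpectOutcome =
  | { ok: true; matched: true }
  | { ok: false; code: "EXPECTATION_FAILED" };

// Poll `read` until it reports true or the timeout elapses.
// `read` stands in for whatever check the tool performs (text, value, ...).
async function pollExpectation(
  read: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 50, // assumed default, not taken from the docs
): Promise<ExpectOutcome> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await read()) return { ok: true, matched: true };
    if (Date.now() >= deadline) return { ok: false, code: "EXPECTATION_FAILED" };
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The condition is always read at least once, even with a zero timeout, so a state that already holds returns immediately. From the agent's side the whole loop is one request and one envelope, however many reads happen inside it.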

Sessions and transports

A session is one running app the server is driving. You get one from electron_launch (start it), electron_attach (connect to a running one), or electron_inject (a Node-Inspector handshake into an existing process), and you end it with electron_stop. Each session is produced by a transport behind a single ITransport interface, and each transport advertises its capabilities: whether it can eval in the main process, intercept, control the clock, and so on. A tool whose capability the transport lacks therefore fails honestly with a capability error rather than silently doing nothing: TRANSPORT_UNSUPPORTED when the capability matrix rules it out, NOT_IMPLEMENTED when a transport claims the capability but defers the body. The transport abstraction is ADR-003; see Launch, attach, or inject for choosing one.
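The distinction between the two capability errors can be sketched like this. The capability names, the TransportSketch shape, and checkCapability are all hypothetical stand-ins; the real ITransport interface and capability matrix are defined in the codebase and ADR-003.

```typescript
// Hypothetical capability names, loosely following the examples in the text.
type TransportCapability = "main_eval" | "intercept" | "clock";

interface TransportSketch {
  name: string;
  advertised: ReadonlySet<TransportCapability>;  // what the matrix says it can do
  implemented: ReadonlySet<TransportCapability>; // what actually has a body today
}

type CapabilityCheck =
  | { ok: true }
  | { ok: false; code: "TRANSPORT_UNSUPPORTED" | "NOT_IMPLEMENTED" };

function checkCapability(t: TransportSketch, needed: TransportCapability): CapabilityCheck {
  // Ruled out by the matrix: this transport never offers the capability.
  if (!t.advertised.has(needed)) return { ok: false, code: "TRANSPORT_UNSUPPORTED" };
  // Claimed by the transport, but the body is deferred.
  if (!t.implemented.has(needed)) return { ok: false, code: "NOT_IMPLEMENTED" };
  return { ok: true };
}

// An invented transport, for illustration only.
const exampleTransport: TransportSketch = {
  name: "example (invented)",
  advertised: new Set<TransportCapability>(["intercept", "clock"]),
  implemented: new Set<TransportCapability>(["intercept"]),
};

console.log(checkCapability(exampleTransport, "main_eval")); // TRANSPORT_UNSUPPORTED
console.log(checkCapability(exampleTransport, "clock"));     // NOT_IMPLEMENTED
console.log(checkCapability(exampleTransport, "intercept")); // { ok: true }
```

Running the check before doing any work is what lets the tool fail with a code the agent can branch on, instead of returning an empty success.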

Eval and plugins: power, gated

Two pieces are deliberately kept behind explicit opt-ins:

Glossary