TypeAgent Studio — real replay path + Impact Report#2509
Merge the human-driven and AI-driven walkthroughs into one narrative where Aida delegates the tune -> find-regressions -> validate loop to an AI agent, then finishes the utterances it left uncovered herself. Soften API-level jargon to describe the logic, make the authoring prompt language-neutral, and note that the collision scan catches cross-agent overlaps, not just intra-agent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…te overlay
Add DESIGN.md 3.6 capturing what a sandbox is and where it's going:
- composition (and therefore collisions) is intrinsically per-sandbox;
- source is today a filtered view over shared repo working-tree, evolving to a per-sandbox copy-on-write overlay (a full debugging experience) via the loader's ordered agent-roots seam;
- UI uses a single active-sandbox selector that scopes Corpora/Collisions/Event Log (visibility now, mutation once overlays land).
Cross-reference the Sandbox primitive (3.2) and tree-views note (3.4), and add two STATUS.md next slices: the active-sandbox selector and the isolated overlay.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add P-7 to the implementation plan, deliberately after the single-sandbox end-to-end headline (P-3/J4, Gates A-E) closes:
- P-7a: active-sandbox selector + per-sandbox scoping of Corpora/Collisions/Event Log (visibility/analysis only);
- P-7b: per-sandbox copy-on-write overlay via the loader's ordered agentRoots seam (mutation-local; overlay-vs-base replay; create-from-base -> tune -> promote/discard lifecycle).
Add open decisions D11 (overlay substrate) and D12 (promote semantics), a source-isolation note under the sandbox open decisions (3.4), and tag the matching STATUS next-slices with their P-7 phases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…anation path)
Replace the identity replay stub with a resolver that evaluates each corpus
utterance against two grammar versions via the action-grammar NFA engine, so a
genuine grammar edit produces a genuine actionA != actionB row in the Impact
Report.
This is the first slice of the F4.1 'real replay path' (the P-3 long pole),
implementing the deterministic needs-explanation policy:
- createGrammarReplayResolver: per-run resolver that compiles each version's
.agr to an NFA (memoized per side) and matches utterances; normalizes matches
to {schemaName, actionName, parameters?} and canonicalizes empty parameters so
actionsEqual is correct.
- Version materialization via 'git show <ref>:<path>' (working-tree read for
workingTree) - stateless, no worktree/checkout to clean up.
- prepare() builds both versions up front so a build failure aborts the run
cleanly (ReplayVersionBuildError -> run-level error) instead of throwing
mid-stream and hanging the engine's row channel.
- resolveGrammarReplayTarget: single-.agr agents only (safe first-slice scope).
- studioRuntimeCore auto-engages the grammar resolver when no resolver is
injected + needs-explanation + repoRoot + a single-grammar agent; otherwise
falls back to identity. Build failures return a new optional StudioReplayResult.error.
Scope is deliberately 'static grammar replay', not full dispatcher fidelity:
it does not consult the construction cache or run validation. Full-dispatch
resolution is a later slice behind the same ReplayActionResolver interface.
Tests: unit (normalizeGrammarAction, working-tree match/miss, build-failure) +
git integration (working-tree vs HEAD divergence -> lost-match row through
replayCorpus). 146/146 typeagent-core tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
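The normalization step in the commit above matters because a match with `parameters: {}` and a match with no `parameters` at all describe the same action, yet a naive comparison would flag them as a delta. A minimal sketch of that idea follows. The type and function bodies are assumptions for illustration: only the `{schemaName, actionName, parameters?}` shape and the `actionsEqual` name come from the commit message, and the real implementation in typeagent-core may differ.

```typescript
// Hypothetical action shape, modelled on the commit message.
type ReplayAction = {
    schemaName: string;
    actionName: string;
    parameters?: Record<string, unknown>;
};

// Drop `parameters` when it is missing or has no keys, so `{}` and
// `undefined` normalize to the same value.
function normalizeAction(raw: ReplayAction): ReplayAction {
    const { schemaName, actionName, parameters } = raw;
    if (parameters === undefined || Object.keys(parameters).length === 0) {
        return { schemaName, actionName };
    }
    return { schemaName, actionName, parameters };
}

// Serialize with sorted object keys so property order cannot affect equality.
function stableStringify(value: unknown): string {
    if (Array.isArray(value)) {
        return `[${value.map(stableStringify).join(",")}]`;
    }
    if (value !== null && typeof value === "object") {
        const obj = value as Record<string, unknown>;
        const body = Object.keys(obj)
            .sort()
            .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`)
            .join(",");
        return `{${body}}`;
    }
    return String(JSON.stringify(value));
}

// Two misses (both undefined) are equal; a miss never equals a match.
function actionsEqual(a?: ReplayAction, b?: ReplayAction): boolean {
    if (a === undefined || b === undefined) {
        return a === b;
    }
    return (
        stableStringify(normalizeAction(a)) ===
        stableStringify(normalizeAction(b))
    );
}
```

With this in place, a row is only reported as a delta when the two sides genuinely disagree, which is what makes a lost-match row (action on one side, miss on the other) meaningful.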
…eport
The Impact Report now labels static-grammar replays as indicative (not authoritative full-dispatch) and renders a run-level version-build error instead of a misleading zero-row success.
- studioRuntimeCore: add StudioReplayMethod and a required method field on StudioReplayResult; set it to static-grammar when the grammar resolver engages (and on aborted results), else identity.
- replayViewModel: toImpactMethodNote (caveat banner) + toImpactErrorLine.
- Impact Report webview: render a note/error banner; CSS for both.
- Tests for the two new view-model helpers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Impact Report now drives a real two-version compare. Base (A) and Compare (B) text fields accept a git ref or the working-tree keyword and default to HEAD -> working tree (the find-a-regression journey). The host maps them to VersionSpecs and passes them to replayCorpus instead of the hard-coded working-tree self-compare, so an uncommitted grammar edit now produces a genuine action delta.
- protocol: run message carries versionA/versionB strings (validated).
- replayViewModel: parseVersionInput, describeVersion, toImpactComparisonLine.
- webview: labelled version inputs (persisted in panel state, Enter to run) + a Comparing A -> B line; CSS for the fields and line.
- impactReportView host: parseVersionInput -> replayCorpus versions.
- Tests for the new helpers and the extended run-message parsing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
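To make the version-field behavior in the commit above concrete, here is a rough sketch of how free-text input might map to a version spec. The helper names echo the commit message, but the signatures, the `VersionSpec` shape, and the accepted spellings of the working-tree keyword are all assumptions made for illustration; the real helpers in replayViewModel may look different.

```typescript
// Hypothetical spec shape: either the working tree or a named git ref.
type VersionSpec = { kind: "workingTree" } | { kind: "gitRef"; ref: string };

// Defaults described in the commit: Base (A) = HEAD, Compare (B) = working tree.
const DEFAULT_A: VersionSpec = { kind: "gitRef", ref: "HEAD" };
const DEFAULT_B: VersionSpec = { kind: "workingTree" };

// Blank input falls back to the field's default; the working-tree keyword is
// matched loosely (assumed spellings: "working-tree", "workingTree",
// "working tree"); anything else is passed through as a git ref.
function parseVersionInput(input: string, fallback: VersionSpec): VersionSpec {
    const text = input.trim();
    if (text === "") {
        return fallback;
    }
    const key = text.toLowerCase().replace(/[\s_-]+/g, "");
    if (key === "workingtree") {
        return { kind: "workingTree" };
    }
    return { kind: "gitRef", ref: text };
}

function describeVersion(version: VersionSpec): string {
    return version.kind === "workingTree" ? "working tree" : version.ref;
}

function toComparisonLine(a: VersionSpec, b: VersionSpec): string {
    return `Comparing ${describeVersion(a)} -> ${describeVersion(b)}`;
}
```

This sketch does not validate that a ref exists; in the flow described above, a bad ref would presumably surface later as a version-build error rather than at parse time.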
…ks packaged Compiling a grammar that uses built-in entities (Ordinal/Cardinal — the player grammar does) reads builtInEntities.agr from disk next to the bundle. The packaged extension didn't ship it, so the Impact Report's static-grammar replay threw ENOENT (version A/B failed to build). esbuild now copies the asset into dist/ next to both the extension and service bundles, where action-grammar's loader looks first. .vscodeignore keeps dist/, so it ships. Adds a build-guard test that the asset lands in dist/. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the first Impact Report UX-overhaul slice (F4.3 pane 1 + the state-loss fix):
- Context header band (repo, agent, method, fidelity, sandbox, policy) via a new browser-neutral toImpactHeaderFields helper. Honestly labels the static-grammar path as fidelity: indicative and sandbox: not used (it reads grammar from git / the working tree and is not sandbox-bound), so results aren't read as full dispatch.
- Hybrid durable state so the report survives navigate-away/reload: the webview persists the row-capped last result and re-renders it on load, and the host re-pushes the full result on 'ready' to recover a run that finished while the iframe was torn down (panel is retainContextWhenHidden: false). Request-id dedupe accepts both the matching run and a recovery re-push.
- Tooltips on all controls; init now carries the repo name.
Folds the UX overhaul (U1-U4) into the implementation plan under P-3 J4 F4.2/F4.3. studio tests 143 pass; build + prettier clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UI/UX fixes:
- Impact Report: reset output regions on render so navigate-away/back no longer double-renders the result.
- Corpora: FileSystemWatcher on in-repo *.utterances.jsonl auto-refreshes the tree; clearer clickable seed-corpus affordance (new-file icon + 'click to create' text).
- Discoverability: Open Impact Report button on the Corpora view title.
- Connection-aware empty states: Corpora/EventLog/Collisions providers return empty when disconnected (mirroring Sandboxes); a typeagentStudio.serviceConnected context key drives viewsWelcome 'Connect to Studio service' content for all four views.
Packaging:
- Allow the vsce 'privatekey' secret false-positive (bundled crypto dep template literal); ignore stray svc-out.txt.
Shared WS liveness heartbeat:
- Extract attachClientHeartbeat (client-direction ping/pong watchdog) alongside attachHeartbeat into @typeagent/websocket-utils/heartbeat, the cross-cutting package shared across agents. websocket-channel-server re-exports both for back-compat (code agent / agentServer untouched).
- Studio's service client now uses attachClientHeartbeat to detect a silently-dropped service (half-open socket that never emits close) and synthesize the close that drives the existing disconnect+backoff reconnect path, instead of a private copy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
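The client-direction watchdog described above can be illustrated with a small state machine: send a ping each period, and if the previous ping was never answered, terminate the socket so the normal close/reconnect path runs. This is only a sketch under stated assumptions. `PingableSocket`, `ClientHeartbeat`, `notePong`, and `tick` are hypothetical names, and the real `attachClientHeartbeat` signature is not shown in this PR conversation.

```typescript
// Minimal socket surface assumed by this sketch (not the real `ws` typings).
interface PingableSocket {
    ping(): void;
    terminate(): void;
}

class ClientHeartbeat {
    private readonly socket: PingableSocket;
    private awaitingPong = false;
    private terminated = false;

    constructor(socket: PingableSocket) {
        this.socket = socket;
    }

    // Call whenever a pong frame arrives from the peer.
    notePong(): void {
        this.awaitingPong = false;
    }

    // Call once per heartbeat period. Returns false once the socket has been
    // terminated, so the caller can stop its timer.
    tick(): boolean {
        if (this.terminated) {
            return false;
        }
        if (this.awaitingPong) {
            // The previous ping went unanswered for a full period: treat the
            // peer as silently gone and force the close that a half-open
            // socket would never emit on its own.
            this.terminated = true;
            this.socket.terminate();
            return false;
        }
        this.awaitingPong = true;
        this.socket.ping();
        return true;
    }
}
```

In practice `tick()` would be driven by a timer and `notePong()` by the socket's pong event. Keeping the timer outside the class makes the watchdog easy to test deterministically, and a fresh instance would need to be created for each new connection, which is the re-arm-on-reconnect behavior a later commit in this PR adds.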
… L1)
Enrich an agent's compiled grammar with checked-variable metadata derived from its action schema and match utterances through the real GrammarStore, giving a higher-fidelity replay than static-grammar matching.
- grammarResolver: discover the agent's schema (originalSchemaFile + schemaType.action), project checked_wildcard/param metadata onto the grammar, and run matches via GrammarStoreImpl; gracefully fall back to static-grammar when the schema can't be discovered.
- cache: export GrammarStoreImpl for the resolver.
- typeagent-core: depend on @typeagent/action-schema + agent-cache.
- studioRuntimeCore: surface the new 'schema-grammar' method.
- replayViewModel: label/notes/fidelity for the schema-enriched method.
Still no construction cache or wildcard-value validation (that's L2), so results remain indicative, not authoritative.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the capability matrix + long-pole narrative to reflect what this branch ships: webviewKit + Impact Report webview (built/tested), and replay now at static-grammar + schema-enriched (L1) instead of an identity shell. Add a 'recently completed' entry for the replay resolvers, connection-aware UX, and the shared WS liveness heartbeat. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…_studio_part3 # Conflicts: # ts/pnpm-lock.yaml
Pull request overview
This PR advances TypeAgent Studio’s “find a regression” workflow by replacing the replay placeholder with a deterministic grammar-based replay path in @typeagent/core, adding the Impact Report webview (with a reusable webviewKit), and extracting a shared WebSocket liveness heartbeat into @typeagent/websocket-utils for more reliable service connectivity.
Changes:
- Adds a static grammar replay resolver (with optional schema-enrichment) and wires Studio replay to use it when possible, including run-level “version build failed” surfacing.
- Ships the Impact Report webview with version A/B controls, durable panel state, and clearer method/fidelity context.
- Extracts and re-exports server/client heartbeat utilities and adopts the client heartbeat in the Studio service client.
| File | Description |
|---|---|
| ts/pnpm-lock.yaml | Updates lockfile for new workspace deps and Node typings resolution changes. |
| ts/packages/utils/webSocketUtils/src/heartbeat.ts | Adds shared server + client ping/pong heartbeat utilities. |
| ts/packages/utils/webSocketUtils/package.json | Exposes ./heartbeat entrypoint for the new utilities. |
| ts/packages/utils/webSocketChannelServer/src/heartbeat.ts | Re-exports heartbeat APIs from @typeagent/websocket-utils for back-compat. |
| ts/packages/utils/webSocketChannelServer/package.json | Adds dependency on @typeagent/websocket-utils. |
| ts/packages/typeagent-studio/src/webviewKit/replayViewModel.ts | Adds Impact Report view-model helpers (method notes, version parsing, header fields, error line). |
| ts/packages/typeagent-studio/src/webviewKit/protocol.ts | Extends webview protocol to include repo name and version A/B inputs for replay. |
| ts/packages/typeagent-studio/src/webviewKit/client/impactReport.ts | Implements Impact Report UI: header/banners, version controls, persistence, and restored rendering. |
| ts/packages/typeagent-studio/src/test/webviewProtocol.spec.ts | Updates protocol parsing tests for version A/B fields. |
| ts/packages/typeagent-studio/src/test/webviewBundle.spec.ts | Adds a packaging guard test ensuring builtInEntities.agr is copied into dist/. |
| ts/packages/typeagent-studio/src/test/studioServiceClient.spec.ts | Adds heartbeat behavior tests (healthy socket stays open; silent peer gets terminated). |
| ts/packages/typeagent-studio/src/test/replayViewModel.spec.ts | Adds tests for new replay view-model helpers and header field behavior. |
| ts/packages/typeagent-studio/src/studioServiceClient.ts | Attaches client heartbeat on initial connection; supports heartbeat period override for tests. |
| ts/packages/typeagent-studio/src/impactReportView.ts | Wires Impact Report host: repo name, version parsing, and “last result” recovery posting. |
| ts/packages/typeagent-studio/src/extension.ts | Adds corpus auto-refresh watcher and connection-aware view context wiring. |
| ts/packages/typeagent-studio/src/eventLogTreeProvider.ts | Adds connected state to drive welcome/empty rendering behavior. |
| ts/packages/typeagent-studio/src/corpusTreeProvider.ts | Adds connection-aware empty rendering; makes seed row clickable; updates icon. |
| ts/packages/typeagent-studio/src/corpusTreePresentation.ts | Updates seed empty-state copy to reflect click-to-create behavior. |
| ts/packages/typeagent-studio/src/collisionsTreeProvider.ts | Adds connection-aware empty rendering behavior. |
| ts/packages/typeagent-studio/package.json | Adds Impact Report icon + view welcome content; updates packaging script; adds websocket-utils dependency. |
| ts/packages/typeagent-studio/media/impactReport.css | Adds styling for context header, version fields, comparison line, and banners. |
| ts/packages/typeagent-studio/esbuild.mjs | Copies builtInEntities.agr into dist/ so bundled grammar compilation can resolve built-ins. |
| ts/packages/typeagent-studio/.vscodeignore | Ignores svc-out.txt in VSIX packaging. |
| ts/packages/typeagent-core/test/grammarReplayResolver.spec.ts | Adds unit tests covering grammar replay resolver behavior, git-vs-working-tree, and schema discovery/enrichment. |
| ts/packages/typeagent-core/src/runtime/studioRuntimeCore.ts | Wires replay to prefer grammar-based resolver (with enrichment) and returns method + run-level error when build fails. |
| ts/packages/typeagent-core/src/replay/grammarResolver.ts | Implements grammar replay resolver: git/working-tree materialization, grammar-store matching, optional schema enrichment. |
| ts/packages/typeagent-core/package.json | Adds ./replayResolver export; adds deps on action-schema and agent-cache. |
| ts/packages/cache/src/index.ts | Exports GrammarStoreImpl needed by the replay resolver. |
| ts/docs/plans/vscode-devx/USER-STORY.md | Updates the narrative to include agent-assisted tuning and Impact Report usage. |
| ts/docs/plans/vscode-devx/STATUS.md | Updates status matrix and long-pole section to reflect L1 replay + Impact Report completion. |
| ts/docs/plans/vscode-devx/DESIGN.md | Expands sandbox definition and introduces active-sandbox selector / overlay direction. |
| ts/docs/plans/vscode-devx/05-implementation-plan.md | Updates implementation plan with UX slices and adds P-7 sandbox isolation phase. |
Copilot's findings
Files not reviewed (1)
- ts/pnpm-lock.yaml: Generated file
- Files reviewed: 31/32 changed files
- Comments generated: 3
… fixes
Adds the live construction-cache consult (L2) on top of the schema-enriched grammar match: the working-tree side reproduces the dispatcher's schema-source hash and consults the real construction cache, falling back to grammar on any divergence. Reference sides never consult the cache. Impact Report now reports the replay method per side (A/B) with cache/grammar pills so a cache hit is distinguishable from a grammar match at a glance.
Review fixes:
- Re-attach the client heartbeat on reconnect (was only armed on first connect).
- Guard an empty backoff schedule so reconnect can't tight-loop.
- Gate built-schema-path hashing to .pas.json so .ts schemas hash from source.
- Remove the always-on corpus FileSystemWatcher; UI create paths refresh the view and a manual refresh remains.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
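The per-side resolution order described in this commit (cache first for the working tree only, hash-gated, then grammar) can be sketched as follows. Every name here is hypothetical, including `CacheProbe`, `resolveSide`, and `formatPill`; the pill text for a side that never consults the cache is also an assumption, since the PR only shows the `hit·cache` and `miss·grammar` forms.

```typescript
type SideVersion = { kind: "workingTree" } | { kind: "gitRef"; ref: string };
type Action = { schemaName: string; actionName: string };

type SideResult = {
    cache: "hit" | "miss" | "not-consulted";
    source: "cache" | "grammar";
    action: Action | undefined;
};

// Assumed view of a construction cache: the schema hash it was built against
// plus a lookup by utterance.
interface CacheProbe {
    schemaHash: string;
    lookup(utterance: string): Action | undefined;
}

function resolveSide(
    utterance: string,
    version: SideVersion,
    currentSchemaHash: string,
    cache: CacheProbe | undefined,
    matchGrammar: (utterance: string) => Action | undefined,
): SideResult {
    // Git refs never consult the cache: caches are runtime artifacts and do
    // not exist at a committed version.
    if (version.kind !== "workingTree") {
        return {
            cache: "not-consulted",
            source: "grammar",
            action: matchGrammar(utterance),
        };
    }
    // Hash gate: a missing cache or a stale schema hash skips the lookup and
    // falls through to the grammar.
    if (cache !== undefined && cache.schemaHash === currentSchemaHash) {
        const cached = cache.lookup(utterance);
        if (cached !== undefined) {
            return { cache: "hit", source: "cache", action: cached };
        }
    }
    return { cache: "miss", source: "grammar", action: matchGrammar(utterance) };
}

// Assumed pill format, e.g. "B:hit·cache" or "B:miss·grammar".
function formatPill(side: "A" | "B", result: SideResult): string {
    if (result.cache === "not-consulted") {
        return `${side}:${result.source}`;
    }
    return `${side}:${result.cache}·${result.source}`;
}
```

Reporting the cache outcome per side, rather than once per run, is what keeps a HEAD vs working-tree comparison honest: only the working-tree side can ever show a cache hit.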
…es from the package
The vsce private-key scanner was suppressed because two bundled sources carried PEM -----BEGIN/END PRIVATE KEY----- delimiters. Neither is real key material; both are now kept out of the package so the blanket suppression is unnecessary.
- Alias mongodb to an inert stub in the studio esbuild config. The driver was pulled in transitively via @typeagent/telemetry's barrel re-export of createMongoDBLoggerSink, which Studio never calls; its client-side-encryption crypto callbacks embed PEM delimiters in the emitted bundle. Stubbing it drops the dead dependency and its PEM from extension.js / studio-service.js.
- Ship without sourcemaps: @azure/msal-node's Configuration.ts has a PEM example in a source comment that only reached the package via the 40 MB .map files. .vscodeignore had *.map, which only matches root-level files, so dist/**/*.map was packaged anyway. Widen to **/*.map.
vsce package now succeeds with only --allow-package-secrets npm (a bundled pm i --save-dev @types/... doc string, a genuine narrow false positive). The .vsix drops from ~45 MB to 2.5 MB.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n/doc refs from comments The grammar replay resolver computed the in-repo relative path from a non-canonicalized file path while git rev-parse --show-toplevel returns a canonical one. On macOS (/var -> /private/var) and Windows (8.3 short names like RUNNER~1 -> runneradmin) this produced a relative path that escaped the repo root, so git show <ref>:<path> failed with "is outside repository" and the grammarReplayResolver tests failed in CI (Ubuntu was unaffected). Resolve the file path with realpath() first so it shares the git toplevel namespace. Also clean up code comments across the studio packages: remove references to external design/plan documents and internal plan/feature codenames, describing current behavior instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
createWebSocketChannelServer hand-rolled an originAllowlist: string[] option plus an isOriginAllowed prefix matcher that no caller ever used (the only caller passes just a port) and that duplicated the Origin-gating logic the shared createAgentOriginAllowlist already provides. Replace the option with an isOriginAllowed predicate matching that helper's shape, so callers reuse the audited shared policy instead of a second implementation. No behavior change: no caller passed the removed option. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| @@ -0,0 +1,363 @@ | |||
| { | |||
Don't we have a version of this type of file alongside the construction code? If so, can we reference that one instead of this one? I worry about drift...
| } | ||
| this.connected = connected; | ||
| this.refresh(); | ||
| } |
These "treeProvider" classes have some shared code. Could we condense it and inherit from a generic treeProvider class?
| } = {}, | ||
| ) { | ||
| this.backoffMs = options.backoffMs ?? [2000, 4000, 8000, 15000]; | ||
| const DEFAULT_BACKOFF = [2000, 4000, 8000, 15000]; |
Why have fixed backoff options? Would it be better to use an exponential backoff with some sort of limit/restart option?
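For reference, the alternative raised in this comment could look roughly like the sketch below: a capped exponential schedule with an explicit reset. This is not what the PR implements (the PR keeps a fixed list and guards against it being empty); `ExponentialBackoff` and its options are hypothetical names, and jitter is left out to keep the example deterministic.

```typescript
type BackoffOptions = { baseMs: number; factor: number; maxMs: number };

// Fall back to a safe default when a value is non-finite or below the minimum,
// so a bad configuration can never produce a zero-delay tight loop.
function atLeast(value: number, min: number, fallback: number): number {
    return Number.isFinite(value) && value >= min ? value : fallback;
}

class ExponentialBackoff {
    private attempt = 0;
    private readonly baseMs: number;
    private readonly factor: number;
    private readonly maxMs: number;

    constructor(options: BackoffOptions) {
        this.baseMs = atLeast(options.baseMs, 1, 2000);
        this.factor = atLeast(options.factor, 1, 2);
        this.maxMs = atLeast(options.maxMs, this.baseMs, this.baseMs);
    }

    // Delay for the next reconnect attempt: baseMs * factor^attempt, capped.
    next(): number {
        const delay = Math.min(
            this.maxMs,
            this.baseMs * this.factor ** this.attempt,
        );
        this.attempt += 1;
        return delay;
    }

    // Call after a successful connection so the next outage starts from baseMs.
    reset(): void {
        this.attempt = 0;
    }
}
```

With `baseMs: 2000`, `factor: 2`, and `maxMs: 15000`, the first four delays are 2000, 4000, 8000, and 15000 ms, which matches the fixed default list in the diff above, and every later attempt stays at the cap until `reset()` is called.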
Replaces the "find a regression" flow's placeholder identity comparison with a
real replay path, ships the Impact Report webview, and hardens the service
connection with a shared liveness heartbeat.
Replay engine (typeagent-core)
grammar (the real deterministic, no-LLM path). Ships builtInEntities.agr in
the bundle so it works when packaged.
(checked wildcards, params) onto the grammar and matches through the real
GrammarStore, falling back to plain grammar matching when the schema can't be
discovered.
agent's real per-session construction cache before the grammar, mirroring the
dispatcher's cache-first path. It is hash-gated exactly as the dispatcher gates
it, so a schema edit invalidates the cached entries (a cache hit is reported as
hit, a grammar fall-through as miss). Consulted for the working tree only
(caches are runtime artifacts, never committed or read at a git ref) and
degrades cleanly when the cache is missing or stale, which is itself the
regression signal. Discovered under the user data dir, overridable via
TYPEAGENT_STUDIO_CONSTRUCTION_CACHE for determinism.
Impact Report + webviewKit (typeagent-studio)
webviewKit: strict CSP/nonce HTML host, singleton-panel manager, typed
host↔webview protocol, browser-neutral view model.
the comparison, with a context header, A/B version controls, durable state
across navigate-away/reload, and run-error surfacing.
ref can never consult it. Each A/B field shows how that side actually resolved,
and cache-served hits are tagged (B:hit·cache vs grammar B:miss·grammar),
so the run-level label never implies the cache served a side that didn't read
it.
Connection-aware UX
content when the service is down (mirroring Sandboxes).
hand-edited corpus files are picked up by the manual refresh (no always-on file
watcher). Clearer seed-corpus affordance; Impact Report reachable from the
Corpora view title.
Shared WebSocket liveness heartbeat (utils)
one into @typeagent/websocket-utils; websocket-channel-server re-exports
both for back-compat (other agents untouched).
half-open socket that never emits close) and drive the existing reconnect
path, fixing a stale "connected" status. The watchdog is re-armed on every
reconnect, and the backoff guards against an empty interval list collapsing
into a tight loop.