
TypeAgent Studio — real replay path + Impact Report #2509

Open
TalZaccai wants to merge 19 commits into main from dev/talzacc/typeagent_studio_part3

@TalZaccai TalZaccai commented Jun 18, 2026

Contributor

Replaces the "find a regression" flow's placeholder identity comparison with a
real replay path, ships the Impact Report webview, and hardens the service
connection with a shared liveness heartbeat.

Replay engine (typeagent-core)

  • Grammar replay — utterances are matched against the agent's compiled
    grammar (the real deterministic, no-LLM path). Ships builtInEntities.agr in
    the bundle so it works when packaged.
  • Schema-enriched matching — projects the agent's action-schema metadata
    (checked wildcards, params) onto the grammar and matches through the real
    GrammarStore, falling back to plain grammar matching when the schema can't be
    discovered.
  • Construction-cache consult — the live working-tree side consults the
    agent's real per-session construction cache before the grammar, mirroring the
    dispatcher's cache-first path. It is hash-gated exactly as the dispatcher gates
    it, so a schema edit invalidates the cached entries (a cache hit is reported as
    hit, a grammar fall-through as miss). Consulted for the working tree only
    (caches are runtime artifacts, never committed or read at a git ref) and
    degrades cleanly when the cache is missing or stale — which is itself the
    regression signal. Discovered under the user data dir, overridable via
    TYPEAGENT_STUDIO_CONSTRUCTION_CACHE for determinism.
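
The cache-first, hash-gated resolution described in the last bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: `resolveUtterance`, `ConstructionCache`, and the `Resolution` shape are hypothetical names, and the cache is modelled as a plain map keyed by utterance.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
type Action = { schemaName: string; actionName: string };
type Side = "workingTree" | "gitRef";

interface Resolution {
  action: Action | undefined;
  cache: "hit" | "miss" | "not-consulted";
  via: "cache" | "grammar";
}

interface ConstructionCache {
  schemaHash: string; // schema-source hash the cache was built against
  entries: Map<string, Action>; // utterance -> cached action
}

function resolveUtterance(
  utterance: string,
  side: Side,
  currentSchemaHash: string,
  cache: ConstructionCache | undefined,
  matchGrammar: (utterance: string) => Action | undefined,
): Resolution {
  // Only the working tree may read the cache, and only when the hash matches;
  // a schema edit changes the hash and so invalidates every cached entry.
  const usable =
    side === "workingTree" &&
    cache !== undefined &&
    cache.schemaHash === currentSchemaHash;
  if (usable) {
    const cached = cache.entries.get(utterance);
    if (cached !== undefined) {
      return { action: cached, cache: "hit", via: "cache" };
    }
  }
  // Missing cache, stale hash, or no entry: fall through to the grammar.
  return {
    action: matchGrammar(utterance),
    cache: side === "workingTree" ? "miss" : "not-consulted",
    via: "grammar",
  };
}
```

Returning a per-call `cache` outcome alongside `via` keeps a stale or missing cache visible in the result instead of silently looking like a grammar-only run.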

Impact Report + webviewKit (typeagent-studio)

  • Minimal reusable webviewKit: strict CSP/nonce HTML host, singleton-panel
    manager, typed host↔webview protocol, browser-neutral view model.
  • Impact Report webview drives the replay over the service channel and renders
    the comparison, with a context header, A/B version controls, durable state
    across navigate-away/reload, and run-error surfacing.
  • Per-side method labels: because the construction cache is live-only, a git
    ref can never consult it. Each A/B field shows how that side actually resolved,
    and cache-served hits are tagged distinctly from grammar fall-throughs
    (B:hit·cache vs B:miss·grammar), so the run-level label never implies the
    cache served a side that didn't read it.
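
A small sketch of how such per-side tags could be derived. The two cache tags mirror the strings quoted above; the function name, the `SideResolution` shape, and the plain `grammar` label for a side that never read the cache are assumptions made for illustration.

```typescript
// Illustrative only: the helper and its types are not taken from the PR.
type SideId = "A" | "B";

interface SideResolution {
  consultedCache: boolean; // always false for a git-ref side
  cacheHit: boolean; // meaningful only when consultedCache is true
}

const DOT = "\u00B7"; // the middle dot used in the tags

function sideMethodLabel(side: SideId, r: SideResolution): string {
  if (!r.consultedCache) {
    // Assumed wording for a side that never read the cache.
    return `${side}:grammar`;
  }
  return r.cacheHit
    ? `${side}:hit${DOT}cache`
    : `${side}:miss${DOT}grammar`;
}

console.log(sideMethodLabel("B", { consultedCache: true, cacheHit: true }));
```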

Connection-aware UX

  • Corpora / Event Log / Collisions views show "Connect to Studio service" welcome
    content when the service is down (mirroring Sandboxes).
  • Corpora view refreshes after any corpus change made through the UI; externally
    hand-edited corpus files are picked up by the manual refresh (no always-on file
    watcher). Clearer seed-corpus affordance; Impact Report reachable from the
    Corpora view title.
  • Fixes the Impact Report double-render on navigate-away/back.

Shared WebSocket liveness heartbeat (utils)

  • Extracts a client-side ping/pong watchdog alongside the existing server-side
    one into @typeagent/websocket-utils; websocket-channel-server re-exports
    both for back-compat (other agents untouched).
  • The Studio service client uses it to detect a silently-dropped service (a
    half-open socket that never emits close) and drive the existing reconnect
    path — fixing a stale "connected" status. The watchdog is re-armed on every
    reconnect, and the backoff guards against an empty interval list collapsing
    into a tight loop.
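
The client-side watchdog idea can be sketched like this. It is a simplified illustration under stated assumptions, not the `attachClientHeartbeat` implementation: `PingableSocket` is a hypothetical minimal surface (the `ws` package's WebSocket exposes `ping()`, `terminate()`, and a `pong` event of this kind), and `check()` is expected to be driven by a periodic timer.

```typescript
// Hypothetical minimal socket surface assumed by this sketch.
interface PingableSocket {
  ping(): void;
  terminate(): void;
  on(event: "pong", listener: () => void): void;
}

function createClientWatchdog(socket: PingableSocket) {
  let awaitingPong = false;
  let dead = false;

  // Any pong proves the peer is still alive.
  socket.on("pong", () => {
    awaitingPong = false;
  });

  return {
    // Call once per heartbeat period, e.g. from setInterval.
    check(): void {
      if (dead) return;
      if (awaitingPong) {
        // The previous ping was never answered: treat the socket as
        // half-open and force the close that drives reconnect.
        dead = true;
        socket.terminate();
        return;
      }
      awaitingPong = true;
      socket.ping();
    },
    isDead(): boolean {
      return dead;
    },
  };
}
```

A caller would create a fresh watchdog for every new socket, which is the "re-armed on every reconnect" point above: a watchdog bound to the old socket cannot observe the new one.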

TalZaccai and others added 12 commits June 16, 2026 13:43
Merge the human-driven and AI-driven walkthroughs into one narrative where
Aida delegates the tune -> find-regressions -> validate loop to an AI agent,
then finishes the utterances it left uncovered herself. Soften API-level
jargon to describe the logic, make the authoring prompt language-neutral, and
note that the collision scan catches cross-agent overlaps, not just intra-agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…te overlay

Add DESIGN.md 3.6 capturing what a sandbox is and where it's going:
- composition (and therefore collisions) is intrinsically per-sandbox;
- source is today a filtered view over shared repo working-tree, evolving to a
  per-sandbox copy-on-write overlay (a full debugging experience) via the
  loader's ordered agent-roots seam;
- UI uses a single active-sandbox selector that scopes Corpora/Collisions/Event
  Log (visibility now, mutation once overlays land).

Cross-reference the Sandbox primitive (3.2) and tree-views note (3.4), and add
two STATUS.md next slices: the active-sandbox selector and the isolated overlay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add P-7 to the implementation plan, deliberately after the single-sandbox
end-to-end headline (P-3/J4, Gates A-E) closes:
- P-7a: active-sandbox selector + per-sandbox scoping of Corpora/Collisions/
  Event Log (visibility/analysis only);
- P-7b: per-sandbox copy-on-write overlay via the loader's ordered agentRoots
  seam (mutation-local; overlay-vs-base replay; create-from-base -> tune ->
  promote/discard lifecycle).

Add open decisions D11 (overlay substrate) and D12 (promote semantics), a
source-isolation note under the sandbox open decisions (3.4), and tag the
matching STATUS next-slices with their P-7 phases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…anation path)

Replace the identity replay stub with a resolver that evaluates each corpus
utterance against two grammar versions via the action-grammar NFA engine, so a
genuine grammar edit produces a genuine actionA != actionB row in the Impact
Report.

This is the first slice of the F4.1 'real replay path' (the P-3 long pole),
implementing the deterministic needs-explanation policy:

- createGrammarReplayResolver: per-run resolver that compiles each version's
  .agr to an NFA (memoized per side) and matches utterances; normalizes matches
  to {schemaName, actionName, parameters?} and canonicalizes empty parameters so
  actionsEqual is correct.
- Version materialization via 'git show <ref>:<path>' (working-tree read for
  workingTree) - stateless, no worktree/checkout to clean up.
- prepare() builds both versions up front so a build failure aborts the run
  cleanly (ReplayVersionBuildError -> run-level error) instead of throwing
  mid-stream and hanging the engine's row channel.
- resolveGrammarReplayTarget: single-.agr agents only (safe first-slice scope).
- studioRuntimeCore auto-engages the grammar resolver when no resolver is
  injected + needs-explanation + repoRoot + a single-grammar agent; otherwise
  falls back to identity. Build failures return a new optional StudioReplayResult.error.

Scope is deliberately 'static grammar replay', not full dispatcher fidelity:
it does not consult the construction cache or run validation. Full-dispatch
resolution is a later slice behind the same ReplayActionResolver interface.

Tests: unit (normalizeGrammarAction, working-tree match/miss, build-failure) +
git integration (working-tree vs HEAD divergence -> lost-match row through
replayCorpus). 146/146 typeagent-core tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
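
Why canonicalizing empty parameters matters for equality can be shown with a short sketch. The `{schemaName, actionName, parameters?}` shape comes from the commit message above; the function bodies, `stableStringify`, and the names ending in `Sketch` are assumptions, not the PR's `normalizeGrammarAction` or `actionsEqual`.

```typescript
type ReplayAction = {
  schemaName: string;
  actionName: string;
  parameters?: Record<string, unknown>;
};

// Drop an empty `parameters` object so {} and "absent" compare equal.
function normalizeActionSketch(a: ReplayAction): ReplayAction {
  const params = a.parameters;
  if (params !== undefined && Object.keys(params).length > 0) {
    return { schemaName: a.schemaName, actionName: a.actionName, parameters: params };
  }
  return { schemaName: a.schemaName, actionName: a.actionName };
}

// Serialize with sorted keys so key order never produces a false delta.
function stableStringify(value: unknown): string {
  if (value === undefined) return "undefined";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function actionsEqualSketch(
  a: ReplayAction | undefined,
  b: ReplayAction | undefined,
): boolean {
  if (a === undefined || b === undefined) return a === b; // miss vs miss
  return (
    stableStringify(normalizeActionSketch(a)) ===
    stableStringify(normalizeActionSketch(b))
  );
}
```

Without the normalization step, a match that yields `parameters: {}` on one side and no `parameters` on the other would surface as a spurious changed row.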
…eport

The Impact Report now labels static-grammar replays as indicative (not
authoritative full-dispatch) and renders a run-level version-build error
instead of a misleading zero-row success.

- studioRuntimeCore: add StudioReplayMethod and a required method field on
  StudioReplayResult; set it to static-grammar when the grammar resolver
  engages (and on aborted results), else identity.
- replayViewModel: toImpactMethodNote (caveat banner) + toImpactErrorLine.
- Impact Report webview: render a note/error banner; CSS for both.
- Tests for the two new view-model helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Impact Report now drives a real two-version compare. Base (A) and
Compare (B) text fields accept a git ref or the working-tree keyword and
default to HEAD -> working tree (the find-a-regression journey). The host
maps them to VersionSpecs and passes them to replayCorpus instead of the
hard-coded working-tree self-compare, so an uncommitted grammar edit now
produces a genuine action delta.

- protocol: run message carries versionA/versionB strings (validated).
- replayViewModel: parseVersionInput, describeVersion, toImpactComparisonLine.
- webview: labelled version inputs (persisted in panel state, Enter to run)
  + a Comparing A -> B line; CSS for the fields and line.
- impactReportView host: parseVersionInput -> replayCorpus versions.
- Tests for the new helpers and the extended run-message parsing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
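
A sketch of how a version text field might map to a version spec. Everything here is an assumption for illustration: the PR does not show the actual keyword, the `VersionSpec` shape, or the signature of `parseVersionInput`, so the sketch uses a hypothetical `working-tree` keyword and a caller-supplied fallback for empty input.

```typescript
// Hypothetical shapes; the real VersionSpec is not shown in this PR page.
type VersionSpecSketch =
  | { kind: "workingTree" }
  | { kind: "gitRef"; ref: string };

// Assumed spelling of the working-tree keyword.
const WORKING_TREE_KEYWORD = "working-tree";

function parseVersionInputSketch(
  input: string,
  fallback: VersionSpecSketch,
): VersionSpecSketch {
  const trimmed = input.trim();
  if (trimmed === "") return fallback; // empty field keeps the default
  if (trimmed.toLowerCase() === WORKING_TREE_KEYWORD) {
    return { kind: "workingTree" };
  }
  // Anything else is handed to git as a ref (branch, tag, SHA, HEAD~1, ...).
  return { kind: "gitRef", ref: trimmed };
}

// Defaults matching the find-a-regression journey: HEAD -> working tree.
const defaultA: VersionSpecSketch = { kind: "gitRef", ref: "HEAD" };
const defaultB: VersionSpecSketch = { kind: "workingTree" };
console.log(parseVersionInputSketch("", defaultA), parseVersionInputSketch("", defaultB));
```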
…ks packaged

Compiling a grammar that uses built-in entities (Ordinal/Cardinal — the
player grammar does) reads builtInEntities.agr from disk next to the bundle.
The packaged extension didn't ship it, so the Impact Report's static-grammar
replay threw ENOENT (version A/B failed to build). esbuild now copies the
asset into dist/ next to both the extension and service bundles, where
action-grammar's loader looks first. .vscodeignore keeps dist/, so it ships.

Adds a build-guard test that the asset lands in dist/.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the first Impact Report UX-overhaul slice (F4.3 pane 1 + the state-loss fix):

- Context header band (repo, agent, method, fidelity, sandbox, policy) via a new
  browser-neutral toImpactHeaderFields helper. Honestly labels the static-grammar
  path as fidelity: indicative and sandbox: not used (it reads grammar from git /
  the working tree and is not sandbox-bound), so results aren't read as full
  dispatch.
- Hybrid durable state so the report survives navigate-away/reload: the webview
  persists the row-capped last result and re-renders it on load, and the host
  re-pushes the full result on 'ready' to recover a run that finished while the
  iframe was torn down (panel is retainContextWhenHidden: false). Request-id
  dedupe accepts both the matching run and a recovery re-push.
- Tooltips on all controls; init now carries the repo name.

Folds the UX overhaul (U1-U4) into the implementation plan under P-3 J4
F4.2/F4.3. studio tests 143 pass; build + prettier clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UI/UX fixes:
- Impact Report: reset output regions on render so navigate-away/back no longer double-renders the result.
- Corpora: FileSystemWatcher on in-repo *.utterances.jsonl auto-refreshes the tree; clearer clickable seed-corpus affordance (new-file icon + 'click to create' text).
- Discoverability: Open Impact Report button on the Corpora view title.
- Connection-aware empty states: Corpora/EventLog/Collisions providers return empty when disconnected (mirroring Sandboxes); a typeagentStudio.serviceConnected context key drives viewsWelcome 'Connect to Studio service' content for all four views.

Packaging:
- Allow the vsce 'privatekey' secret false-positive (bundled crypto dep template literal); ignore stray svc-out.txt.

Shared WS liveness heartbeat:
- Extract attachClientHeartbeat (client-direction ping/pong watchdog) alongside attachHeartbeat into @typeagent/websocket-utils/heartbeat, the cross-cutting package shared across agents. websocket-channel-server re-exports both for back-compat (code agent / agentServer untouched).
- Studio's service client now uses attachClientHeartbeat to detect a silently-dropped service (half-open socket that never emits close) and synthesize the close that drives the existing disconnect+backoff reconnect path, instead of a private copy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… L1)

Enrich an agent's compiled grammar with checked-variable metadata derived
from its action schema and match utterances through the real GrammarStore,
giving a higher-fidelity replay than static-grammar matching.

- grammarResolver: discover the agent's schema (originalSchemaFile +
  schemaType.action), project checked_wildcard/param metadata onto the
  grammar, and run matches via GrammarStoreImpl; gracefully fall back to
  static-grammar when the schema can't be discovered.
- cache: export GrammarStoreImpl for the resolver.
- typeagent-core: depend on @typeagent/action-schema + agent-cache.
- studioRuntimeCore: surface the new 'schema-grammar' method.
- replayViewModel: label/notes/fidelity for the schema-enriched method.

Still no construction cache or wildcard-value validation (that's L2), so
results remain indicative, not authoritative.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the capability matrix + long-pole narrative to reflect what this
branch ships: webviewKit + Impact Report webview (built/tested), and replay
now at static-grammar + schema-enriched (L1) instead of an identity shell.
Add a 'recently completed' entry for the replay resolvers, connection-aware
UX, and the shared WS liveness heartbeat.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…_studio_part3

# Conflicts:
#	ts/pnpm-lock.yaml
@TalZaccai TalZaccai changed the title from "Dev/talzacc/typeagent studio part3" to "TypeAgent Studio — real replay path + Impact Report" Jun 18, 2026
@TalZaccai TalZaccai requested a review from Copilot June 18, 2026 02:34

Copilot AI left a comment

Contributor


Pull request overview

This PR advances TypeAgent Studio’s “find a regression” workflow by replacing the replay placeholder with a deterministic grammar-based replay path in @typeagent/core, adding the Impact Report webview (with a reusable webviewKit), and extracting a shared WebSocket liveness heartbeat into @typeagent/websocket-utils for more reliable service connectivity.

Changes:

  • Adds a static grammar replay resolver (with optional schema-enrichment) and wires Studio replay to use it when possible, including run-level “version build failed” surfacing.
  • Ships the Impact Report webview with version A/B controls, durable panel state, and clearer method/fidelity context.
  • Extracts and re-exports server/client heartbeat utilities and adopts the client heartbeat in the Studio service client.
Show a summary per file

| File | Description |
| --- | --- |
| ts/pnpm-lock.yaml | Updates lockfile for new workspace deps and Node typings resolution changes. |
| ts/packages/utils/webSocketUtils/src/heartbeat.ts | Adds shared server + client ping/pong heartbeat utilities. |
| ts/packages/utils/webSocketUtils/package.json | Exposes ./heartbeat entrypoint for the new utilities. |
| ts/packages/utils/webSocketChannelServer/src/heartbeat.ts | Re-exports heartbeat APIs from @typeagent/websocket-utils for back-compat. |
| ts/packages/utils/webSocketChannelServer/package.json | Adds dependency on @typeagent/websocket-utils. |
| ts/packages/typeagent-studio/src/webviewKit/replayViewModel.ts | Adds Impact Report view-model helpers (method notes, version parsing, header fields, error line). |
| ts/packages/typeagent-studio/src/webviewKit/protocol.ts | Extends webview protocol to include repo name and version A/B inputs for replay. |
| ts/packages/typeagent-studio/src/webviewKit/client/impactReport.ts | Implements Impact Report UI: header/banners, version controls, persistence, and restored rendering. |
| ts/packages/typeagent-studio/src/test/webviewProtocol.spec.ts | Updates protocol parsing tests for version A/B fields. |
| ts/packages/typeagent-studio/src/test/webviewBundle.spec.ts | Adds a packaging guard test ensuring builtInEntities.agr is copied into dist/. |
| ts/packages/typeagent-studio/src/test/studioServiceClient.spec.ts | Adds heartbeat behavior tests (healthy socket stays open; silent peer gets terminated). |
| ts/packages/typeagent-studio/src/test/replayViewModel.spec.ts | Adds tests for new replay view-model helpers and header field behavior. |
| ts/packages/typeagent-studio/src/studioServiceClient.ts | Attaches client heartbeat on initial connection; supports heartbeat period override for tests. |
| ts/packages/typeagent-studio/src/impactReportView.ts | Wires Impact Report host: repo name, version parsing, and “last result” recovery posting. |
| ts/packages/typeagent-studio/src/extension.ts | Adds corpus auto-refresh watcher and connection-aware view context wiring. |
| ts/packages/typeagent-studio/src/eventLogTreeProvider.ts | Adds connected state to drive welcome/empty rendering behavior. |
| ts/packages/typeagent-studio/src/corpusTreeProvider.ts | Adds connection-aware empty rendering; makes seed row clickable; updates icon. |
| ts/packages/typeagent-studio/src/corpusTreePresentation.ts | Updates seed empty-state copy to reflect click-to-create behavior. |
| ts/packages/typeagent-studio/src/collisionsTreeProvider.ts | Adds connection-aware empty rendering behavior. |
| ts/packages/typeagent-studio/package.json | Adds Impact Report icon + view welcome content; updates packaging script; adds websocket-utils dependency. |
| ts/packages/typeagent-studio/media/impactReport.css | Adds styling for context header, version fields, comparison line, and banners. |
| ts/packages/typeagent-studio/esbuild.mjs | Copies builtInEntities.agr into dist/ so bundled grammar compilation can resolve built-ins. |
| ts/packages/typeagent-studio/.vscodeignore | Ignores svc-out.txt in VSIX packaging. |
| ts/packages/typeagent-core/test/grammarReplayResolver.spec.ts | Adds unit tests covering grammar replay resolver behavior, git-vs-working-tree, and schema discovery/enrichment. |
| ts/packages/typeagent-core/src/runtime/studioRuntimeCore.ts | Wires replay to prefer grammar-based resolver (with enrichment) and returns method + run-level error when build fails. |
| ts/packages/typeagent-core/src/replay/grammarResolver.ts | Implements grammar replay resolver: git/working-tree materialization, grammar-store matching, optional schema enrichment. |
| ts/packages/typeagent-core/package.json | Adds ./replayResolver export; adds deps on action-schema and agent-cache. |
| ts/packages/cache/src/index.ts | Exports GrammarStoreImpl needed by the replay resolver. |
| ts/docs/plans/vscode-devx/USER-STORY.md | Updates the narrative to include agent-assisted tuning and Impact Report usage. |
| ts/docs/plans/vscode-devx/STATUS.md | Updates status matrix and long-pole section to reflect L1 replay + Impact Report completion. |
| ts/docs/plans/vscode-devx/DESIGN.md | Expands sandbox definition and introduces active-sandbox selector / overlay direction. |
| ts/docs/plans/vscode-devx/05-implementation-plan.md | Updates implementation plan with UX slices and adds P-7 sandbox isolation phase. |

Copilot's findings


Files not reviewed (1)
  • ts/pnpm-lock.yaml: Generated file

Files reviewed: 31/32 changed files
Comments generated: 3

Comment thread ts/packages/typeagent-studio/src/studioServiceClient.ts Outdated
Comment thread ts/packages/typeagent-studio/package.json Outdated
Comment thread ts/packages/typeagent-studio/src/extension.ts Outdated
… fixes

Adds the live construction-cache consult (L2) on top of the schema-enriched
grammar match: the working-tree side reproduces the dispatcher's schema-source
hash and consults the real construction cache, falling back to grammar on any
divergence. Reference sides never consult the cache.

Impact Report now reports the replay method per side (A/B) with cache/grammar
pills so a cache hit is distinguishable from a grammar match at a glance.

Review fixes:
- Re-attach the client heartbeat on reconnect (was only armed on first connect).
- Guard an empty backoff schedule so reconnect can't tight-loop.
- Gate built-schema-path hashing to .pas.json so .ts schemas hash from source.
- Remove the always-on corpus FileSystemWatcher; UI create paths refresh the
  view and a manual refresh remains.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
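
The empty-schedule guard from the review fixes can be sketched as below. The default intervals are the ones visible in the reviewed diff; falling back to them for an empty list, and clamping the attempt index to the last entry, are assumptions about the fix rather than a copy of it.

```typescript
// Defaults as they appear in the reviewed diff.
const DEFAULT_BACKOFF_MS: readonly number[] = [2000, 4000, 8000, 15000];

// Pick the delay for the given reconnect attempt (0-based).
function backoffDelayMs(
  attempt: number,
  schedule: readonly number[] | undefined,
): number {
  // An empty list would otherwise index to undefined, which setTimeout
  // treats as 0 and turns reconnect into a tight loop.
  const effective =
    schedule !== undefined && schedule.length > 0
      ? schedule
      : DEFAULT_BACKOFF_MS;
  // Past the end of the list, keep using the last (largest) interval.
  const index = Math.min(Math.max(attempt, 0), effective.length - 1);
  return effective[index];
}
```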
typeagent-bot Bot and others added 2 commits June 18, 2026 02:57
…es from the package

The vsce private-key scanner was suppressed because two bundled sources carried
PEM -----BEGIN/END PRIVATE KEY----- delimiters. Neither is real key material;
both are now kept out of the package so the blanket suppression is unnecessary.

- Alias mongodb to an inert stub in the studio esbuild config. The driver was
  pulled in transitively via @typeagent/telemetry's barrel re-export of
  createMongoDBLoggerSink, which Studio never calls; its client-side-encryption
  crypto callbacks embed PEM delimiters in the emitted bundle. Stubbing it drops
  the dead dependency and its PEM from extension.js / studio-service.js.
- Ship without sourcemaps: @azure/msal-node's Configuration.ts has a PEM example
  in a source comment that only reached the package via the 40 MB .map files.
  .vscodeignore had *.map, which only matches root-level files, so dist/**/*.map
  was packaged anyway. Widen to **/*.map.

vsce package now succeeds with only --allow-package-secrets npm (a bundled
npm i --save-dev @types/... doc string, a genuine narrow false positive). The
.vsix drops from ~45 MB to 2.5 MB.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n/doc refs from comments

The grammar replay resolver computed the in-repo relative path from a
non-canonicalized file path while git rev-parse --show-toplevel returns a
canonical one. On macOS (/var -> /private/var) and Windows (8.3 short names
like RUNNER~1 -> runneradmin) this produced a relative path that escaped the
repo root, so git show <ref>:<path> failed with "is outside repository" and
the grammarReplayResolver tests failed in CI (Ubuntu was unaffected). Resolve
the file path with realpath() first so it shares the git toplevel namespace.

Also clean up code comments across the studio packages: remove references to
external design/plan documents and internal plan/feature codenames, describing
current behavior instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
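
The path mismatch described in this commit can be reproduced without a file system. In the sketch below, `realpath` is injected so the example stays deterministic; in Node, `fs.realpathSync` could be passed instead. The helper names and the POSIX-style, forward-slash paths are assumptions made for illustration.

```typescript
// Returns filePath relative to repoTop, or undefined when it lies outside.
function relativeWithin(repoTop: string, filePath: string): string | undefined {
  const root = repoTop.endsWith("/") ? repoTop : repoTop + "/";
  return filePath.startsWith(root) ? filePath.slice(root.length) : undefined;
}

// Canonicalize the file path first so it shares the namespace of the
// canonical top-level directory reported by git.
function repoRelativePath(
  canonicalRepoTop: string,
  filePath: string,
  realpath: (p: string) => string,
): string | undefined {
  return relativeWithin(canonicalRepoTop, realpath(filePath));
}

// Stand-in for the macOS /var -> /private/var symlink.
const fakeRealpath = (p: string): string =>
  p.startsWith("/var/") ? "/private" + p : p;

const top = "/private/var/tmp/repo"; // what git would report
const file = "/var/tmp/repo/agents/player.agr"; // non-canonical input

console.log(relativeWithin(top, file)); // outside the repo: no relative path
console.log(repoRelativePath(top, file, fakeRealpath)); // inside after realpath
```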
createWebSocketChannelServer hand-rolled an originAllowlist: string[]
option plus an isOriginAllowed prefix matcher that no caller ever used
(the only caller passes just a port) and that duplicated the Origin-gating
logic the shared createAgentOriginAllowlist already provides. Replace the
option with an isOriginAllowed predicate matching that helper's shape, so
callers reuse the audited shared policy instead of a second implementation.
No behavior change: no caller passed the removed option.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TalZaccai TalZaccai marked this pull request as ready for review June 18, 2026 18:45
@TalZaccai TalZaccai temporarily deployed to development-fork June 18, 2026 19:57 — with GitHub Actions Inactive
@TalZaccai TalZaccai requested a review from robgruen June 18, 2026 20:06
@@ -0,0 +1,363 @@
{

Collaborator


Don't we have a version of this type of file alongside the construction code? If so, can we reference that one instead of this one? I worry about drift...

}
this.connected = connected;
this.refresh();
}

Collaborator


These "treeProvider" classes have some shared code. Could we condense them and inherit from a generic treeProvider class?

} = {},
) {
this.backoffMs = options.backoffMs ?? [2000, 4000, 8000, 15000];
const DEFAULT_BACKOFF = [2000, 4000, 8000, 15000];

Collaborator


Why have fixed backoff options? Would it be better to use some sort of exponential backoff with a limit/restart option?
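
For reference, one shape the suggested alternative could take is a capped exponential policy that resets after a successful connect. This is only a sketch of the idea raised in the comment, not code from the PR; the class name is hypothetical, and the base and cap are chosen so the sequence matches the current fixed list.

```typescript
// Hypothetical policy: delay doubles per attempt up to a cap.
class ExponentialBackoffSketch {
  private attempt = 0;

  constructor(
    private readonly baseMs = 2000,
    private readonly capMs = 15000,
  ) {}

  // Delay to wait before the next reconnect attempt.
  next(): number {
    const delay = Math.min(this.baseMs * 2 ** this.attempt, this.capMs);
    this.attempt++;
    return delay;
  }

  // Call after a successful connect so the next outage starts small again.
  reset(): void {
    this.attempt = 0;
  }
}

const policy = new ExponentialBackoffSketch();
console.log(policy.next(), policy.next(), policy.next(), policy.next());
// prints 2000 4000 8000 15000
```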


3 participants