Skip to content

refactor(bench): one resolveBenchClient(harness) — dedup the harness→client mapping#296

Merged
drewstone merged 1 commit into
mainfrom
cleanup/resolve-bench-client
Jun 14, 2026
Merged

refactor(bench): one resolveBenchClient(harness) — dedup the harness→client mapping#296
drewstone merged 1 commit into
mainfrom
cleanup/resolve-bench-client

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

rsi.ts and run.ts each hand-built the SandboxClient runLoop drives — rsi had the full router / router-tools+search / sandbox branch off BACKEND; run hardcoded new Sandbox. Extracted the ONE mapping into resolve-client.ts:

  • router (+ optional searchProvider) → off-box inline executor (router / router-tools+web_search)
  • anything else (sandbox/a BackendType) → in-box Sandbox

A new entrypoint now gets the full off-box/in-box matrix for free, and the mapping can't drift between callers. No behavior change — both callers preserve their prior client. Capability-preserving step toward harness-drives-everything (B8).

Typecheck: the only error is a pre-existing BackendType resolution artifact present on unmodified origin/main too (verified by control) — not introduced here.

…>client mapping

rsi.ts and run.ts each hand-built the SandboxClient runLoop drives (rsi: full
router/router-tools+search/sandbox branch off BACKEND; run: hardcoded new Sandbox).
Extract the ONE mapping into resolve-client.ts: 'router' (+optional search) = off-box
inline executor; anything else = in-box Sandbox. A new entrypoint gets the full
off-box/in-box matrix for free and the mapping can't drift between callers. No behavior
change (both callers preserve their prior client).

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — bb8b3c30

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T14:50:04Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 147.2s (2 bridge agents)
Total 147.2s

💰 Value — sound

Centralizes the duplicated bench harness→SandboxClient mapping from rsi.ts and run.ts into a single resolveBenchClient helper, preserving both callers' prior clients and making the off-box/in-box matrix reusable for future entrypoints.

  • What it does: Extracts the duplicated conditional construction of SandboxClient (router / router-tools+search / sandbox) from bench/src/rsi.ts and bench/src/run.ts into a new bench/src/resolve-client.ts. resolveBenchClient takes a backend selector plus router/sandbox credentials and returns the same SandboxClient each caller built before: backend === 'router' with an optional searchProvider beco
  • Goals it achieves: 1) Eliminate duplicate harness→client branching so the mapping can't drift between rsi.ts and run.ts. 2) Make the full off-box/in-box matrix available to any new bench entrypoint via one helper. 3) Preserve existing behavior while moving toward the B8 "harness-drives-everything" direction. This is inferred from the code comments and the PR body, not from an external ticket.
  • Assessment: Good, coherent refactor. It removes real duplication (the same router/router-tools/sandbox ternary existed in rsi.ts) and uses the existing runtime primitives (createExecutor, inlineSandboxClient, Sandbox) in the intended way. The change is behavior-preserving: rsi.ts still respects BACKEND and SEARCH; run.ts still hardcodes the sandbox transport (its historical behavior). It align
  • Better / existing approach: none — this is the right approach. I searched for existing equivalents: grep -R 'resolveBenchClient\|resolveClient\|resolveSandboxClient' found only the new file and its two callers; grep -R 'inlineSandboxClient\|createExecutor' in bench/src showed other call sites (router-executor.ts, search-bench/bridge.ts, generate-eval/run.mts) but each serves a different purpose (research-shot exe

🎯 Usefulness — sound

Dedupes the harness→client branch into one helper used by both live bench entrypoints; preserves behavior and fits the injected-SandboxClient pattern.

  • Integration: The new helper is reachable today. bench/src/rsi.ts:53 calls resolveBenchClient with the BACKEND env selector, exercising both the off-box router/router-tools path (BACKEND=router, optionally SEARCH=you|exa) and the in-box Sandbox fallback. bench/src/run.ts:117 calls it with backend: 'sandbox' for the batch experiment presets.
  • Fit with existing patterns: It matches the bench layer's existing shape: bench/src/experiment.ts:247 injects a SandboxClient and sandboxAgentRun sets the in-box backend type separately on the AgentRunSpec. The off-box wrapping uses the runtime's own inlineSandboxClient (src/runtime/inline-sandbox-client.ts:44), which already exists precisely to present a non-box executor as a SandboxClient.
  • Real-world viability: It is behavior-preserving and uses the same primitives (createExecutor, inlineSandboxClient, Sandbox) as the previous inline code, so concurrency, timeouts, and error paths are unchanged. Backend values other than router fall through to the Sandbox path exactly like before; no new validation or failure mode is introduced.

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260614T145428Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — bb8b3c30

Readiness 89/100 · Confidence 65/100 · 3 findings (3 low)

deepseek glm aggregate
Readiness 92 89 89
Confidence 65 65 65
Correctness 92 89 89
Security 92 89 89
Testing 92 89 89
Architecture 92 89 89

Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision.

🟡 LOW Unused destructured locals in sandbox return path — bench/src/resolve-client.ts

Line 37 destructures routerBaseUrl and model from opts but neither is consumed in the sandbox return branch (lines 53-57). Both ARE used in the router branch so no lint fires, but a maintainer reading the sandbox path sees them initialized and immediately discarded. Harmless — suggest moving the destructure inside the if (backend === 'router') block or extracting only what's needed: const { backend, routerKey, searchProvider } = opts.

🟡 LOW resolveBenchClient has no unit test despite being the single source of truth for client selection — bench/src/resolve-client.ts

The doc comment (lines 1-17) explicitly states this is 'The ONE harness → worker-client mapping' and 'the mapping can't drift between callers.' Given that it's now the centralized critical path for both rsi.ts and run.ts, a small unit test asserting the three branches (router+searchProvider → router-tools executor, router → plain executor, anything else → Sandbox instance) would lock the contract. Not blocking — the bench layer is integration-tested and the original inline code had no tests either — but worth noting for future safety.

🟡 LOW run.ts passes routerBaseUrl and model that are unused in the sandbox branch — bench/src/run.ts

Line 117: resolveBenchClient({ backend: 'sandbox', routerBaseUrl, routerKey, model, sandboxBaseUrl }). Since backend is hardcoded to 'sandbox', routerBaseUrl and model are destructured (resolve-client.ts:37) but never used in the else branch. This is cosmetically noisy but has zero runtime impact — the values are simply ignored. The tsconfig doesn't enable noUnusedLocals so tsc won't flag it. Consider omitting them or documenting that they're inert for the sandbox path.


tangletools · 2026-06-14T15:00:09Z · trace

@drewstone drewstone merged commit 8c04525 into main Jun 14, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants