refactor(bench): one resolveBenchClient(harness) — dedup the harness→client mapping#296
Conversation
…>client mapping rsi.ts and run.ts each hand-built the SandboxClient runLoop drives (rsi: full router/router-tools+search/sandbox branch off BACKEND; run: hardcoded new Sandbox). Extract the ONE mapping into resolve-client.ts: 'router' (+optional search) = off-box inline executor; anything else = in-box Sandbox. A new entrypoint gets the full off-box/in-box matrix for free and the mapping can't drift between callers. No behavior change (both callers preserve their prior client).
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — bb8b3c30
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T14:50:04Z
tangletools
left a comment
There was a problem hiding this comment.
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 147.2s (2 bridge agents) |
| Total | 147.2s |
💰 Value — sound
Centralizes the duplicated bench harness→SandboxClient mapping from rsi.ts and run.ts into a single resolveBenchClient helper, preserving both callers' prior clients and making the off-box/in-box matrix reusable for future entrypoints.
- What it does: Extracts the duplicated conditional construction of
SandboxClient(router / router-tools+search / sandbox) frombench/src/rsi.tsandbench/src/run.tsinto a newbench/src/resolve-client.ts.resolveBenchClienttakes a backend selector plus router/sandbox credentials and returns the sameSandboxClienteach caller built before:backend === 'router'with an optionalsearchProviderbeco - Goals it achieves: 1) Eliminate duplicate harness→client branching so the mapping can't drift between
rsi.tsandrun.ts. 2) Make the full off-box/in-box matrix available to any new bench entrypoint via one helper. 3) Preserve existing behavior while moving toward the B8 "harness-drives-everything" direction. This is inferred from the code comments and the PR body, not from an external ticket. - Assessment: Good, coherent refactor. It removes real duplication (the same router/router-tools/sandbox ternary existed in
rsi.ts) and uses the existing runtime primitives (createExecutor,inlineSandboxClient,Sandbox) in the intended way. The change is behavior-preserving:rsi.tsstill respectsBACKENDandSEARCH;run.tsstill hardcodes the sandbox transport (its historical behavior). It align - Better / existing approach: none — this is the right approach. I searched for existing equivalents:
grep -R 'resolveBenchClient\|resolveClient\|resolveSandboxClient'found only the new file and its two callers;grep -R 'inlineSandboxClient\|createExecutor'inbench/srcshowed other call sites (router-executor.ts,search-bench/bridge.ts,generate-eval/run.mts) but each serves a different purpose (research-shot exe
🎯 Usefulness — sound
Dedupes the harness→client branch into one helper used by both live bench entrypoints; preserves behavior and fits the injected-SandboxClient pattern.
- Integration: The new helper is reachable today.
bench/src/rsi.ts:53callsresolveBenchClientwith theBACKENDenv selector, exercising both the off-boxrouter/router-toolspath (BACKEND=router, optionallySEARCH=you|exa) and the in-box Sandbox fallback.bench/src/run.ts:117calls it withbackend: 'sandbox'for the batch experiment presets. - Fit with existing patterns: It matches the bench layer's existing shape:
bench/src/experiment.ts:247injects aSandboxClientandsandboxAgentRunsets the in-box backend type separately on theAgentRunSpec. The off-box wrapping uses the runtime's owninlineSandboxClient(src/runtime/inline-sandbox-client.ts:44), which already exists precisely to present a non-box executor as aSandboxClient. - Real-world viability: It is behavior-preserving and uses the same primitives (
createExecutor,inlineSandboxClient,Sandbox) as the previous inline code, so concurrency, timeouts, and error paths are unchanged. Backend values other thanrouterfall through to the Sandbox path exactly like before; no new validation or failure mode is introduced.
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 92 | 89 | 89 |
| Confidence | 65 | 65 | 65 |
| Correctness | 92 | 89 | 89 |
| Security | 92 | 89 | 89 |
| Testing | 92 | 89 | 89 |
| Architecture | 92 | 89 | 89 |
Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision.
🟡 LOW Unused destructured locals in sandbox return path — bench/src/resolve-client.ts
Line 37 destructures
routerBaseUrlandmodelfromoptsbut neither is consumed in the sandbox return branch (lines 53-57). Both ARE used in the router branch so no lint fires, but a maintainer reading the sandbox path sees them initialized and immediately discarded. Harmless — suggest moving the destructure inside theif (backend === 'router')block or extracting only what's needed:const { backend, routerKey, searchProvider } = opts.
🟡 LOW resolveBenchClient has no unit test despite being the single source of truth for client selection — bench/src/resolve-client.ts
The doc comment (lines 1-17) explicitly states this is 'The ONE harness → worker-client mapping' and 'the mapping can't drift between callers.' Given that it's now the centralized critical path for both rsi.ts and run.ts, a small unit test asserting the three branches (router+searchProvider → router-tools executor, router → plain executor, anything else → Sandbox instance) would lock the contract. Not blocking — the bench layer is integration-tested and the original inline code had no tests either — but worth noting for future safety.
🟡 LOW run.ts passes routerBaseUrl and model that are unused in the sandbox branch — bench/src/run.ts
Line 117: resolveBenchClient({ backend: 'sandbox', routerBaseUrl, routerKey, model, sandboxBaseUrl }). Since backend is hardcoded to 'sandbox', routerBaseUrl and model are destructured (resolve-client.ts:37) but never used in the else branch. This is cosmetically noisy but has zero runtime impact — the values are simply ignored. The tsconfig doesn't enable noUnusedLocals so tsc won't flag it. Consider omitting them or documenting that they're inert for the sandbox path.
tangletools · 2026-06-14T15:00:09Z · trace
rsi.ts and run.ts each hand-built the
SandboxClientrunLoopdrives — rsi had the full router / router-tools+search / sandbox branch offBACKEND; run hardcodednew Sandbox. Extracted the ONE mapping intoresolve-client.ts:router(+ optionalsearchProvider) → off-box inline executor (router / router-tools+web_search)sandbox/a BackendType) → in-boxSandboxA new entrypoint now gets the full off-box/in-box matrix for free, and the mapping can't drift between callers. No behavior change — both callers preserve their prior client. Capability-preserving step toward harness-drives-everything (B8).
Typecheck: the only error is a pre-existing
BackendTyperesolution artifact present on unmodified origin/main too (verified by control) — not introduced here.