refactor(runtime): runAgentic shot loop → canonical routerToolLoop (A1) by drewstone · Pull Request #299 · tangle-network/agent-runtime

drewstone · 2026-06-14T18:58:55Z

A1 of the converge. runShot hand-rolled the off-box chat→tool_calls→execute loop that routerToolLoop already is — the 3rd copy of that primitive (alongside benchSolveLeaf, the gym loop).

Generalized routerToolLoop additively so it can be the one loop:

initialMessages — seed with a carried conversation (depth continuation) instead of [system,user]
maxTokens — the worker completion cap (threaded through routerChatWithTools too)
returns the final messages (depth carry + analyst trajectory)

runShot is now a thin adapter (carried messages in, surface.call as execute, result → ShotOut).

Capability-preserving + non-breaking: new opts are optional, so external callers (appworld, humaneval-repair-gate) are untouched; depth is preserved — tests/loops/strategy-suite.test.ts 34/34 green; typecheck clean. One documented minor: malformed tool-call JSON now feeds an error back inside routerToolLoop (its existing behavior) instead of bumping toolErrors; surface.call errors still counted.

Keeps both loop substrates (your call) — they now share the one canonical off-box loop.

…routerToolLoop runShot hand-rolled the off-box chat→tool_calls→execute loop that routerToolLoop already is — a 3rd copy of the same primitive. Generalize routerToolLoop additively so it can BE the one loop: optional initialMessages (depth continuation — seed with the carried conversation instead of [system,user]), maxTokens (the worker completion cap), and it returns the final messages (for depth carry + the analyst trajectory). runShot becomes a thin adapter: carried messages in, surface.call as execute (ERROR/throw → counted toolError, fed back not thrown), result mapped to ShotOut. External callers (appworld, humaneval-repair-gate) unaffected — the new opts are optional. Depth preserved: tests/loops/strategy-suite.test.ts 34/34 green. Minor: malformed tool-call JSON now feeds an error back to the model inside routerToolLoop (its existing behavior) rather than incrementing toolErrors; surface.call errors still counted.

tangletools

✅ Auto-approved PR — `afad91fa`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T18:59:01Z}

tangletools

✅ Auto-approved PR — `53ffa9a7`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T19:00:54Z}

tangletools

🟡 Value Audit — sound-with-nits


Verdict	sound-with-nits
Concerns	4 (4 weak-concern)
Heuristic	0.0s
Duplication	0.0s
Interrogation	117.2s (2 bridge agents)
Total	117.2s

💰 Value — sound-with-nits

Replaces the hand-rolled shot loop in runAgentic with the canonical routerToolLoop, adding initialMessages/maxTokens/message carry so depth continuation works through one shared primitive — a coherent deduplication.

What it does: Generalizes src/runtime/router-client.ts:routerToolLoop with optional initialMessages (seed a carried conversation instead of [system,user]), maxTokens (threaded through routerChatWithTools), and a new returned messages array. Then refactors src/runtime/strategy.ts:runShot into a thin adapter that passes its carried messages into routerToolLoop and maps surface.call results back to ShotOut.
Goals it achieves: Eliminates a third copy of the off-box chat→tool_calls→execute loop, so runAgentic shares the same canonical loop already used by bench/appworld/humaneval-repair-gate. Preserves depth continuation, the worker completion cap, and existing external callers; reduces future maintenance and drift.
Assessment: Good change. It is additive/non-breaking, keeps the same semantics for surface.call errors and ERROR:-prefixed results, and follows the codebase's layering (router-client is the declared one router chat client). The test suite and typecheck claims are consistent with the diff.
Better / existing approach: none for the runShot refactor itself — routerToolLoop is the right primitive. However, there is still a fourth hand-rolled copy of the same loop in src/runtime/supervise/runtime.ts:248-368 (routerToolsInlineExecutor for the createExecutor({backend:'router-tools'}) substrate). It should eventually be folded into routerToolLoop too; until then the codebase has two off-box tool-loop implementations t

🎯 Usefulness — sound-with-nits

Refactors runAgentic's leaf onto the already-exported routerToolLoop substrate, removing a third copy of the off-box chat→tool→execute loop while preserving all existing callers and the depth-continuation messages contract.

Integration: The changed runShot is reached through shotExecutor (strategy.ts:345-423), which is spawned by every runAgentic path including the built-in depth/breadth drivers and custom strategies authored via defineStrategy (strategy.ts:728-847); runAgentic is exported from src/runtime/index.ts:251 and exercised in tests/loops/strategy-suite.test.ts. The enhanced routerToolLoop is already consumed by bench/sr
Fit with existing patterns: It follows the codebase's stated substrate layering: router-client.ts:5-6 documents routerToolLoop as the one off-box agentic loop, and docs/canonical-api.md:429-433 positions runAgentic as the canonical strategy driver. runShot was the remaining hand-rolled duplication; folding it into routerToolLoop is aligned rather than competing.
Real-world viability: The loop copies the carried conversation (router-client.ts:224-225), propagates maxTokens and abort signals, and feeds malformed model JSON back as a tool result instead of crashing (router-client.ts:263-273). Surface.call errors and throws are still counted as toolErrors (strategy.ts:157-166). The only behavioral delta is that malformed JSON no longer increments toolErrors; the PR documents this,

🎯 Usefulness Audit

🟡 routerToolLoop still requires dummy system/user when seeding with initialMessages [ergonomics] ``

runShot passes empty strings for system and user because routerToolLoop ignores them when initialMessages is set (router-client.ts:207-229; strategy.ts:168-182). The interface works, but a messages-first overload or optional system/user would be cleaner and reduce future caller confusion.

🟡 Malformed tool-call JSON no longer increments toolErrors [robustness] ``

The documented behavior change means toolErrors now only counts surface.call failures (ERROR: prefix or throws) and not model-emitted invalid JSON (strategy.ts:157-166 vs router-client.ts:263-273). Since strategy-author.ts:39 exposes toolErrors to strategy authors, dashboards or steering logic that treated it as a total model/tool failure count may now undercount. Acceptable if intentional, but worth a release-note callout rather than a silent semantic shift.

💰 Value Audit

🟡 routerToolsInlineExecutor still hand-rolls the same loop [duplication] ``

src/runtime/supervise/runtime.ts:248-368 implements chat→tool_calls→execute→fold results identically to routerToolLoop, but with no maxTokens, no initialMessages, fixed temperature 0.2, and no costUsd. The PR makes routerToolLoop the canonical off-box loop for runAgentic/bench; the Supervisor 'router-tools' executor should be refactored to call it next, otherwise duplication remains.

🟡 toolCalls count silently excludes malformed-JSON calls [proportion] ``

Old runShot incremented toolCalls for every emitted tool_call before JSON.parse (afad91f diff); routerToolLoop increments toolCalls only after successful parse (src/runtime/router-client.ts:261). The PR documents the toolErrors change but not this metric shift. If downstream comparisons rely on total emitted tool_calls, the numbers will differ.

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass	What it asks
Heuristic	Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication	Do added function/class names already exist elsewhere in the repo?
Value Audit	What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit	Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260614T190433Z}

tangletools · 2026-06-14T19:08:13Z

✅ No Blockers — `53ffa9a7`

Readiness 60/100 · Confidence 65/100 · 3 findings (1 medium, 2 low)

	deepseek	glm	aggregate
Readiness	60	79	60
Confidence	65	65	65
Correctness	60	79	60
Security	60	79	60
Testing	60	79	60
Architecture	60	79	60

Full multi-shot audit completed 1/1 planned shots over 2 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 1/1 planned shots over 2 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM Final assistant message dropped from carried conversation when model stops calling tools — src/runtime/router-client.ts

routerToolLoop returns at line 247 BEFORE pushing the assistant turn when toolCalls.length === 0: if (r.toolCalls.length === 0) return { ... messages }. The assistant message push at line 251-259 only runs when there ARE tool calls. The OLD runShot (strategy.ts pre-refactor) pushed the assistant message FIRST, then checked calls.length and broke — so the model's final text answer was always in the conversation. Impact: (1) depthDriver carries out.messages to the next shot — shot N+1 now starts without shot N's concluding as

🟡 LOW Token usage requires both prompt_tokens and completion_tokens to count either — src/runtime/router-client.ts

routerChatWithTools (line 160-163) gates usage on BOTH fields being numbers: u && typeof u.prompt_tokens === 'number' && typeof u.completion_tokens === 'number'. Old runShot counted them independently (two separate if-checks). If a provider returns only one of the two, the old code would sum that one; the new code sums neither. Edge case — providers typically return both or neither — but a subtle accounting regression for partial-usage providers.

🟡 LOW Malformed-JSON tool args no longer counted as toolErrors — src/runtime/strategy.ts

Old runShot caught JSON.parse failures and incremented toolErrors += 1, then still called the tool with empty args {}. routerToolLoop (router-client.ts:265-274) now pushes an error message and continues — skipping the execute callback entirely. Since runShot's toolErrors counter lives inside the execute closure (strategy.ts:158-167), malformed-JSON args are counted in routerToolLoop's toolCalls (line 261) but never in toolErrors. ShotOut.toolErrors undercounts. The new behavior (feed error back to model instead of calling tool with garbage args) is arguably better, but the metric is now inconsistent with its documented meaning.

_{tangletools · 2026-06-14T19:08:11Z · trace}

tangletools previously approved these changes Jun 14, 2026

View reviewed changes

style(runtime): biome format the A1 changes

53ffa9a

drewstone dismissed tangletools’s stale review via 53ffa9a June 14, 2026 19:00

tangletools approved these changes Jun 14, 2026

View reviewed changes

tangletools reviewed Jun 14, 2026

View reviewed changes

drewstone merged commit 8a3fdc6 into main Jun 14, 2026
1 check passed

drewstone deleted the refactor/runagentic-canonical-toolloop branch June 14, 2026 19:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(runtime): runAgentic shot loop → canonical routerToolLoop (A1)#299

refactor(runtime): runAgentic shot loop → canonical routerToolLoop (A1)#299
drewstone merged 2 commits into
mainfrom
refactor/runagentic-canonical-toolloop

drewstone commented Jun 14, 2026

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools commented Jun 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drewstone commented Jun 14, 2026

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — afad91fa

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — 53ffa9a7

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

🟡 Value Audit — sound-with-nits

💰 Value — sound-with-nits

🎯 Usefulness — sound-with-nits

🎯 Usefulness Audit

💰 Value Audit

Uh oh!

tangletools commented Jun 14, 2026

✅ No Blockers — 53ffa9a7

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

✅ Auto-approved PR — `afad91fa`

✅ Auto-approved PR — `53ffa9a7`

✅ No Blockers — `53ffa9a7`