Summary
When every model in the FallbackModel chain fails, the frontend shows only a generic message: Stream error: All models from FallbackModel failed (2 sub-exceptions). The backend logs contain the actionable per-model causes; these should be surfaced (safely) to the API/WebSocket client so that failures are diagnosable from the UI.
Observed
Frontend (experiment agent, streaming chat):
Error: Stream error: All models from FallbackModel failed (2 sub-exceptions)
Backend agents.websocket_stream_error log held the real causes:
- Primary: ModelHTTPError 404: invalid model name (models/google-gla:gemini-3-flash-preview is not found).
- Fallback: ModelHTTPError 429: quota exhausted (RESOURCE_EXHAUSTED, free-tier generate_content_free_tier_requests limit 20/day for gemini-2.5-flash).
Expected
The WebSocket error event (and the equivalent REST chat error) should preserve a safe summary of each fallback sub-failure, e.g. per-model { model_name, status_code, reason }:
- 404 → "model not found / invalid model name"
- 429 → "quota/rate limit exhausted"
- 401/403 → "authentication/permission error"
This lets the UI distinguish a model-name problem from a quota problem without a maintainer reading container logs.
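A minimal sketch of the per-model classification described above. It assumes each sub-exception exposes status_code and model_name attributes (read defensively with getattr, so other exception types fall back to a generic reason). The names SAFE_REASONS, DEFAULT_REASON, SubFailure, summarize_sub_failures, and StubHTTPError are hypothetical and not part of the existing codebase.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, List, Optional

# Fixed, secret-free reason strings keyed by HTTP status (from the list above).
SAFE_REASONS: Dict[int, str] = {
    401: "authentication/permission error",
    403: "authentication/permission error",
    404: "model not found / invalid model name",
    429: "quota/rate limit exhausted",
}
DEFAULT_REASON = "upstream model error"  # hypothetical catch-all


@dataclass
class SubFailure:
    model_name: Optional[str]
    status_code: Optional[int]
    reason: str


def summarize_sub_failures(sub_exceptions: Iterable[BaseException]) -> List[Dict[str, Any]]:
    """Return one safe summary per sub-exception; raw exception text is never copied."""
    summaries = []
    for exc in sub_exceptions:
        # Assumption: HTTP-style errors expose these attributes; others yield None.
        status = getattr(exc, "status_code", None)
        model = getattr(exc, "model_name", None)
        reason = SAFE_REASONS.get(status, DEFAULT_REASON)
        summaries.append(asdict(SubFailure(model, status, reason)))
    return summaries


class StubHTTPError(Exception):
    """Stand-in for a provider HTTP error, used only for this example."""

    def __init__(self, status_code: int, model_name: str) -> None:
        super().__init__(f"status_code: {status_code}, model_name: {model_name}")
        self.status_code = status_code
        self.model_name = model_name


failures = summarize_sub_failures(
    [
        StubHTTPError(404, "google-gla:gemini-3-flash-preview"),
        StubHTTPError(429, "gemini-2.5-flash"),
    ]
)
for failure in failures:
    print(failure)
```

Mapping to fixed strings, rather than forwarding the provider message, keeps the surfaced reason independent of whatever the upstream response body happens to contain.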
Constraints
- Never expose secrets: no API keys, bearer tokens, Authorization headers, or AIza… values in the surfaced summary or logs. Include only status code + provider message text (which is secret-free) or a mapped reason string.
- Keep the RFC 7807 / ErrorEvent shape; classification should be additive.
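One way to satisfy both constraints, sketched under assumptions: a redaction pass over any provider text before it is surfaced, and an extension member added to a copy of the existing event (RFC 7807 permits extension members alongside the standard ones). The regular expressions are illustrative rather than exhaustive, and redact, with_sub_failures, the sub_failures member name, and the sample event dict are hypothetical.

```python
import re
from typing import Any, Dict, List

# Illustrative patterns only; a real deployment would tune these to its providers.
_GOOGLE_KEY = re.compile(r"AIza[0-9A-Za-z_\-]{10,}")
_BEARER = re.compile(r"bearer\s+[A-Za-z0-9._~+/\-]+=*", re.IGNORECASE)
_KEY_VALUE = re.compile(r"(authorization|api[_-]?key)(\s*[:=]\s*)\S+", re.IGNORECASE)
REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    """Mask secret-like substrings before text reaches a client or a log line."""
    text = _GOOGLE_KEY.sub(REDACTED, text)
    text = _BEARER.sub(REDACTED, text)
    # Keep the header/parameter name and separator, mask only the value.
    return _KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + REDACTED, text)


def with_sub_failures(
    error_event: Dict[str, Any], sub_failures: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of the event with one extra member; existing members are untouched."""
    return {**error_event, "sub_failures": sub_failures}


# Hypothetical existing event; the real ErrorEvent fields may differ.
event = {
    "type": "about:blank",
    "title": "Stream error",
    "detail": "All models from FallbackModel failed (2 sub-exceptions)",
}
extended = with_sub_failures(
    event,
    [
        {
            "model_name": "gemini-2.5-flash",
            "status_code": 429,
            "reason": redact("quota exhausted, api_key=AIzaSyEXAMPLEEXAMPLE123"),
        }
    ],
)
print(extended["sub_failures"][0]["reason"])
```

Because the helper only adds a member to a copy, clients that ignore unknown fields keep working unchanged.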
Tests
- A FallbackExceptionGroup with mixed sub-errors (404 + 429) produces an error payload listing each model's safe reason.
- Assert no secret-like material leaks into the surfaced error.
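The two test cases could look roughly like the pytest-style sketch below. The payload here is a hand-written stand-in for what the error handler would emit for a 404 + 429 group; in a real test it would come from the handler itself. SECRET_LIKE, find_secret_like, and the sub_failures member name are hypothetical, and the detector patterns are assumptions rather than a complete secret scanner.

```python
import json
import re
from typing import Any, List

# Illustrative detector for key-like, bearer-token-like, and header-like text.
SECRET_LIKE = re.compile(
    r"AIza[0-9A-Za-z_\-]{10,}|bearer\s+\S+|authorization", re.IGNORECASE
)


def find_secret_like(payload: Any) -> List[str]:
    """Return every secret-like substring found in the JSON-serialized payload."""
    return [m.group(0) for m in SECRET_LIKE.finditer(json.dumps(payload))]


def test_mixed_sub_failures_are_listed_and_secret_free() -> None:
    # Stand-in for the handler output for a 404 + 429 fallback failure.
    payload = {
        "title": "Stream error",
        "detail": "All models from FallbackModel failed (2 sub-exceptions)",
        "sub_failures": [
            {
                "model_name": "google-gla:gemini-3-flash-preview",
                "status_code": 404,
                "reason": "model not found / invalid model name",
            },
            {
                "model_name": "gemini-2.5-flash",
                "status_code": 429,
                "reason": "quota/rate limit exhausted",
            },
        ],
    }
    assert [f["status_code"] for f in payload["sub_failures"]] == [404, 429]
    assert all(f["reason"] for f in payload["sub_failures"])
    assert find_secret_like(payload) == []


def test_detector_flags_a_leaked_key() -> None:
    # Guards against the leak check silently passing because it matches nothing.
    leaked = {"detail": "call failed with key AIzaSyEXAMPLEEXAMPLE123"}
    assert find_secret_like(leaked) == ["AIzaSyEXAMPLEEXAMPLE123"]
```

The second test is there so that a broken detector pattern shows up as a failure instead of a false all-clear.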
Context
Found while investigating the experiment-agent stream failure on 2026-06-01. Companion issue covers rejecting the doubled provider prefix that caused the 404 leg.