feat(telegram): unique photo filenames + caption-aware auto-vision #23
Two fixes for the Telegram photo flow:
1) Filename collision ("image already processed"). DownloadPhoto/DownloadVoice
named files photo_<fileID[:16]>.<ext>, but Telegram file_ids share a long
constant prefix (e.g. "AgACAgIAAxkBAAI…") — the distinguishing bytes come
*after* char 16. Truncating kept only the shared prefix, so every photo
mapped to the same filename and overwrote the last one. Now we hash the full
file_id (SHA-256, first 16 hex chars) for a genuinely unique suffix. Adds a
prefix-collision regression test.
2) Caption-aware vision. Photos can carry a caption (the user's request), which
was silently dropped, and the agent had to discover/call vision itself. Now:
- Message gains a Caption field; OnPhotoMessage receives it.
- New vision.auto_describe config (default true, mirrors auto_transcribe).
- On a photo, the bot runs the vision model FIRST (focused by the caption if
present) to extract a description, then injects "[description] + caption"
to the agent so it answers the request. Falls back to the path-based
message when auto-describe is off or vision fails.
Docker configs ship vision.auto_describe=true. Docs (CHEATSHEET, TELEGRAM)
updated. All packages build, vet clean, tests pass under -race.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | odek | 714bbc1 | Commit Preview URL, Branch Preview URL | Jun 07 2026, 03:14 PM |
…e funcs vprotocol auto-repair (§6.2 property tests).
The photo-handler message composition lived inline in an untested closure in package main, leaving the new branching logic (caption present/absent, vision success/fallback) unexercised: the binding weakness in the verification η. Extract three pure functions (photoVisionPrompt, photoVisionMessage, photoFallbackMessage) and cover them with unit tests, including a regression that the <untrusted_content> wrapping is preserved verbatim when the description is injected into the agent (axis 2.8). No behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
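To make the extraction concrete, here is a minimal sketch of what three pure message-composition functions of this kind might look like. Only the function names come from the commit message; the signatures, the prompt wording, and the message layout are assumptions for illustration and may not match the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// photoVisionPrompt builds the prompt for the vision model.
// Hypothetical signature and wording: the caption, when present, focuses the description.
func photoVisionPrompt(caption string) string {
	caption = strings.TrimSpace(caption)
	if caption == "" {
		return "Describe this image in detail."
	}
	return "Describe this image, focusing on what is needed to answer: " + caption
}

// photoVisionMessage composes the agent message from the vision output and the caption.
// The description is inserted verbatim, so any <untrusted_content> wrapping survives.
func photoVisionMessage(description, caption string) string {
	caption = strings.TrimSpace(caption)
	if caption == "" {
		return "[Image description]\n" + description
	}
	return "[Image description]\n" + description + "\n\n" + caption
}

// photoFallbackMessage is the path-based message used when no description is available.
func photoFallbackMessage(path, caption string) string {
	msg := "[Photo saved at " + path + "]"
	if c := strings.TrimSpace(caption); c != "" {
		msg += "\n\n" + c
	}
	return msg
}

func main() {
	// Illustrative description; the real wrapper format is produced by the vision tool.
	desc := "<untrusted_content>a receipt totaling 12.50</untrusted_content>"
	fmt.Println(photoVisionPrompt("what is the total?"))
	fmt.Println(photoVisionMessage(desc, "what is the total?"))
	fmt.Println(photoFallbackMessage("/tmp/photo_example.jpg", ""))
}
```

Because the functions take and return plain strings, each branch (caption present or absent, description or fallback) can be asserted directly without a running bot.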
vprotocol v5.2.7: Verification Certificate
PR: #23
Pre-scan (§0)
Deterministic scan of the diff for injection markers / verdict tokens / new exec sinks: clean. The one new untrusted→LLM path (image description → agent message) is delimited: the vision tool wraps the description in nonce'd
Nine Axes
η Derivation (re-derived post-repair)
η_raw = 0.671 · ρ = 0.24 (family +0.10, version +0.05, spec_independence +0.05, AST ~0.02, shared-mutants ~0.02) Verdict:
Summary
Two fixes for the Telegram photo flow, reported from live use.
1. Filename collision — "image already processed"
DownloadPhoto/DownloadVoice named files photo_<fileID[:16]>.<ext>. Telegram file_ids share a long constant prefix (e.g. AgACAgIAAxkBAAI…) that encodes file-type/datacenter/version; the bytes that actually distinguish one file from another come after char 16. Truncating kept only the shared prefix, so every photo mapped to the same filename and overwrote the previous one, making the bot treat each new image as already-seen.
Fix: new fileIDSuffix() hashes the full file_id (SHA-256, first 16 hex chars) for a genuinely unique suffix. Applied to both photo and voice downloads.
2. Caption-aware auto-vision
A photo can carry a caption (the user's actual request), which was silently dropped, and the agent had to discover/call vision itself.
Fix:
- Message gains a Caption field; OnPhotoMessage now receives it.
- New vision.auto_describe config (default true, mirrors transcription.auto_transcribe).
- On a photo, the bot runs the vision model first (focused by the caption if present) to extract a description, then injects [description] + caption to the agent so it answers the request using the description.
The extracted description stays wrapped in <untrusted_content> boundaries (image text is untrusted input); the caption is the user's own trusted request.
Behavior
Config
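The default for this flag applies only when the vision section is absent, and one common way such behavior arises in Go is sketched below. The struct layout and the resolveVision helper are assumptions made for illustration; the project's actual config types and resolution logic may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// visionConfig is a hypothetical shape for the vision section.
type visionConfig struct {
	AutoDescribe bool `json:"auto_describe"`
}

// config uses a pointer so an absent section can be told apart from a present one.
type config struct {
	Vision *visionConfig `json:"vision"`
}

// resolveVision applies the default-true only when the whole section is missing.
// A present section is taken as written, so an omitted flag decodes to false.
func resolveVision(raw []byte) (visionConfig, error) {
	var c config
	if err := json.Unmarshal(raw, &c); err != nil {
		return visionConfig{}, err
	}
	if c.Vision == nil {
		return visionConfig{AutoDescribe: true}, nil
	}
	return *c.Vision, nil
}

func main() {
	inputs := []string{
		`{}`,                                  // section absent: default applies
		`{"vision": {}}`,                      // section present, flag omitted
		`{"vision": {"auto_describe": true}}`, // flag set explicitly
	}
	for _, raw := range inputs {
		v, err := resolveVision([]byte(raw))
		fmt.Println(raw, v.AutoDescribe, err)
	}
}
```

Under this assumed design, a config that includes a vision section for other settings but leaves out auto_describe silently disables the feature, which is why shipping the flag explicitly is the safer choice.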
Docker configs (config.restricted.json, config.godmode.json) ship vision.auto_describe: true. Note: like auto_transcribe, the default-true only applies when the vision section is entirely absent, so a present section must set the flag explicitly.
Tests
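As an illustration of what the prefix-collision regression in this section exercises, the sketch below contrasts truncation with hashing. The name fileIDSuffix comes from the PR, but this body is an assumption based on its description (SHA-256, first 16 hex chars); truncatedSuffix is a hypothetical stand-in for the old behavior, and the two IDs are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fileIDSuffix hashes the full file_id and keeps the first 16 hex characters.
// Assumed implementation, reconstructed from the PR description.
func fileIDSuffix(fileID string) string {
	sum := sha256.Sum256([]byte(fileID))
	return hex.EncodeToString(sum[:])[:16]
}

// truncatedSuffix mimics the old scheme: keep only the first 16 characters.
func truncatedSuffix(fileID string) string {
	if len(fileID) > 16 {
		return fileID[:16]
	}
	return fileID
}

func main() {
	// Made-up IDs that share their first 16 characters and differ only afterwards.
	a := "AgACAgIAAxkBAAIexampleOne"
	b := "AgACAgIAAxkBAAIexampleTwo"

	fmt.Println("truncated equal:", truncatedSuffix(a) == truncatedSuffix(b))
	fmt.Println("hashed equal:   ", fileIDSuffix(a) == fileIDSuffix(b))
	fmt.Println("filename:       ", "photo_"+fileIDSuffix(a)+".jpg")
}
```

Hashing the whole ID means every byte contributes to the suffix, and a hex digest is safe to use in a filename.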
- TestDownloadPhoto_PrefixCollisionAvoided: regression, two IDs sharing a prefix produce different filenames.
- TestDownloadVoice_HashedFileIDSuffix / TestDownloadPhoto_HashedFileIDSuffix: hashed suffix, raw prefix absent.
- TestHandleUpdate_PhotoMessage: asserts caption threading.
- TestResolveVision_Defaults / TestResolveVision_AutoDescribePreserved: auto_describe default + explicit values.
All packages build, go vet clean, tests pass under -race.
Docs
docs/CHEATSHEET.md and docs/TELEGRAM.md updated (auto-describe flow, new filename scheme, updated handler signature).
🤖 Generated with Claude Code