feat(vision): add vision tool using MiniCPM-V 4.6 (1.3B) via llama-mtmd-cli by jkyberneees · Pull Request #22 · BackendStack21/odek

jkyberneees · 2026-06-07T13:56:05Z

Summary

Adds a vision built-in tool that analyses images and videos locally using MiniCPM-V 4.6 (1.3B multimodal model) via llama-mtmd-cli — no cloud API, no external service
Docker image gains a new minicpm multi-stage build that downloads a pre-built llama-mtmd-cli binary (llama.cpp b9549, amd64/arm64) and fetches the model GGUF (Q4_K_M, 529 MB) + vision projector (mmproj, 1.1 GB) from HuggingFace at build time
New VisionConfig added to the config system; all existing builtinTools() call sites updated; vision registered as a built-in tool alongside transcribe

What the tool does

Input	Behaviour
Image (JPEG, PNG, GIF, WebP, BMP)	Base64-encodes and sends to `llama-mtmd-cli` in single-turn mode
Video (MP4, MOV, AVI, MKV, WebM)	`ffprobe` reads duration → `ffmpeg` extracts N evenly-spaced frames → multi-image call with all frames
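The video row can be sketched as follows. The helper names are hypothetical. Centring each frame in its 1/N slice is only one reading of "evenly-spaced". The ffprobe and ffmpeg options are standard ones, but they are not necessarily the exact arguments the PR uses.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strconv"
	"strings"
)

// frameTimestamps returns n timestamps (seconds) spread evenly over a clip,
// each at the centre of its 1/n slice, so the first (often black) frame and
// the very end of the clip are avoided.
func frameTimestamps(duration float64, n int) []float64 {
	if n <= 0 || duration <= 0 {
		return nil
	}
	step := duration / float64(n)
	ts := make([]float64, n)
	for i := range ts {
		ts[i] = (float64(i) + 0.5) * step
	}
	return ts
}

// probeDurationCmd asks ffprobe to print only the container duration in seconds.
func probeDurationCmd(video string) *exec.Cmd {
	return exec.Command("ffprobe", "-v", "error",
		"-show_entries", "format=duration",
		"-of", "default=noprint_wrappers=1:nokey=1", video)
}

// parseDuration turns ffprobe's output (e.g. "12.480000\n") into seconds.
func parseDuration(out []byte) (float64, error) {
	return strconv.ParseFloat(strings.TrimSpace(string(out)), 64)
}

// extractFrameCmd grabs a single frame at time t (seconds) into out.
// Placing -ss before -i makes ffmpeg seek in the input, which is fast.
func extractFrameCmd(video string, t float64, out string) *exec.Cmd {
	return exec.Command("ffmpeg", "-y", "-ss", strconv.FormatFloat(t, 'f', 3, 64),
		"-i", video, "-frames:v", "1", out)
}

func main() {
	fmt.Println(probeDurationCmd("clip.mp4").Args)
	d, _ := parseDuration([]byte("8.000000\n"))
	for i, t := range frameTimestamps(d, 4) {
		out := filepath.Join("/tmp/frames", fmt.Sprintf("frame_%02d.jpg", i))
		fmt.Println(extractFrameCmd("clip.mp4", t, out).Args)
	}
}
```

With an 8-second clip and 4 frames, this samples at 1, 3, 5 and 7 seconds.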

Output is wrapped in wrapUntrusted() (same as transcribe) — classified as always-untrusted in the security model.

Config

{
  "vision": {
    "models_dir": "~/.odek/minicpm-v/models",
    "binary_path": "/usr/local/bin/llama-mtmd-cli",
    "video_frames": 8
  }
}

All fields are optional: when models_dir is unset, the Docker image path /usr/local/share/minicpm-v/models/ is auto-detected, and video_frames defaults to 8.
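Here is a minimal sketch of how these fields and defaults could be resolved in Go. `VisionConfig` and its field names come from the PR. The JSON tags, the `resolveVisionSketch` name, the PATH-lookup default for the binary and the injectable `exists` check are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// VisionConfig mirrors the "vision" JSON block. Field names follow the PR;
// the json tags are assumptions.
type VisionConfig struct {
	ModelsDir   string `json:"models_dir,omitempty"`
	BinaryPath  string `json:"binary_path,omitempty"`
	VideoFrames int    `json:"video_frames,omitempty"`
}

const dockerModelsDir = "/usr/local/share/minicpm-v/models"

// resolveVisionSketch is a hypothetical resolver: it backfills zero values,
// prefers the Docker image's model directory when it exists, and expands "~".
func resolveVisionSketch(c VisionConfig, home string, exists func(string) bool) VisionConfig {
	if c.VideoFrames <= 0 {
		c.VideoFrames = 8 // default frame count stated in the PR
	}
	if c.BinaryPath == "" {
		c.BinaryPath = "llama-mtmd-cli" // assumption: looked up on PATH
	}
	if c.ModelsDir == "" {
		if exists(dockerModelsDir) {
			c.ModelsDir = dockerModelsDir
		} else {
			c.ModelsDir = "~/.odek/minicpm-v/models"
		}
	}
	if strings.HasPrefix(c.ModelsDir, "~/") {
		c.ModelsDir = filepath.Join(home, c.ModelsDir[2:])
	}
	return c
}

// dirExists reports whether p is an existing directory.
func dirExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	fmt.Printf("%+v\n", resolveVisionSketch(VisionConfig{}, home, dirExists))
}
```

Injecting `exists` keeps the Docker auto-detect branch testable without a real `/usr/local/share` directory. The certificate below lists that branch as untested.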

Docker build args

# Use a higher-quality quantization (default: Q4_K_M)
docker compose --profile restricted build --build-arg MINICPM_QUANT=Q8_0
docker compose --profile restricted up

# Pin a different llama.cpp release (default: b9549)
docker compose --profile restricted build --build-arg LLAMA_VERSION=b9600

Test plan

All 21 packages pass: go test ./... -count=1
TestVision_* (13 tests) — unit + mock-binary happy paths for images and video
TestResolveVision_* (3 tests) — config resolver defaults and custom values
Docker image builds: docker compose --profile restricted up --build (requires internet access to HuggingFace during build)
Smoke test: pass a local JPEG to the agent and verify vision returns a description
Smoke test video: pass a short MP4 and verify frames > 0 in the result

🤖 Generated with Claude Code

…md-cli

Adds a `vision` built-in tool that analyses images and videos locally using MiniCPM-V 4.6, a 1.3B multimodal model running via llama.cpp's llama-mtmd-cli, with no cloud API required.

## What's new

**Tool (`vision`)**
- Accepts images (JPEG, PNG, GIF, WebP, BMP) and videos (MP4, MOV, AVI, MKV, WebM)
- Videos: ffprobe reads the duration, ffmpeg extracts N evenly-spaced frames, and all frames are sent to the model as one multi-image call (configurable via `video_frames`, default 8)
- Security: O_NOFOLLOW open (symlink protection), danger.CheckOperation classification, and all output wrapped in wrapUntrusted() with a provenance tag
- Setup instructions in every error path (missing binary, missing model, missing mmproj, missing ffmpeg)

**Docker (`docker/Dockerfile`)**
- New `minicpm` multi-stage build: downloads a pre-built llama-mtmd-cli (llama.cpp b9549) for amd64/arm64 from the official GitHub release, then fetches MiniCPM-V-4_6-Q4_K_M.gguf (529 MB) and mmproj-model-f16.gguf (1.1 GB) from HuggingFace into /usr/local/share/minicpm-v/models/
- Overridable via --build-arg MINICPM_QUANT=Q8_0 and LLAMA_VERSION
- Runtime stage copies the binary and models; no new runtime deps (libstdc++6 is already present for whisper)

**Config (`internal/config/loader.go`)**
- New VisionConfig struct: ModelsDir, BinaryPath, VideoFrames
- Wired into FileConfig, ResolvedConfig, resolveVision(), mergeFile()

**Tests**
- 13 tests in cmd/odek/vision_tool_test.go: empty path, invalid JSON, file not found, symlink rejected, missing binary, missing model, missing mmproj, mock happy-path image (4 extensions), custom prompt, mock happy-path video (with mock ffprobe+ffmpeg via PATH override), missing ffmpeg fallback, schema shape
- 3 tests in internal/config/vision_test.go: resolveVision defaults, zero-frames backfill, custom values round-trip

**Docs**
- docs/CHEATSHEET.md: new Image & Video Understanding section with config snippet and field reference
- docs/SECURITY.md: vision added to the untrusted-content table, the always-untrusted list, and the skills provenance gate paragraph
- docs/CONFIG.md + docs/TELEGRAM.md: smart-previews bullet updated
- docker/README.md: new Image & video understanding (out of the box) section
- README.md: vision added to the external-content ingestion list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-06-07T13:56:12Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful!	odek	`be67d47`	Commit Preview URL, Branch Preview URL	Jun 07 2026, 02:03 PM

…on hash

Replace `resolve/main/` with `resolve/<sha>/` (78e02f0) so Docker builds are reproducible: a future model update on the main branch won't silently change the built image.

vprotocol auto-repair: finding D001 (Axis 2.6 Dependency Integrity).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jkyberneees · 2026-06-07T14:03:02Z

vprotocol v5.2.7 — Verification Certificate

PR: feat/minicpm-v-vision-tool · head be67d47 · 21 files · +839/-19 LOC
Generator: Claude Sonnet 4.6 (claude-sonnet-4-6) · Class: GeneratedCode + GeneratedTests
Run date: 2026-06-07

Nine-Axis Results

Axis	Verdict	Notes
2.1 Semantic Correctness	✅	All 13 tests pass; error paths explicit
2.2 Behavioral Contract	⚠️	No formal spec — PR description is the contract
2.3 Security Surface	✅	`O_NOFOLLOW`, `danger.CheckOperation`, `wrapUntrusted`, `exec.Command` (no shell injection)
2.4 Structural Integrity	✅	Mirrors `transcribe_tool.go`; compile-time interface check
2.5 Behavioral Exploration	⚠️	Extension-only video detection; no image size guard (acceptable v1 risk)
2.6 Dependency Integrity	✅ (repaired)	`llama-mtmd-cli` pinned to `b9549`; HuggingFace URLs pinned to commit `78e02f0` (was `resolve/main/`)
2.7 Generator Provenance	⚠️	Code + tests: same model, same session → correlated blind spots possible
2.8 Adversarial Surface	✅	User inputs flow to `exec.Command` args; output wrapped as untrusted
2.9 Documentation Coverage	✅	CHEATSHEET, SECURITY, CONFIG, TELEGRAM, docker/README, README all updated

η Score

Signal	Weight	Value
m (mutation kill rate)	0.34	0.60 (mock-based tests; no mutation runner)
o (oracle agreement)	0.24	0.20 (no independent spec)
b (branch coverage)	0.14	0.75 (Docker auto-detect path untested)
f (fuzz survival)	0.09	1.00
s (SAST clean)	0.04	1.00 (`go vet: clean`)
t (static depth)	0.10	1.00
d (doc coverage)	0.05	1.00

η_raw = 0.637 · ρ_penalty = 0.255 (same-family code+tests) · η = 0.382

Verdict: `HumanReviewRequired`

η 0.382 < 0.80 threshold; ρ 0.255 exceeds 0.20 correlated-generator threshold.
ΔDebt = 0.53 h (Low). Ci_estimated: true.

Requires independent human review before merge.
Focus areas: Axis 2.7 (correlated tests), Axis 2.5 (extension-only video detection), Axis 2.2 (no formal spec).
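The η figures can be reproduced from the table. The certificate does not say how ρ_penalty combines with η_raw. The reported values match a simple subtraction (0.637 − 0.255 = 0.382), whereas a multiplicative (1 − ρ) factor would give about 0.475. The sketch below therefore assumes subtraction, along with the two thresholds quoted above. It is not vprotocol's code.

```go
package main

import "fmt"

// signal is one row of the certificate's η table.
type signal struct {
	name          string
	weight, value float64
}

var certificateSignals = []signal{
	{"m", 0.34, 0.60}, {"o", 0.24, 0.20}, {"b", 0.14, 0.75},
	{"f", 0.09, 1.00}, {"s", 0.04, 1.00}, {"t", 0.10, 1.00}, {"d", 0.05, 1.00},
}

// etaRaw is the weighted sum of weight*value over all signals.
func etaRaw(signals []signal) float64 {
	sum := 0.0
	for _, s := range signals {
		sum += s.weight * s.value
	}
	return sum
}

func main() {
	raw := etaRaw(certificateSignals)
	const rho = 0.255
	eta := raw - rho // assumption: subtractive penalty, inferred from the reported numbers
	humanReview := eta < 0.80 || rho > 0.20
	fmt.Printf("eta_raw=%.3f eta=%.3f human_review=%v\n", raw, eta, humanReview)
	// prints: eta_raw=0.637 eta=0.382 human_review=true
}
```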

Auto-Repairs Applied

D001 — Dependency Integrity (commit be67d47)
Pinned the HuggingFace model download URLs from resolve/main/ to resolve/78e02f066e9819a60573b78a4275df8a0c27f698/ in both docker/Dockerfile and the manual install instructions in vision_tool.go. Builds now fetch the same model revision regardless of future upstream updates to the main branch.
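A check like the one below could guard against regressing to branch URLs. It is a hypothetical lint helper, not part of the PR, and `example-org/example-repo` is a placeholder because the real repository path is not shown here. It accepts only `resolve/<40-hex commit>/` URLs.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

var commitSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// isPinnedHFURL reports whether a HuggingFace download URL references an
// immutable commit (resolve/<40-hex sha>/...) rather than a moving branch
// such as resolve/main/.
func isPinnedHFURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Host != "huggingface.co" {
		return false
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] == "resolve" {
			return commitSHA.MatchString(parts[i+1])
		}
	}
	return false
}

func main() {
	for _, u := range []string{
		"https://huggingface.co/example-org/example-repo/resolve/main/mmproj-model-f16.gguf",
		"https://huggingface.co/example-org/example-repo/resolve/78e02f066e9819a60573b78a4275df8a0c27f698/mmproj-model-f16.gguf",
	} {
		fmt.Println(isPinnedHFURL(u), u)
	}
}
```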

Generated by vprotocol v5.2.7 auto-repair mode

jkyberneees merged commit 33ec4a8 into main Jun 7, 2026
7 checks passed

jkyberneees merged 2 commits into main from feat/minicpm-v-vision-tool
