VIGOR is a research-oriented framework for studying incremental code generation under a forward-only commit protocol. It is designed to make code generation behavior measurable, reproducible, and analyzable at the process level, not only at final pass/fail.
Conventional code generation pipelines typically produce full drafts, then validate and repair them post hoc. This limits process observability and weakens causal analysis of generation failures. VIGOR instead introduces a forward-only loop: each step proposes candidate next lines, applies structured gating, commits exactly one line to an immutable prefix, and records machine-readable traces. The framework provides configurable agents, policies, benchmark adapters, and evaluation/export tooling for research experiments. The repository currently implements the orchestration and experimentation stack, with pluggable interfaces for production LLM agents and execution semantics.
We target three gaps in current code-generation experimentation:
- Low process visibility: final outputs hide intermediate decisions.
- Weak reproducibility: many systems do not expose deterministic decision traces.
- Difficult failure attribution: it is hard to pinpoint where generation diverged.
VIGOR addresses these with explicit step semantics, role-separated agents, deterministic policy hooks, and trace-first artifacts.
Conventional pipelines typically rely on:
- one-shot or few-shot full completion
- late validation and repair
- rewriting/replacing prior drafts
- limited decision-level tracing

VIGOR instead uses:
- a one-step/one-line commit loop
- an immutable committed prefix (forward-only)
- an explicit agent pipeline (Coder, Language, Reviewer, Evaluator, Commit, Tester, supervised orchestration)
- a machine-readable trace for each run
- policy-governed determinism (Run Policy, Commit Policy)
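The forward-only property can be pictured as an append-only value. The sketch below is a hypothetical illustration, not VIGOR's internal type. CommittedPrefix and its commit method are names invented here. Each commit returns a new prefix and leaves earlier snapshots untouched, so every intermediate state stays available for trace analysis.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommittedPrefix:
    """Hypothetical forward-only prefix: lines can be appended, never edited."""

    lines: Tuple[str, ...] = ()

    def commit(self, line: str) -> "CommittedPrefix":
        # Build a new prefix rather than mutating this one, so each earlier
        # snapshot stays valid for later trace analysis.
        return CommittedPrefix(self.lines + (line,))

    def render(self) -> str:
        return "\n".join(self.lines)


p0 = CommittedPrefix()
p1 = p0.commit("def add(a, b):")
p2 = p1.commit("    return a + b")
print(len(p0.lines), len(p1.lines), len(p2.lines))  # 0 1 2

try:
    p2.lines = ()  # frozen dataclass: rewriting committed lines is rejected
except FrozenInstanceError:
    print("prefix is immutable")
```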
At each step, VIGOR runs:
- Candidate generation (Coder)
- Parse-gating and classification (Language + authoritative parser)
- Per-candidate gate decisions (Reviewer for fragments, Evaluator for executable lines)
- Deterministic next-line selection (Commit)
- Prefix append and trace emission (supervised orchestration)
This repeats until policy-defined completion or termination.
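The following is a minimal sketch of this loop under stated assumptions. run_forward_only, the coder and gate callables, and the "<END>" stop marker are hypothetical stand-ins for VIGOR's Coder, gating agents, and Commit Policy. They are not the signature of generate_code. The sketch makes two further simplifications. First, the gate collapses Language, Reviewer, and Evaluator into a single accept/reject function. Second, the commit policy simply takes the first accepted candidate in proposal order, which is deterministic given the candidate list.

```python
import json
from typing import Callable, List, Tuple

Prefix = Tuple[str, ...]


def run_forward_only(
    coder: Callable[[Prefix], List[str]],
    gate: Callable[[Prefix, str], bool],
    max_steps: int,
    stop: str = "<END>",
) -> Tuple[Prefix, List[str]]:
    """Hypothetical forward-only loop: one committed line per step, never revised."""
    prefix: Prefix = ()
    trace: List[str] = []  # one JSON object per step, in the spirit of trace.jsonl
    for step in range(max_steps):
        candidates = coder(prefix)  # candidate generation
        decisions = [{"candidate": c, "accepted": gate(prefix, c)} for c in candidates]
        accepted = [d["candidate"] for d in decisions if d["accepted"]]
        # Stand-in commit policy: first accepted candidate in proposal order.
        chosen = accepted[0] if accepted else None
        trace.append(json.dumps({"step": step, "decisions": decisions, "committed": chosen}))
        if chosen is None or chosen == stop:
            break  # termination: nothing admissible, or the stop marker was selected
        prefix = prefix + (chosen,)  # append-only commit to the immutable prefix
    return prefix, trace


script = [["def add(a, b)", "def add(a, b):"], ["    return a + b"], ["<END>"]]


def scripted_coder(prefix: Prefix) -> List[str]:
    return script[len(prefix)]


def toy_gate(prefix: Prefix, cand: str) -> bool:
    # Toy rule: accept the stop marker; reject "def" headers missing a colon.
    return cand == "<END>" or not cand.startswith("def ") or cand.endswith(":")


code, trace = run_forward_only(scripted_coder, toy_gate, max_steps=10)
print("\n".join(code))
print(len(trace))  # 3
```

Selection depends only on candidate order and the gate's verdicts. Replaying the same scripted candidates therefore reproduces the same trace, which is the kind of determinism the Run Policy and Commit Policy hooks are described as providing.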
Implemented:
- orchestration core (generate_code(context, prompt)) in src/vigor/codegen.py
- config-driven experiment runner in src/vigor/runner.py
- dataset adapter layer in src/vigor/datasets.py
- evaluation/export harness in src/vigor/evaluate.py
- comprehensive tests (including difficult doctored agent cases and complex integration samples)
Pluggable / target-level components (not fully productionized here):
- production LLM-backed agents
- full incremental parser + full SpecExec/oracle semantics
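Pluggable agents imply a structural interface that both a production LLM-backed agent and a deterministic test stub can satisfy. The sketch below uses typing.Protocol to express one. CoderAgent, propose, and EchoCoder are hypothetical names, and VIGOR's actual agent factories (configured under context) may look quite different.

```python
from typing import List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class CoderAgent(Protocol):
    """Hypothetical interface for a candidate-generating agent."""

    def propose(self, prefix: Sequence[str], prompt: str, k: int) -> List[str]:
        ...


class EchoCoder:
    """Deterministic stub standing in for an LLM-backed agent (e.g. in tests)."""

    def __init__(self, lines: Sequence[str]) -> None:
        self._lines = list(lines)

    def propose(self, prefix: Sequence[str], prompt: str, k: int) -> List[str]:
        # Propose the next scripted line, or nothing once the script is exhausted.
        i = len(prefix)
        return self._lines[i:i + 1][:k]


coder = EchoCoder(["x = 1", "print(x)"])
print(isinstance(coder, CoderAgent))    # True
print(coder.propose([], "task", 3))     # ['x = 1']
```

A runtime_checkable isinstance check only confirms that a propose attribute exists. It does not validate the method's signature.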
Requirements:
- Python 3.11+
- pip

Install dependencies:
python -m pip install -U pip
python -m pip install pyyaml pytest

Use a local env file (gitignored):
OPENAI_API_KEY=YOUR_KEY
# Optional OpenAI-compatible provider key
OPENROUTER_API_KEY=YOUR_KEY
# Optional Gemini key for README image generation
GOOGLE_API_KEY=YOUR_KEY

The runner auto-loads local env files (if present):
- .env
- locl.env
- .env.local
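How the runner parses these files is not documented here. The following minimal sketch makes three assumptions: files contain plain KEY=VALUE lines, lines starting with # are comments, and variables already present in the environment (or loaded from an earlier file) are not overwritten. load_env_files is a hypothetical helper, not the runner's API. Libraries such as python-dotenv handle quoting and other edge cases that this sketch ignores.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, MutableMapping, Optional


def load_env_files(
    paths: Iterable[str],
    environ: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Hypothetical loader: read KEY=VALUE lines from each file that exists."""
    env = os.environ if environ is None else environ
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # env files are optional
        for raw in p.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            # Assumption: values already set (shell or earlier file) win.
            env.setdefault(key.strip(), value.strip())
    return env


with tempfile.TemporaryDirectory() as d:
    env_file = Path(d) / ".env"
    env_file.write_text("# Optional keys\nOPENAI_API_KEY=YOUR_KEY\n", encoding="utf-8")
    # .env.local does not exist here, so it is silently skipped.
    loaded = load_env_files([str(env_file), str(Path(d) / ".env.local")], environ={})
print(loaded)  # {'OPENAI_API_KEY': 'YOUR_KEY'}
```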
Use configs/runner.example.yaml.

Run generation:
python -m src.vigor.runner --data-dir data --config configs/runner.example.yaml

Evaluate (status only):
python -m src.vigor.evaluate --run-dir runs/demo --mode status_only

Evaluate with an external evaluator:
python -m src.vigor.evaluate \
  --run-dir runs/demo \
  --mode external \
  --external-command "python path/to/evaluator.py --predictions {predictions} --out {out_dir}"

Supported dataset kinds:
- humaneval
- humanevalplus
- mbppplus
- apps
- ds1000
- livecodebench_jsonl
- bigcodebench_jsonl
- jsonl (generic)
See data/README.md for local dataset notes and format expectations.
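The external evaluation command is a template in which {predictions} and {out_dir} are placeholders. The sketch below shows one way to expand such a template safely. It assumes str.format substitution and POSIX shell-style splitting. build_external_command is a hypothetical helper, not a function in src/vigor/evaluate.py.

```python
import shlex
from typing import List


def build_external_command(template: str, predictions: str, out_dir: str) -> List[str]:
    """Fill the {predictions} and {out_dir} placeholders and split into argv."""
    # Quote each substituted path so paths with spaces stay single arguments.
    filled = template.format(
        predictions=shlex.quote(predictions),
        out_dir=shlex.quote(out_dir),
    )
    return shlex.split(filled)


template = "python path/to/evaluator.py --predictions {predictions} --out {out_dir}"
argv = build_external_command(template, "/tmp/my preds.jsonl", "/tmp/eval out")
print(argv)
# ['python', 'path/to/evaluator.py', '--predictions', '/tmp/my preds.jsonl', '--out', '/tmp/eval out']
```

The resulting list can be passed to subprocess.run(argv, check=True) without shell=True. A template containing other literal braces would need them doubled ({{ and }}) under str.format.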
Key config sections in configs/runner.example.yaml:
- llm (shared model, provider/base URL, per-agent overrides)
- dataset (adapter kind + path)
- runner (task limits, resume behavior, filtering)
- output, trace_html, generation_log
- context (prompts, parser, zero-shot prefetch, policies, agent factories)
Prompt templates are configurable for all agents, with defaults.
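The sketch below makes the llm section's shared-plus-override structure concrete. It shows how per-agent settings and prompt defaults could be resolved. The key names (model, base_url, agents, prompts), the model names, and the default prompt strings are all assumptions for illustration. Check configs/runner.example.yaml for the real schema.

```python
from typing import Any, Dict, Mapping

# Hypothetical built-in defaults; VIGOR's actual default prompts are not shown here.
DEFAULT_PROMPTS: Dict[str, str] = {
    "coder": "Propose the next line of code.",
    "reviewer": "Review this fragment.",
}


def resolve_agent_settings(cfg: Mapping[str, Any], agent: str) -> Dict[str, Any]:
    llm = cfg.get("llm", {})
    # Start from the shared model/provider settings, then apply per-agent overrides.
    shared = {k: v for k, v in llm.items() if k != "agents"}
    overrides = llm.get("agents", {}).get(agent, {})
    settings = {**shared, **overrides}
    # Prompt template: configured value if present, else the built-in default.
    prompts = cfg.get("context", {}).get("prompts", {})
    settings["prompt"] = prompts.get(agent, DEFAULT_PROMPTS.get(agent, ""))
    return settings


cfg = {
    "llm": {
        "model": "shared-model",
        "base_url": "https://example.invalid/v1",
        "agents": {"reviewer": {"model": "reviewer-model"}},
    },
    "context": {"prompts": {"coder": "Write exactly one next line."}},
}
print(resolve_agent_settings(cfg, "coder")["model"])     # shared-model
print(resolve_agent_settings(cfg, "reviewer")["model"])  # reviewer-model
```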
Run artifacts:
- run_manifest.json: config snapshot + hash metadata
- results.jsonl: task-level generation outcomes
- trace.jsonl: machine-readable generation trace
- generation_logs/*.log: detailed per-task logs
- trace_html/*.html: optional per-task HTML process view
- evaluation/*: exported predictions + evaluation summary
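The manifest pairs a config snapshot with hash metadata. One common way to make such a hash stable is to hash canonical JSON. The sketch below assumes SHA-256 over sorted-key JSON. The config_sha256 field name and the resume key are hypothetical, not taken from run_manifest.json.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, fixed separators) makes logically equal
    # configs hash identically regardless of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical manifest layout: snapshot plus its hash."""
    return {"config": config, "config_sha256": config_hash(config)}


a = {"dataset": {"kind": "humaneval", "path": "data"}, "runner": {"resume": True}}
b = {"runner": {"resume": True}, "dataset": {"path": "data", "kind": "humaneval"}}
print(config_hash(a) == config_hash(b))  # True
print(json.dumps(build_manifest(a), indent=2))
```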
Run the full suite:
python -m pytest -q tests

The suite includes:
- core orchestration tests
- runner/config/output tests
- dataset adapter coverage across supported kinds
- agent doctored-case tests (including difficult Python statements)
- complex end-to-end generation scenarios
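The sketch below illustrates what a doctored-case test might look like. It scripts an agent to emit malformed and tricky candidates and checks that a parse gate separates them. DoctoredCoder and parses_as_python are invented for this example, and Python's ast module stands in for the authoritative parser. Each candidate is checked as a standalone statement, which is simpler than incremental, prefix-aware gating.

```python
import ast
from typing import List, Sequence


def parses_as_python(source: str) -> bool:
    """Stand-in parse gate: True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


class DoctoredCoder:
    """Scripted agent that returns a fixed, partly malformed candidate list."""

    def __init__(self, candidates: Sequence[str]) -> None:
        self.candidates = list(candidates)

    def propose(self, prefix: Sequence[str]) -> List[str]:
        return list(self.candidates)


def test_parse_gate_filters_doctored_candidates() -> None:
    coder = DoctoredCoder([
        "x = (1,",                   # unclosed parenthesis
        'print(f"{x!r:>10}")',       # f-string with conversion and format spec
        "async def f(): await g()",  # async one-liner
        "def f(:",                   # malformed parameter list
    ])
    accepted = [c for c in coder.propose(()) if parses_as_python(c)]
    assert accepted == ['print(f"{x!r:>10}")', "async def f(): await g()"]


test_parse_gate_filters_doctored_candidates()  # pytest would also collect it by name
```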
Documentation:
- Manifesto: docs/vigor_manifest.md
- System specification: docs/system_specification.md
- Paper drafting material: paper/
This repo includes utilities to generate README figures with Gemini/Imagen:
python scripts/generate_hero_illustration_gemini.py
python scripts/generate_readme_image_gemini.py
python scripts/generate_project_diagram_hq.py

Outputs:
- assets/readme/vigor_hero_illustration.svg
- assets/readme/vigor_hero_illustration.png (Gemini-generated non-technical hero image)
- assets/readme/vigor_hero_illustration.meta.json
- assets/readme/vigor_project_diagram_hq.svg
- assets/readme/vigor_project_diagram_hq.png (when Gemini generation succeeds)
- assets/readme/vigor_project_diagram_hq.meta.json
- assets/readme/vigor_research_overview.svg
- assets/readme/vigor_research_overview.png (when Gemini generation succeeds)
- assets/readme/vigor_research_overview.meta.json
- assets/readme/vigor_process_visual_abstract.svg (portable hand-authored process SVG for README/paper usage)
If neither GOOGLE_API_KEY nor GEMINI_API_KEY is available, the script writes a deterministic vector fallback and records the reason in its metadata file.
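The fallback rule can be expressed as a small, deterministic decision. The following hypothetical sketch shows the check and its metadata record. The key precedence (GOOGLE_API_KEY before GEMINI_API_KEY) and the metadata field names are assumptions, not the verified behavior of the scripts in scripts/.

```python
import json
from typing import Dict, Mapping

KEY_NAMES = ("GOOGLE_API_KEY", "GEMINI_API_KEY")  # assumed precedence order


def select_image_backend(environ: Mapping[str, str]) -> Dict[str, str]:
    """Describe which path a figure script would take, for its .meta.json."""
    for name in KEY_NAMES:
        if environ.get(name):
            return {"backend": "gemini", "key_source": name}
    # No usable key: fall back to a deterministic vector (SVG) rendering.
    return {"backend": "vector_fallback", "reason": "neither GOOGLE_API_KEY nor GEMINI_API_KEY is set"}


print(json.dumps(select_image_backend({})))
print(json.dumps(select_image_backend({"GEMINI_API_KEY": "YOUR_KEY"})))
```

In real use the mapping would be os.environ. Passing a plain dict keeps the decision easy to test.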
If you use VIGOR in your work, please cite the forthcoming paper/artifact for this repository.
