VIGOR: Validated Incremental code Generation with speculative ORacles

VIGOR is a research-oriented framework for studying incremental code generation under a forward-only commit protocol. It is designed to make code generation behavior measurable, reproducible, and analyzable at the process level, not only in terms of the final pass/fail outcome.

Abstract

Conventional code generation pipelines typically produce full drafts, then validate and repair them after the fact. This limits process observability and weakens causal analysis of generation failures. VIGOR introduces a forward-only loop in which each step proposes candidate next lines, applies structured gating, commits exactly one line to an immutable prefix, and records a machine-readable trace. The framework provides configurable agents, policies, benchmark adapters, and evaluation/export tooling for research experiments. The repository currently implements the orchestration and experimentation stack, with pluggable interfaces for production LLM agents and execution semantics.

Research Motivation

We target three gaps in current code-generation experimentation:

Low process visibility: final outputs hide intermediate decisions.
Weak reproducibility: many systems do not expose deterministic decision traces.
Difficult failure attribution: it is hard to pinpoint where generation diverged.

VIGOR addresses these with explicit step semantics, role-separated agents, deterministic policy hooks, and trace-first artifacts.

Core Idea and How It Differs from Conventional Generation

Conventional pattern

one-shot or few-shot full completion
late validation and repair
rewriting/replacing prior drafts
limited decision-level tracing

VIGOR pattern

one-step/one-line commit loop
immutable committed prefix (forward-only)
explicit agent pipeline (Coder, Language, Reviewer, Evaluator, Commit, Tester, supervised orchestration)
machine-readable trace for each run
policy-governed determinism (Run Policy, Commit Policy)

Method Summary

At each step, VIGOR runs:

Candidate generation (Coder)
Parse-gating and classification (Language + authoritative parser)
Per-candidate gate decisions (Reviewer for fragments, Evaluator for executable lines)
Deterministic next-line selection (Commit)
Prefix append and trace emission (supervised orchestration)

This repeats until policy-defined completion or termination.
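
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in src/vigor/codegen.py: the names run_forward_only, StepRecord, scripted_coder, and balanced_gate are hypothetical, and the bracket-balance gate is only a stand-in for the real parser and Reviewer/Evaluator gates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Prefix = Tuple[str, ...]  # a tuple keeps the committed prefix immutable


@dataclass(frozen=True)
class StepRecord:
    """One trace entry per step (hypothetical schema, for illustration only)."""
    step: int
    candidates: Tuple[str, ...]
    accepted: Tuple[str, ...]
    committed: Optional[str]


def run_forward_only(
    propose: Callable[[Prefix], Sequence[str]],   # plays the Coder role
    gate: Callable[[Prefix, str], bool],          # plays the Reviewer/Evaluator role
    select: Callable[[Sequence[str]], str],       # plays the Commit role
    is_done: Callable[[Prefix], bool],            # policy-defined completion
    max_steps: int = 100,
) -> Tuple[Prefix, List[StepRecord]]:
    prefix: Prefix = ()
    trace: List[StepRecord] = []
    for step in range(max_steps):
        if is_done(prefix):
            break
        candidates = tuple(propose(prefix))
        accepted = tuple(c for c in candidates if gate(prefix, c))
        committed = select(accepted) if accepted else None
        trace.append(StepRecord(step, candidates, accepted, committed))
        if committed is None:
            break  # no candidate passed the gate: terminate without rewriting
        prefix = prefix + (committed,)  # append only; earlier lines never change
    return prefix, trace


# Toy demo: a scripted "coder" offers one broken and one good line per step.
TARGET = ("def add(a, b):", "    return a + b")


def scripted_coder(prefix: Prefix) -> List[str]:
    return ["    return (a + b", TARGET[len(prefix)]]


def balanced_gate(prefix: Prefix, line: str) -> bool:
    # Stand-in gate: accept only lines with balanced parentheses.
    return line.count("(") == line.count(")")


final_prefix, trace = run_forward_only(
    propose=scripted_coder,
    gate=balanced_gate,
    select=lambda accepted: accepted[0],  # deterministic: first accepted candidate
    is_done=lambda prefix: len(prefix) == len(TARGET),
)
print("\n".join(final_prefix))
```

Because the prefix is only ever extended, each StepRecord identifies exactly which candidates were seen and which one was committed at that step, which is what makes failure attribution tractable.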

Process Diagram

Repository Scope

Implemented:

orchestration core (generate_code(context, prompt)) in src/vigor/codegen.py
config-driven experiment runner in src/vigor/runner.py
dataset adapter layer in src/vigor/datasets.py
evaluation/export harness in src/vigor/evaluate.py
comprehensive tests (including difficult doctored agent cases and complex integration samples)

Pluggable / target-level components (not fully productionized here):

production LLM-backed agents
full incremental parser + full SpecExec/oracle semantics

Installation

Prerequisites

Python 3.11+
pip

Install

python -m pip install -U pip
python -m pip install pyyaml pytest

Local secrets

Use a local env file (gitignored):

OPENAI_API_KEY=YOUR_KEY
# Optional OpenAI-compatible provider key
OPENROUTER_API_KEY=YOUR_KEY
# Optional Gemini key for README image generation
GOOGLE_API_KEY=YOUR_KEY

The runner auto-loads local env files (if present):

.env
local.env
.env.local
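
For readers unfamiliar with env-file loading, the following is a rough sketch of what such auto-loading could look like. It is not the runner's actual implementation: parse_env_text and load_env_files are hypothetical names, and the choice to leave already-set variables untouched is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, MutableMapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_files(
    paths: Iterable[str],
    environ: Optional[MutableMapping[str, str]] = None,
) -> Dict[str, str]:
    """Load each existing file in order; missing files are silently skipped.

    Assumption: variables already present in the environment are not overridden.
    """
    env = os.environ if environ is None else environ
    loaded: Dict[str, str] = {}
    for p in paths:
        path = Path(p)
        if not path.is_file():
            continue
        for key, value in parse_env_text(path.read_text(encoding="utf-8")).items():
            if key not in env:
                env[key] = value
                loaded[key] = value
    return loaded
```

Passing a plain dict as environ makes the helper easy to test without touching the real process environment.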

Running Experiments

1) Configure runner

Use configs/runner.example.yaml.

2) Execute generation

python -m src.vigor.runner --data-dir data --config configs/runner.example.yaml

3) Evaluate/export outputs

python -m src.vigor.evaluate --run-dir runs/demo --mode status_only

4) Optional external benchmark evaluator

python -m src.vigor.evaluate \
  --run-dir runs/demo \
  --mode external \
  --external-command "python path/to/evaluator.py --predictions {predictions} --out {out_dir}"
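
The {predictions} and {out_dir} placeholders in the command template are filled in by the evaluator. One plausible way to perform that substitution is sketched below; render_external_command is a hypothetical helper, the predictions file name in the usage line is made up for illustration, and the real logic in src/vigor/evaluate.py may differ.

```python
import shlex
from typing import List


def render_external_command(template: str, predictions: str, out_dir: str) -> List[str]:
    """Fill the two placeholders and split the result into an argv list.

    Values are shell-quoted first so that paths containing spaces survive
    the split as single arguments.
    """
    filled = template.format(
        predictions=shlex.quote(predictions),
        out_dir=shlex.quote(out_dir),
    )
    return shlex.split(filled)


argv = render_external_command(
    "python path/to/evaluator.py --predictions {predictions} --out {out_dir}",
    predictions="runs/demo/evaluation/predictions.jsonl",  # hypothetical file name
    out_dir="runs/demo/evaluation",
)
print(argv)
```

The resulting list can be passed to subprocess.run without shell=True, which avoids shell interpretation of the substituted paths.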

Supported Benchmarks (Adapter Kinds)

humaneval
humanevalplus
mbppplus
apps
ds1000
livecodebench_jsonl
bigcodebench_jsonl
jsonl (generic)

See data/README.md for local dataset notes and format expectations.

Configuration Highlights

Key config sections in configs/runner.example.yaml:

llm (shared model, provider/base URL, per-agent overrides)
dataset (adapter kind + path)
runner (task limits, resume behavior, filtering)
output, trace_html, generation_log
context (prompts, parser, zero-shot prefetch, policies, agent factories)

Prompt templates are configurable for all agents, with defaults.

Artifacts Produced per Run

run_manifest.json: config snapshot + hash metadata
results.jsonl: task-level generation outcomes
trace.jsonl: machine-readable generation trace
generation_logs/*.log: detailed per-task logs
trace_html/*.html: optional per-task HTML process view
evaluation/*: exported predictions + evaluation summary
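
Since results.jsonl and trace.jsonl are line-delimited JSON, they can be inspected with the standard library alone. The sketch below assumes nothing about record fields, which are not documented here; parse_jsonl and count_records are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Decode one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_records(run_dir: str) -> Dict[str, Optional[int]]:
    """Count records in the JSONL artifacts of a run directory.

    A value of None means the file is absent from that directory.
    """
    run = Path(run_dir)
    counts: Dict[str, Optional[int]] = {}
    for name in ("results.jsonl", "trace.jsonl"):
        path = run / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                counts[name] = len(parse_jsonl(fh))
        else:
            counts[name] = None
    return counts
```

For example, count_records("runs/demo") gives a quick sanity check that a run produced both files before a full evaluation is started.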

Testing and Validation

Run the full suite:

python -m pytest -q tests

The suite includes:

core orchestration tests
runner/config/output tests
dataset adapter coverage across supported kinds
agent doctored-case tests (including difficult Python statements)
complex end-to-end generation scenarios

Research Documentation

Manifesto: docs/vigor_manifest.md
System specification: docs/system_specification.md
Paper drafting material: paper/

README Figure Generation (Gemini)

This repo includes utilities to generate README figures with Gemini/Imagen:

python scripts/generate_hero_illustration_gemini.py
python scripts/generate_readme_image_gemini.py
python scripts/generate_project_diagram_hq.py

Outputs:

assets/readme/vigor_hero_illustration.svg
assets/readme/vigor_hero_illustration.png (Gemini-generated non-technical hero image)
assets/readme/vigor_hero_illustration.meta.json
assets/readme/vigor_project_diagram_hq.svg
assets/readme/vigor_project_diagram_hq.png (when Gemini generation succeeds)
assets/readme/vigor_project_diagram_hq.meta.json
assets/readme/vigor_research_overview.svg
assets/readme/vigor_research_overview.png (when Gemini generation succeeds)
assets/readme/vigor_research_overview.meta.json
assets/readme/vigor_process_visual_abstract.svg (portable hand-authored process SVG for README/paper usage)

If neither GOOGLE_API_KEY nor GEMINI_API_KEY is available, the script writes a deterministic vector fallback and records the reason in the metadata file.

Citation (Placeholder)

If you use VIGOR in your work, please cite the forthcoming paper/artifact for this repository.
