ApartsinProjects/SpeculativeCodeGeneration


VIGOR: Validated Incremental code Generation with speculative ORacles

[Figure: VIGOR hero illustration]

VIGOR is a research framework for studying incremental code generation under a forward-only commit protocol. It makes code-generation behavior measurable, reproducible, and analyzable at the process level, rather than only by the final pass/fail outcome.

Abstract

Conventional code generation pipelines typically produce a full draft, then validate and repair it after the fact. This limits process observability and weakens causal analysis of generation failures. VIGOR introduces a forward-only loop in which each step proposes candidate next lines, applies structured gating, commits exactly one line to an immutable prefix, and records a machine-readable trace. The framework provides configurable agents, policies, benchmark adapters, and evaluation/export tooling for research experiments. The repository currently implements the orchestration and experimentation stack, with pluggable interfaces for production LLM agents and execution semantics.

Research Motivation

We target three gaps in current code-generation experimentation:

  1. Low process visibility: final outputs hide intermediate decisions.
  2. Weak reproducibility: many systems do not expose deterministic decision traces.
  3. Difficult failure attribution: it is hard to pinpoint where generation diverged.

VIGOR addresses these with explicit step semantics, role-separated agents, deterministic policy hooks, and trace-first artifacts.

Core Idea and How It Differs from Conventional Generation

Conventional pattern

  • one-shot or few-shot full completion
  • late validation and repair
  • rewriting/replacing prior drafts
  • limited decision-level tracing

VIGOR pattern

  • one-step/one-line commit loop
  • immutable committed prefix (forward-only)
  • explicit agent pipeline (Coder, Language, Reviewer, Evaluator, Commit, Tester, supervised orchestration)
  • machine-readable trace for each run
  • policy-governed determinism (Run Policy, Commit Policy)

Method Summary

At each step, VIGOR runs:

  1. Candidate generation (Coder)
  2. Parse-gating and classification (Language + authoritative parser)
  3. Per-candidate gate decisions (Reviewer for fragments, Evaluator for executable lines)
  4. Deterministic next-line selection (Commit)
  5. Prefix append and trace emission (supervised orchestration)

This repeats until policy-defined completion or termination.
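
The five steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the code in src/vigor/codegen.py: the names run_forward_only, classify, scripted_coder, reviewer, and evaluator are hypothetical, ast.parse stands in for the authoritative parser, and "first accepted candidate in proposal order" stands in for the Commit Policy.

```python
import ast
from typing import Callable, Dict, List, Sequence, Tuple

Prefix = Tuple[str, ...]  # immutable committed prefix (tuples cannot be edited in place)
Gate = Callable[[Prefix, str], bool]


def classify(prefix: Prefix, line: str) -> str:
    """Stand-in parse gate: 'executable' if prefix + line parses as a module."""
    source = "\n".join(prefix + (line,)) + "\n"
    try:
        ast.parse(source)
    except SyntaxError:
        return "fragment"  # not parseable yet; routed to the reviewer gate
    return "executable"


def run_forward_only(
    propose: Callable[[Prefix], Sequence[str]],
    review: Gate,
    evaluate: Gate,
    is_done: Callable[[Prefix], bool],
    max_steps: int = 50,
) -> Tuple[Prefix, List[Dict[str, object]]]:
    prefix: Prefix = ()
    trace: List[Dict[str, object]] = []
    for step in range(max_steps):
        decisions = []
        for line in propose(prefix):                             # 1. candidates
            kind = classify(prefix, line)                        # 2. parse gate
            gate = evaluate if kind == "executable" else review  # 3. per-candidate gate
            decisions.append({"line": line, "kind": kind, "accepted": gate(prefix, line)})
        accepted = [d["line"] for d in decisions if d["accepted"]]
        committed = accepted[0] if accepted else None            # 4. deterministic pick
        trace.append({"step": step, "decisions": decisions, "committed": committed})
        if committed is None:
            break                          # nothing passed the gates: terminate
        prefix = prefix + (committed,)     # 5. append only, never rewrite
        if is_done(prefix):
            break
    return prefix, trace


# Hypothetical stub agents, standing in for LLM-backed ones.
TARGET: Prefix = ("def add(a, b):", "    return a + b")


def scripted_coder(prefix: Prefix) -> List[str]:
    return ["???", TARGET[len(prefix)]]  # one bad candidate, one good one


def reviewer(prefix: Prefix, line: str) -> bool:
    return line.rstrip().endswith(":")  # toy rule: accept block openers only


def evaluator(prefix: Prefix, line: str) -> bool:
    try:
        exec("\n".join(prefix + (line,)), {})  # toy check: the prefix must execute
    except Exception:
        return False
    return True


final_prefix, trace = run_forward_only(
    scripted_coder, reviewer, evaluator, is_done=lambda p: len(p) == len(TARGET)
)
print("\n".join(final_prefix))
```

Each trace entry keeps the rejected candidates alongside the committed line, which is what makes it possible to say at which step a run diverged.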

Process Diagram

[Figure: VIGOR process visual abstract]

Repository Scope

Implemented:

  • orchestration core (generate_code(context, prompt)) in src/vigor/codegen.py
  • config-driven experiment runner in src/vigor/runner.py
  • dataset adapter layer in src/vigor/datasets.py
  • evaluation/export harness in src/vigor/evaluate.py
  • tests, including doctored agent responses for difficult cases and complex integration samples

Pluggable / target-level components (not fully productionized here):

  • production LLM-backed agents
  • full incremental parser + full SpecExec/oracle semantics

Installation

Prerequisites

  • Python 3.11+
  • pip

Install

python -m pip install -U pip
python -m pip install pyyaml pytest

Local secrets

Store API keys in a local env file (gitignored):

OPENAI_API_KEY=YOUR_KEY
# Optional OpenAI-compatible provider key
OPENROUTER_API_KEY=YOUR_KEY
# Optional Gemini key for README image generation
GOOGLE_API_KEY=YOUR_KEY

The runner auto-loads local env files (if present):

  • .env
  • locl.env
  • .env.local
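
A rough sketch of what such auto-loading could look like follows. It is not the runner's actual loader: the function names are hypothetical, the file names are copied from the list above, and the rule that already-set environment variables take precedence over file values is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, List, MutableMapping, Optional, Sequence


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_files(
    names: Sequence[str] = (".env", "locl.env", ".env.local"),
    base: Path = Path("."),
    environ: Optional[MutableMapping[str, str]] = None,
) -> List[str]:
    """Load every env file that exists and return the names that were read."""
    target = os.environ if environ is None else environ
    loaded: List[str] = []
    for name in names:
        path = base / name
        if not path.is_file():
            continue  # files are optional
        for key, value in parse_env_text(path.read_text(encoding="utf-8")).items():
            target.setdefault(key, value)  # assumption: existing values win
        loaded.append(name)
    return loaded
```

Passing a plain dict as environ makes the behavior easy to inspect without touching the real process environment.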

Running Experiments

1) Configure runner

Use configs/runner.example.yaml.

2) Execute generation

python -m src.vigor.runner --data-dir data --config configs/runner.example.yaml

3) Evaluate/export outputs

python -m src.vigor.evaluate --run-dir runs/demo --mode status_only

4) Optional external benchmark evaluator

python -m src.vigor.evaluate \
  --run-dir runs/demo \
  --mode external \
  --external-command "python path/to/evaluator.py --predictions {predictions} --out {out_dir}"
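
To illustrate how the {predictions} and {out_dir} placeholders in the external command might be expanded, here is a minimal sketch. The helper build_external_argv and the predictions file name are hypothetical; how src/vigor/evaluate.py actually performs the substitution is not shown in this README.

```python
import shlex
from typing import List


def build_external_argv(template: str, predictions: str, out_dir: str) -> List[str]:
    """Split the template into arguments first, then fill the placeholders.

    Splitting before substitution keeps a path that contains spaces
    as a single argument.
    """
    argv: List[str] = []
    for token in shlex.split(template):
        token = token.replace("{predictions}", predictions)
        token = token.replace("{out_dir}", out_dir)
        argv.append(token)
    return argv


argv = build_external_argv(
    "python path/to/evaluator.py --predictions {predictions} --out {out_dir}",
    predictions="runs/demo/evaluation/predictions.jsonl",  # hypothetical file name
    out_dir="runs/demo/evaluation",
)
print(argv)
# The list can then be passed to subprocess.run(argv, check=True).
```

Running the command as an argument list rather than through a shell avoids quoting problems in paths.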

Supported Benchmarks (Adapter Kinds)

  • humaneval
  • humanevalplus
  • mbppplus
  • apps
  • ds1000
  • livecodebench_jsonl
  • bigcodebench_jsonl
  • jsonl (generic)

See data/README.md for local dataset notes and format expectations.
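
The adapter-kind idea can be pictured as a lookup from kind to a function that normalizes one benchmark record. The sketch below is illustrative only: Task, ADAPTERS, and adapt are hypothetical names, and the id field used for the generic jsonl kind is an assumption (data/README.md is the place to check real format expectations). HumanEval records do carry task_id and prompt fields.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class Task:
    """Normalized task handed to the generation loop (hypothetical shape)."""
    task_id: str
    prompt: str


def from_humaneval(record: Mapping[str, Any]) -> Task:
    return Task(task_id=str(record["task_id"]), prompt=record["prompt"])


def from_generic_jsonl(record: Mapping[str, Any]) -> Task:
    # Assumed field names for the generic kind.
    return Task(task_id=str(record["id"]), prompt=record["prompt"])


ADAPTERS: Dict[str, Callable[[Mapping[str, Any]], Task]] = {
    "humaneval": from_humaneval,
    "jsonl": from_generic_jsonl,
}


def adapt(kind: str, record: Mapping[str, Any]) -> Task:
    """Dispatch on the adapter kind named in the config."""
    try:
        convert = ADAPTERS[kind]
    except KeyError:
        raise ValueError(f"unknown adapter kind {kind!r}; known: {sorted(ADAPTERS)}") from None
    return convert(record)
```

With this shape, supporting another benchmark means adding one conversion function and one registry entry.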

Configuration Highlights

Key config sections in configs/runner.example.yaml:

  • llm (shared model, provider/base URL, per-agent overrides)
  • dataset (adapter kind + path)
  • runner (task limits, resume behavior, filtering)
  • output, trace_html, generation_log
  • context (prompts, parser, zero-shot prefetch, policies, agent factories)

Prompt templates are configurable for every agent; defaults are provided.
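
As an illustration of the "shared model, per-agent overrides" idea in the llm section, the sketch below merges shared settings with one agent's overrides. The key names (agents, model, base_url) and the example values are assumptions made for this sketch; configs/runner.example.yaml is the authoritative reference for the real schema.

```python
from typing import Any, Dict, Mapping


def resolve_agent_llm(llm_cfg: Mapping[str, Any], agent: str) -> Dict[str, Any]:
    """Return shared LLM settings with the agent's overrides applied on top."""
    shared = {k: v for k, v in llm_cfg.items() if k != "agents"}
    overrides = llm_cfg.get("agents", {}).get(agent, {})
    return {**shared, **overrides}  # later keys win, so overrides take precedence


# Hypothetical config fragment, already parsed into a dict.
llm_cfg = {
    "model": "shared-model",
    "base_url": "https://example.invalid/v1",
    "agents": {"reviewer": {"model": "reviewer-model"}},
}

print(resolve_agent_llm(llm_cfg, "coder"))
print(resolve_agent_llm(llm_cfg, "reviewer"))
```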

Artifacts Produced per Run

  • run_manifest.json: config snapshot + hash metadata
  • results.jsonl: task-level generation outcomes
  • trace.jsonl: machine-readable generation trace
  • generation_logs/*.log: detailed per-task logs
  • trace_html/*.html: optional per-task HTML process view
  • evaluation/*: exported predictions + evaluation summary
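
Two of these artifacts lend themselves to a short sketch: a config hash like the one a manifest could store, and a first pass over a JSONL trace. Both functions are hypothetical and not taken from the repository. The canonical-JSON hashing scheme and the task_id field name are assumptions; the actual schemas of run_manifest.json and trace.jsonl are not documented in this README.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Iterable, Mapping


def config_hash(config: Mapping[str, Any]) -> str:
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def count_records_by_task(lines: Iterable[str], key: str = "task_id") -> Counter:
    """Count JSONL records per task; `key` is an assumed field name."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


# Usage sketch (the path follows the run layout used elsewhere in this README):
# with open("runs/demo/trace.jsonl", encoding="utf-8") as handle:
#     print(count_records_by_task(handle))
```

Hashing a canonical serialization lets two runs be compared for configuration drift without diffing the full snapshot.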

Testing and Validation

Run the full suite:

python -m pytest -q tests

The suite includes:

  • core orchestration tests
  • runner/config/output tests
  • dataset adapter coverage across supported kinds
  • agent doctored-case tests (including difficult Python statements)
  • complex end-to-end generation scenarios

Research Documentation

  • Manifesto: docs/vigor_manifest.md
  • System specification: docs/system_specification.md
  • Paper drafting material: paper/

README Figure Generation (Gemini)

This repo includes utilities to generate README figures with Gemini/Imagen:

python scripts/generate_hero_illustration_gemini.py
python scripts/generate_readme_image_gemini.py
python scripts/generate_project_diagram_hq.py

Outputs:

  • assets/readme/vigor_hero_illustration.svg
  • assets/readme/vigor_hero_illustration.png (Gemini-generated non-technical hero image)
  • assets/readme/vigor_hero_illustration.meta.json
  • assets/readme/vigor_project_diagram_hq.svg
  • assets/readme/vigor_project_diagram_hq.png (when Gemini generation succeeds)
  • assets/readme/vigor_project_diagram_hq.meta.json
  • assets/readme/vigor_research_overview.svg
  • assets/readme/vigor_research_overview.png (when Gemini generation succeeds)
  • assets/readme/vigor_research_overview.meta.json
  • assets/readme/vigor_process_visual_abstract.svg (portable hand-authored process SVG for README/paper usage)

If neither GOOGLE_API_KEY nor GEMINI_API_KEY is set, the script writes a deterministic vector fallback and records the reason in the metadata file.
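
The fallback behavior can be sketched as follows. This is not the code in scripts/: resolve_api_key, write_figure, the metadata fields, and the placeholder SVG are all hypothetical, and the generate callable stands in for whatever call produces the image.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Mapping, Optional

# Placeholder vector image; fixed content keeps the fallback deterministic.
FALLBACK_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    '<rect width="64" height="64" fill="#cccccc"/></svg>\n'
)


def resolve_api_key(environ: Mapping[str, str]) -> Optional[str]:
    """Prefer GOOGLE_API_KEY, then GEMINI_API_KEY; None if neither is set."""
    return environ.get("GOOGLE_API_KEY") or environ.get("GEMINI_API_KEY") or None


def write_figure(
    out_dir: Path,
    stem: str,
    environ: Mapping[str, str],
    generate: Callable[[str], bytes],
) -> Dict[str, Optional[str]]:
    """Write <stem>.png via `generate`, or <stem>.svg as a fallback, plus metadata."""
    out_dir.mkdir(parents=True, exist_ok=True)
    key = resolve_api_key(environ)
    if key is not None:
        (out_dir / f"{stem}.png").write_bytes(generate(key))
        meta: Dict[str, Optional[str]] = {"source": "generated", "fallback_reason": None}
    else:
        (out_dir / f"{stem}.svg").write_text(FALLBACK_SVG, encoding="utf-8")
        meta = {"source": "fallback", "fallback_reason": "no GOOGLE_API_KEY or GEMINI_API_KEY set"}
    (out_dir / f"{stem}.meta.json").write_text(
        json.dumps(meta, indent=2, sort_keys=True), encoding="utf-8"
    )
    return meta
```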

Citation (Placeholder)

If you use VIGOR in your work, please cite the forthcoming paper/artifact for this repository.
