Merged
Binary file modified packages/analysis/src/group/cross-repo-links.ts
Binary file not shown.
2 changes: 1 addition & 1 deletion packages/analysis/src/test-utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
* `traverseDescendants`, `traverse`, plus the ITemporalStore-compat noops.
*
* Per-test fixtures populate the store via `addNode` / `addEdge`; the test
* then exercises the production code through the same finders the DuckDb
* then exercises the production code through the same finders the SQLite store
* and GraphDb adapters expose. No raw SQL crosses the test boundary.
*/

Expand Down
2 changes: 1 addition & 1 deletion packages/analysis/src/verdict.ts
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
* Contributors) never crashes the verdict; it simply drops the missing
* signal.
* - **Zero `any`**: the only loose type surface is `Record<string,unknown>`
* for raw DuckDB rows, each of which we narrow with explicit casts.
* for raw SQLite rows, each of which we narrow with explicit casts.
*/

import { execFile } from "node:child_process";
Expand Down
29 changes: 14 additions & 15 deletions packages/cli/src/commands/analyze.ts
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ export interface AnalyzeOptions {
readonly force?: boolean;
/**
* When true, the embeddings phase embeds every callable/declaration symbol
* and the result is upserted into the DuckDB `embeddings` table. Requires
* and the result is upserted into the `embeddings` table. Requires
* `codehub setup --embeddings` to have installed weights; if weights are
* missing the phase logs a warning and skips — analyze never aborts.
*/
Expand Down Expand Up @@ -222,7 +222,7 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi

// Load a prior graph projection for the incremental-scope phase when the
// CLI was not invoked with --force. The projection is a thin wrapper
// around the prior DuckDB index (File nodes + IMPORTS / EXTENDS /
// around the prior SQLite index (File nodes + IMPORTS / EXTENDS /
// IMPLEMENTS edges). `loadPreviousGraph` silently returns undefined if
// the store does not exist or cannot be opened; incremental-scope then
// reports mode="full" with reason="no-prior-graph".
Expand Down Expand Up @@ -333,10 +333,9 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi
);
}

// Persist to the composed graph + temporal store. Storage is always
// graph.lbug (graph-tier) + temporal.duckdb sidecar (cochanges, summary
// cache); the temporal-tier writes (`bulkLoadCochanges`,
// `bulkLoadSymbolSummaries`) route through `store.temporal`.
// Persist to the composed graph + temporal store. Post-ADR 0019 both views
// are one `store.sqlite`; the temporal-tier writes (`bulkLoadCochanges`,
// `bulkLoadSymbolSummaries`) still route through `store.temporal`.
await mkdir(resolveRepoMetaDir(repoPath), { recursive: true });
const dbPath = resolveGraphPath(repoPath);
const store: Store = await openStore({ path: dbPath });
Expand Down Expand Up @@ -423,7 +422,7 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi

// Persist the scan-state sidecar so the next analyze invocation can feed
// the incremental-scope phase via loadPreviousGraph(). We write this
// alongside the DuckDB file under `<repo>/.codehub` so a clean of the
// alongside the store.sqlite file under `<repo>/.codehub` so a clean of the
// meta dir invalidates both the index and the incremental state together.
if (result.scan !== undefined) {
await writeScanState(
Expand All @@ -434,7 +433,7 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi

// Opt-in skill generation. Walk Community nodes just persisted above and
// emit one SKILL.md per cluster under `<repo>/.codehub/skills/`. Runs
// against the still-open DuckDB handle so there's no re-open cost, and
// against the still-open SQLite handle so there's no re-open cost, and
// any per-skill failure (read-only dir, permission denied, disk full)
// logs-and-continues — analyze never aborts because of a skill write.
if (opts.skills === true) {
Expand Down Expand Up @@ -579,7 +578,7 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi

/**
* Build the {@link pipeline.PreviousGraph} projection expected by the
* incremental-scope phase from the prior DuckDB index + scan-state sidecar.
* incremental-scope phase from the prior SQLite index + scan-state sidecar.
*
* The projection carries:
* - file paths + scan-time content hashes, read from
Expand Down Expand Up @@ -827,7 +826,7 @@ export async function resolveCoverageEnabled(
* compute that before the pipeline runs (LSP phases haven't yielded
* yet), so we use the prior run's stored counts when available:
*
* - If a DuckDB store is readable at the expected path, count nodes
* - If a SQLite store is readable at the expected path, count nodes
* whose kind is Function/Method/Class. That count is the best proxy
* for "SCIP-confirmed callables" we can get before the parse phase.
* - If no prior store exists (fresh clone, first analyze), fall back
Expand Down Expand Up @@ -863,7 +862,7 @@ export async function resolveMaxSummariesCap(

/**
* Count callable symbols (Function / Method / Class) recorded by the
* prior run. Returns `undefined` when no prior DuckDB index exists or
* prior run. Returns `undefined` when no prior SQLite index exists or
* the count query fails — callers treat that as "no prior run" and fall
* back to the first-run heuristic.
*/
Expand Down Expand Up @@ -893,7 +892,7 @@ async function countPriorCallableSymbols(repoPath: string): Promise<number | und
}

/**
* Open a read-only DuckDB store scoped to the `symbol_summaries` cache
* Open a read-only SQLite store scoped to the `symbol_summaries` cache
* probe. The returned object carries a cache adapter the `summarize`
* phase uses to short-circuit candidates whose content hash already has
* a row on disk, plus a `close()` the caller invokes to release the
Expand Down Expand Up @@ -927,7 +926,7 @@ async function openSummaryCacheAdapter(
}

/**
* Open a read-only DuckDB store scoped to the `embeddings` content-hash
* Open a read-only SQLite store scoped to the `embeddings` content-hash
* probe. The returned adapter's `list()` loads every prior
* `(granularity, nodeId, chunkIndex) → content_hash` row in a single
* round-trip so the embeddings phase can skip chunks whose source text is
Expand Down Expand Up @@ -992,7 +991,7 @@ function fileFromNodeId(id: string): string | undefined {
// returns rehydrated `GraphNode` objects, so the constant is no longer
// load-bearing here. The `rowToGraphNode` / `rowToCodeRelation` adapters
// below remain exported for external consumers that hand-roll over the
// DuckDB wide-column shape.
// SQLite wide-column shape.

const NODE_KIND_SET: ReadonlySet<string> = new Set<string>(NODE_KINDS);
const RELATION_TYPE_SET: ReadonlySet<string> = new Set<string>(RELATION_TYPES);
Expand All @@ -1015,7 +1014,7 @@ function boolField(r: Record<string, unknown>, col: string): boolean | undefined
}

function stringArrayField(r: Record<string, unknown>, col: string): readonly string[] | undefined {
// Preserve `[]` distinct from absent. The DuckDB TEXT[] binder returns
// Preserve `[]` distinct from absent. The SQLite TEXT[] binder returns
// a 0-length JS array for an empty SQL array literal and `null` for
// SQL NULL; mirror the storage adapter's `setStringArrayField` and
// return the array verbatim so a Community / Route node written as
Expand Down
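The `[]`-versus-absent rule that `stringArrayField` describes can be illustrated with a small sketch. The helper name `readStringArray` and the row shapes are hypothetical; this is not the function from `analyze.ts`, only an example of keeping an empty array distinct from a missing or `null` column.

```typescript
// Hypothetical helper: read a string-array column from a raw row while
// keeping an empty array distinct from an absent (or null) value.
function readStringArray(
  row: Record<string, unknown>,
  col: string,
): readonly string[] | undefined {
  const value = row[col];
  // SQL NULL and a missing key both mean "absent".
  if (value === null || value === undefined) return undefined;
  // Anything that is not an array is treated as absent rather than coerced.
  if (!Array.isArray(value)) return undefined;
  // An empty array stays an empty array; non-string members are dropped.
  return value.filter((item): item is string => typeof item === "string");
}

const emptyRow = { keywords: [] as string[] };
const nullRow = { keywords: null };
const filledRow = { keywords: ["auth", "login"] };

const fromEmpty = readStringArray(emptyRow, "keywords");
const fromNull = readStringArray(nullRow, "keywords");
const fromFilled = readStringArray(filledRow, "keywords");
```

Returning `undefined` instead of `[]` for the `null` case is what lets a caller omit the key entirely when it rebuilds the node.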
2 changes: 1 addition & 1 deletion packages/cli/src/commands/code-pack.ts
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ export interface CodePackArgs {
readonly engine?: "pack" | "repomix";
/**
* Test seam — inject a custom `generatePack` so unit tests don't need
* to load native DuckDB bindings. Production callers leave this
* to load native storage bindings. Production callers leave this
* unset.
*/
readonly _generatePack?: typeof generatePack;
Expand Down
6 changes: 3 additions & 3 deletions packages/cli/src/commands/ingest-sarif.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
* Flow:
* 1. Read + parse + validate the SARIF file via `@opencodehub/sarif`.
* 2. Resolve the target repo (either `--repo <name>` or CWD).
* 3. Open the DuckDB store and pull a per-file, line-sorted symbol
* 3. Open the SQLite store and pull a per-file, line-sorted symbol
* index over the SARIF's referenced URIs (used to resolve Finding
* → Symbol edges).
* 4. For every Result across every Run, build a Finding node keyed by
Expand All @@ -15,7 +15,7 @@
* enclosing symbol at `(uri, startLine)` when the graph contains
* one. A scanner-provided `opencodehub.symbolId` hint wins over the
* enclosing lookup when set.
* 5. UPSERT into DuckDB via `store.bulkLoad({ mode: "upsert" })`.
* 5. UPSERT into the SQLite store via `store.bulkLoad({ mode: "upsert" })`.
*
* The command is idempotent — re-running with the same SARIF produces
* the same nodes and edges. Results without a parsable location (no
Expand Down Expand Up @@ -140,7 +140,7 @@ interface BuildSummary {

/**
* Pure builder over SARIF runs. Exposed for unit tests so we can exercise
* the node/edge emission logic without touching DuckDB.
* the node/edge emission logic without touching SQLite.
*
* `nodesByFile` is the per-file, line-sorted symbol index (produced by
* {@link indexNodesByFile}) used to resolve each SARIF result back to the
Expand Down
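Steps 3 and 4 of the flow above (a per-file, line-sorted symbol index, with a scanner hint taking precedence over the enclosing lookup) can be sketched as follows. The `IndexedSymbol` shape, the sample data, and both function names are assumptions made for illustration; the real index entries and lookup may differ.

```typescript
// Hypothetical symbol shape; real index entries may carry more fields.
interface IndexedSymbol {
  readonly id: string;
  readonly startLine: number;
  readonly endLine: number;
}

// Return the innermost symbol whose [startLine, endLine] range contains `line`.
// `symbols` is assumed to be sorted by startLine ascending.
function findEnclosingSymbol(
  symbols: readonly IndexedSymbol[],
  line: number,
): IndexedSymbol | undefined {
  let best: IndexedSymbol | undefined;
  for (const sym of symbols) {
    if (sym.startLine > line) break; // sorted input: later symbols start after `line`
    if (sym.endLine < line) continue; // symbol ends before `line`
    const span = sym.endLine - sym.startLine;
    if (best === undefined || span <= best.endLine - best.startLine) {
      best = sym; // prefer the tightest enclosing range
    }
  }
  return best;
}

// Illustrative per-file index keyed by repo-relative URI.
const nodesByFile = new Map<string, readonly IndexedSymbol[]>([
  [
    "src/app.ts",
    [
      { id: "Class:App", startLine: 1, endLine: 40 },
      { id: "Method:App.run", startLine: 10, endLine: 20 },
      { id: "Function:main", startLine: 50, endLine: 60 },
    ],
  ],
]);

// A scanner-provided hint wins; otherwise fall back to the enclosing lookup.
function resolveFindingSymbol(
  uri: string,
  startLine: number,
  hintedSymbolId?: string,
): string | undefined {
  if (hintedSymbolId !== undefined) return hintedSymbolId;
  const symbols = nodesByFile.get(uri);
  return symbols === undefined ? undefined : findEnclosingSymbol(symbols, startLine)?.id;
}
```

A result with no enclosing symbol simply resolves to `undefined`, so the Finding node would be emitted without a Finding → Symbol edge.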
4 changes: 2 additions & 2 deletions packages/cli/src/commands/list.ts
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,8 @@ type Health = "ok" | "path-missing" | "graph-missing";

function classifyHealth(entry: RepoEntry): Health {
if (!existsSync(entry.path)) return "path-missing";
// Indexed probe: presence of `meta.json` / `graph.lbug` under `.codehub/`
// counts as "indexed" (lbug is the only graph backend post-ADR 0016).
// Indexed probe: presence of `meta.json` / `store.sqlite` under `.codehub/`
// counts as "indexed" (the single-file store is the only backend, ADR 0019).
if (!codehubIsIndexed(entry.path)) return "graph-missing";
return "ok";
}
Expand Down
8 changes: 4 additions & 4 deletions packages/cli/src/commands/open-store.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@
* Returns the canonical {@link Store} envelope from `@opencodehub/storage`
* so callers can route graph-tier queries through `store.graph` and
* temporal-tier queries (cochanges, summaries, `--sql` escape hatch)
* through `store.temporal`. Storage is always graph.lbug + temporal.duckdb;
* the legacy backend selector was removed when the DuckDB graph backend
* was ripped out (see ADR 0016).
* through `store.temporal`. Post-ADR 0019 both views are one `SqliteStore`
* over a single `<repo>/.codehub/store.sqlite`; the legacy backend selector
* was removed when the lbug + DuckDB pair was replaced (see ADR 0019).
*/

import { resolve } from "node:path";
Expand All @@ -33,7 +33,7 @@ export async function openStoreForCommand(opts: OpenStoreOptions): Promise<OpenS
path: dbPath,
readOnly: opts.readOnly ?? true,
});
// The legacy CLI entry point opened the DuckDB connection eagerly and
// The legacy CLI entry point opened the store connection eagerly and
// every command consumed an already-open store. The `openStore` factory
// only constructs adapters; opening is the lifecycle owner's job. Keep
// that contract by opening both views here so command handlers stay a
Expand Down
2 changes: 1 addition & 1 deletion packages/cli/src/commands/query.ts
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ const INCLUDE_CONTENT_CHAR_CAP = 2000;
const SUMMARY_COLUMN_CHAR_CAP = 120;

/**
* Hook for tests to inject a pre-built store without touching DuckDB. The
* Hook for tests to inject a pre-built store without touching SQLite. The
* default implementation delegates to {@link openStoreForCommand}. Kept
* separate from the public `QueryOptions` interface so end-user CLI callers
* aren't tempted to pass an in-process store.
Expand Down
2 changes: 1 addition & 1 deletion packages/cli/src/commands/status.ts
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ export interface StatusOptions {
/**
* Test seam: open a read-only store and return its retrieval state. Defaults
* to opening the real composed store. Tests inject a stub so they don't need
* a live graph.lbug on disk.
* a live store.sqlite on disk.
*/
readonly probeRetrieval?: (repoPath: string) => Promise<RetrievalState | undefined>;
}
Expand Down
2 changes: 1 addition & 1 deletion packages/cli/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
* `codehub` CLI entrypoint.
*
* Every subcommand is loaded lazily via `await import(...)` so that
* `codehub --help` (and `codehub <command> --help`) stays fast: no DuckDB
* `codehub --help` (and `codehub <command> --help`) stays fast: no native storage engine
* native binding, no pipeline, no MCP SDK unless we are actually going to
* run that subcommand.
*/
Expand Down
2 changes: 1 addition & 1 deletion packages/cli/src/lib/is-indexed.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
* either signal exists under `<repoPath>/.codehub`:
*
* - `meta.json` — written by every successful analyze run.
* - `graph.lbug` — the lbug graph artifact (post-M7 the only graph backend).
* - `store.sqlite` — the single-file index (ADR 0019; the only backend).
*
* Returns a plain boolean — UI surfaces (e.g. `codehub list`) want a single
* column rendering.
Expand Down
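The two-signal probe described above might look roughly like this. `isIndexedSketch` is a hypothetical name, not the exported helper from `is-indexed.ts`; the file names `meta.json` and `store.sqlite` and the `.codehub` directory are taken from the comment, and the temporary directories exist only to make the example runnable.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical probe: a repo counts as indexed when either artifact exists
// under `<repoPath>/.codehub`.
function isIndexedSketch(repoPath: string): boolean {
  const metaDir = join(repoPath, ".codehub");
  return (
    existsSync(join(metaDir, "meta.json")) ||
    existsSync(join(metaDir, "store.sqlite"))
  );
}

// Throwaway fixtures: one repo with no meta dir, one with a meta.json.
const emptyRepo = mkdtempSync(join(tmpdir(), "codehub-sketch-"));
const indexedRepo = mkdtempSync(join(tmpdir(), "codehub-sketch-"));
mkdirSync(join(indexedRepo, ".codehub"));
writeFileSync(join(indexedRepo, ".codehub", "meta.json"), "{}");

const emptyResult = isIndexedSketch(emptyRepo);
const indexedResult = isIndexedSketch(indexedRepo);
```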
2 changes: 1 addition & 1 deletion packages/cli/src/skills-gen.ts
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,7 @@ async function fetchProcessEntryPointIds(store: SkillsGenStore): Promise<Readonl
* Fetch the top-K members of a community by outgoing CALLS degree. Used as a
* fallback when no community members are process heads. Computes the
* `GROUP BY from_id COUNT(*)` aggregate in TS over the typed-finder edges
* — the legacy SQL pushed it down to DuckDB, but `listEdgesByType` already
* — the legacy SQL pushed it down to SQLite, but `listEdgesByType` already
* narrows to one type so the reduction is bounded by community size.
*/
async function fetchTopCallersByOutDegree(
Expand Down
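The in-TypeScript replacement for a `GROUP BY from_id COUNT(*)` aggregate can be sketched as a single pass over the edges. The `Edge` shape and the function `topKByOutDegree` are illustrative assumptions; the real `fetchTopCallersByOutDegree` works against the store's typed finders, which are not shown here.

```typescript
// Hypothetical edge shape for CALLS edges already narrowed to one type.
interface Edge {
  readonly fromId: string;
  readonly toId: string;
}

// Count outgoing edges per community member and return the top-K ids.
// Ties are broken by id so the result is deterministic.
function topKByOutDegree(
  edges: readonly Edge[],
  members: ReadonlySet<string>,
  k: number,
): string[] {
  const counts = new Map<string, number>();
  for (const edge of edges) {
    if (!members.has(edge.fromId)) continue; // ignore callers outside the community
    counts.set(edge.fromId, (counts.get(edge.fromId) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, k)
    .map(([id]) => id);
}

const callEdges: Edge[] = [
  { fromId: "a", toId: "x" },
  { fromId: "a", toId: "y" },
  { fromId: "b", toId: "x" },
  { fromId: "c", toId: "x" }, // "c" is not a community member
];
const community = new Set(["a", "b"]);

const topOne = topKByOutDegree(callEdges, community, 1);
const topFive = topKByOutDegree(callEdges, community, 5);
```

Because the map holds at most one entry per community member, the sort cost is bounded by community size, which is the property the comment relies on.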
21 changes: 10 additions & 11 deletions packages/core-types/src/graph-hash.ts
Original file line number Diff line number Diff line change
Expand Up @@ -23,17 +23,16 @@ import { writeCanonicalJson } from "./hash.js";
* `{"keywords":[]}` in the canonical JSON projection, while the same node
* with the `keywords` key absent emits no key at all — the two
* canonical-JSON byte streams differ, so their SHA-256 graph hashes
* differ. Storage adapters preserve this distinction at the writer +
* reader boundary: see
* `packages/storage/src/column-encode.ts:stringArrayOrNull`,
* `packages/storage/src/duckdb-adapter.ts:setStringArrayField`,
* `packages/storage/src/graphdb-adapter.ts:setStringArrayFieldGd`, and
* `packages/cli/src/commands/analyze.ts:stringArrayField`. The contract
* is exercised end-to-end by the
* `graphHash parity: medium-with-empty-keywords` fixture in
* `packages/storage/src/graph-hash-parity.test.ts`, which asserts both
* (a) cross-adapter parity for `{keywords: []}` and (b) the resulting
* hash differs from the equivalent fixture without the `keywords` key.
* differ. The single-file `SqliteStore` (ADR 0019) preserves this
* distinction by folding `keywords` into the canonical-JSON `payload`
* column, so `canonicalJson` over `payload` carries `[]`-vs-absent
* verbatim — see `packages/storage/src/sqlite-adapter.ts`. The CLI's
* read-back mirrors it at
* `packages/cli/src/commands/analyze.ts:stringArrayField`. The contract is
* exercised end-to-end by the
* `graphHash parity: medium fixture (mixed kinds + sentinels)` test in
* `packages/storage/src/sqlite-parity.test.ts`, which round-trips the
* `{keywords: []}` sentinel and asserts the rebuilt graph hashes identically.
*
* The same `[]`-vs-absent semantics apply to `responseKeys` on RouteNode.
* Empty `Record<string, number>` (`languageStats: {}`) goes through a
Expand Down
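To make the `[]`-versus-absent hashing claim concrete, here is a minimal sketch. The `canonicalJson` function below is a simplified stand-in written for this example (sorted keys, no whitespace), not the project's `writeCanonicalJson`, and the node ids are made up.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for a canonical JSON writer: object keys are sorted,
// arrays keep their order, undefined-valued keys are skipped, no whitespace.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort();
    const body = keys.map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Same node, once with an empty `keywords` array and once without the key.
const withEmptyKeywords = { id: "Community:1", keywords: [] as string[] };
const withoutKeywords = { id: "Community:1" };

const jsonWithEmpty = canonicalJson(withEmptyKeywords);
const jsonWithout = canonicalJson(withoutKeywords);
const hashWithEmpty = sha256Hex(jsonWithEmpty);
const hashWithout = sha256Hex(jsonWithout);
```

The two byte streams differ by the `"keywords":[]` member, so the two SHA-256 digests differ as well. A storage layer that collapsed `[]` to absent on read-back would therefore change the graph hash.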
6 changes: 3 additions & 3 deletions packages/ingestion/src/pipeline/orchestrator.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
* configured phase set, and returns a summary plus the hashed graph.
*
* The orchestrator does not touch storage — the returned
* `KnowledgeGraph` is in-memory only. Persisting it (DuckDB / embeddings)
* `KnowledgeGraph` is in-memory only. Persisting it (SQLite / embeddings)
* is a CLI concern (see `codehub analyze`, which opens a writable store
* and calls `bulkLoad`).
*/
Expand Down Expand Up @@ -133,7 +133,7 @@ export interface RunIngestionOptions extends PipelineOptions {
readonly onProgress?: (ev: ProgressEvent) => void;
/**
* Optional adapter the summarize phase probes before issuing work.
* Production wires this to the DuckDB store's `lookupSymbolSummary`
* Production wires this to the SQLite store's `lookupSymbolSummary`
* implementation so re-indexes become free when source hasn't drifted.
* Tests inject an in-memory fake. Absent by default — the phase degrades
* to "every candidate is a miss" which is still correct, just more
Expand All @@ -142,7 +142,7 @@ export interface RunIngestionOptions extends PipelineOptions {
readonly summaryCacheAdapter?: SummaryCacheAdapter;
/**
* Optional adapter the embeddings phase probes before issuing embedder
* calls. Production wires this to the DuckDB store's
* calls. Production wires this to the SQLite store's
* `listEmbeddingHashes` implementation so re-analyze runs skip chunks
* whose `content_hash` matches a prior row. Absent by default —
* the phase degrades to "every chunk is new" which is still correct,
Expand Down
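The skip-on-matching-hash behaviour described for the embeddings adapter can be sketched like this. The `Chunk` shape, the key encoding, and the use of SHA-256 over the chunk text as the content hash are all assumptions for illustration; the store's `listEmbeddingHashes` is represented only by a plain `Map`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical chunk shape keyed by (granularity, nodeId, chunkIndex).
interface Chunk {
  readonly granularity: string;
  readonly nodeId: string;
  readonly chunkIndex: number;
  readonly text: string;
}

function chunkKey(c: Pick<Chunk, "granularity" | "nodeId" | "chunkIndex">): string {
  return JSON.stringify([c.granularity, c.nodeId, c.chunkIndex]);
}

// Assumption: the content hash is a SHA-256 of the chunk's source text.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Keep only chunks whose hash is missing or differs from the prior run.
// With no prior hashes, every chunk is treated as new.
function selectChunksToEmbed(
  chunks: readonly Chunk[],
  priorHashes: ReadonlyMap<string, string> | undefined,
): Chunk[] {
  if (priorHashes === undefined) return [...chunks];
  return chunks.filter((c) => priorHashes.get(chunkKey(c)) !== contentHash(c.text));
}

const unchanged: Chunk = { granularity: "symbol", nodeId: "Function:a", chunkIndex: 0, text: "function a() {}" };
const edited: Chunk = { granularity: "symbol", nodeId: "Function:b", chunkIndex: 0, text: "function b() { return 1; }" };

const prior = new Map<string, string>([
  [chunkKey(unchanged), contentHash(unchanged.text)],
  [chunkKey(edited), contentHash("function b() {}")], // stale hash from the prior run
]);

const toEmbed = selectChunksToEmbed([unchanged, edited], prior);
const toEmbedNoAdapter = selectChunksToEmbed([unchanged, edited], undefined);
```

The fallback branch mirrors the documented degradation: without an adapter every chunk counts as new, which is still correct and only costs extra embedder calls.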
Binary file modified packages/ingestion/src/pipeline/phases/cochange.ts
Binary file not shown.
2 changes: 1 addition & 1 deletion packages/ingestion/src/pipeline/phases/embeddings.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/**
* Embeddings phase — generates 768-dim vectors across one or more
* hierarchical tiers and materialises them into the phase output as an
* array of `EmbeddingRow`s the CLI upserts into DuckDB.
* array of `EmbeddingRow`s the CLI upserts into the SQLite store.
*
* Granularity tiers (P03):
* - `"symbol"` — one vector per callable/declaration symbol. When a
Expand Down
2 changes: 1 addition & 1 deletion packages/ingestion/src/pipeline/phases/summarize.ts
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ async function runSummarize(ctx: PipelineContext): Promise<SummarizePhaseOutput>

// Resolve a cache adapter from the options bag if the CLI attached one.
// Phases have no direct store handle, so we route cache probes through a
// narrow hook on `ctx.options`. Production attaches the DuckDB-backed
// narrow hook on `ctx.options`. Production attaches the SQLite-backed
// adapter; tests supply an in-memory fake.
const cacheAdapter = resolveCacheAdapter(ctx);

Expand Down
2 changes: 1 addition & 1 deletion packages/ingestion/src/pipeline/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ export interface PipelineContext {
* Minimal projection of a prior-run graph sufficient for the incremental-scope
* phase to compute the import-closure walk. We intentionally keep this
* narrower than a full {@link KnowledgeGraph} so callers can materialise
* it cheaply from persisted storage (DuckDB rows, sidecar JSON, etc.) without
* it cheaply from persisted storage (SQLite rows, sidecar JSON, etc.) without
* hydrating every node/edge kind in the graph.
*
* All arrays carry repo-relative posix paths (matching `ScannedFile.relPath`).
Expand Down
2 changes: 1 addition & 1 deletion packages/ingestion/src/providers/resolution/context.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ export interface ResolutionCandidate {

/**
* Minimal symbol-lookup surface. Concrete implementations sit atop the
* DuckDB-backed `IGraphStore`, but every resolver strategy speaks to this
* SQLite-backed `IGraphStore`, but every resolver strategy speaks to this
* interface so unit tests can drive it with in-memory fixtures.
*/
export interface SymbolIndex {
Expand Down