From ea8d2e2043dc2cf3167d5d5921ebcde1d968eb1d Mon Sep 17 00:00:00 2001
From: Laith Al-Saadoon <alsaadoonlaith@gmail.com>
Date: Fri, 26 Jun 2026 01:40:59 +0000
Subject: [PATCH 1/2] =?UTF-8?q?feat(embedder)!:=20swap=20embedding=20model?=
 =?UTF-8?q?=20gte-modernbert-base=20=E2=86=92=20F2LLM-v2-80M=20(320-dim)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

BREAKING CHANGE: the local ONNX embedder is now codefuse-ai/F2LLM-v2-80M
(320-dim, was gte-modernbert-base 768-dim). Existing indexes MUST be
rebuilt with `codehub analyze --embeddings` — 320-dim query vectors cannot
be compared against stored 768-dim vectors. The embedder-fingerprint guard
(ADR 0014) refuses queries against a stale store until it is re-analyzed.
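
Why the rebuild is mandatory, as a minimal sketch: cosine similarity is only
defined for vectors of equal length. `cosineSimilarity` below is a
hypothetical helper written for this note, not code from the patch or from
the search package.

```typescript
// Hypothetical helper, for illustration only; not code from this patch.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  // A 320-dim query against a 768-dim stored row has no meaningful
  // dot product, so fail loudly instead of truncating or padding.
  if (a.length !== b.length) {
    throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA * normB);
  // Treat an all-zero vector as "no similarity" rather than NaN.
  return denom === 0 ? 0 : dot / denom;
}
```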

What changed:
- onnx-embedder.ts requests the in-graph `embedding` output (shape [B,320],
  last-token pooling + L2 norm baked into the graph) instead of pooling /
  normalizing in JS; clsPool + l2NormalizeInPlace are removed. Padding now
  uses the Qwen2 pad id.
- New Embedder.embedQuery() applies F2LLM's query-only `Instruct:` prefix;
  documents embed raw. Wired at the search/hybrid.ts query seam. New
  query-prefix.ts holds the instruction string.
- Dimension parameterized 768→320 across embedder, search (NullEmbedder),
  storage (DEFAULT_DIM), ingestion pool, and HTTP/SageMaker defaults.
- model-pins.ts: GTE_MODERNBERT_BASE_PINS → F2LLM_V2_80M_PINS; weights are a
  custom ONNX export hosted as the GitHub release asset `embed-v1`,
  SHA256-pinned. 3-file manifest (model + tokenizer.json + tokenizer_config.json).
- Migration guard: analyze suppresses the content-hash cache on a model-id
  change so a swap forces a full re-embed (no mixed-dimension store).
- Docs, CHANGELOGs, skills swept to F2LLM/320-dim. Collapsed the stale
  storage ADR chain (0001/0011/0013/0016 → 0019) and fixed the README's
  pre-ADR-0019 storage narrative. Fixed lefthook verdict guard to check
  store.sqlite (was the removed graph.lbug).
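
For reference, a rough sketch of what "last-token pooling + L2 norm" computes
for one sequence. The real computation now lives inside the ONNX graph;
`lastTokenPoolAndNormalize` is a hypothetical name, and the sketch assumes a
0/1 attention mask and a row-major [seqLen, dim] hidden-state buffer.

```typescript
// Hypothetical illustration; the patch performs this inside the ONNX graph.
function lastTokenPoolAndNormalize(
  hidden: Float32Array, // row-major [seqLen, dim] hidden states
  attentionMask: readonly number[], // 1 = real token, 0 = padding
  dim: number,
): Float32Array {
  // Find the last position the mask marks as a real token.
  let last = -1;
  for (let t = 0; t < attentionMask.length; t++) {
    if (attentionMask[t] === 1) last = t;
  }
  if (last < 0) throw new Error("sequence has no unmasked tokens");
  // Pool: take that token's hidden state as the sequence embedding.
  const pooled = hidden.slice(last * dim, (last + 1) * dim);
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(pooled.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? pooled : pooled.map((v) => v / norm);
}
```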

Verified: pnpm -r build + test green (~2400 tests); tokenizer parity with
Python AutoTokenizer (byte-identical IDs); production embedder reproduces the
POC 4/4 top-1 ranking + byte-deterministic output; end-to-end analyze writes
320-dim rows; migration guard confirmed firing on a stale stamp.
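
The migration-guard decision in analyze.ts can be restated as a pure
predicate. `shouldReuseEmbeddingHashCache` is a hypothetical name used only
for this note; the patch inlines the check in openEmbeddingHashCacheAdapter.

```typescript
// Hypothetical restatement of the guard; not a function added by this patch.
function shouldReuseEmbeddingHashCache(
  priorModelId: string | undefined,
  activeModelId: string,
): boolean {
  // No recorded stamp (fresh or legacy store): fall through and let the
  // cache behave as before. A recorded stamp must match the active
  // embedder, otherwise every node is re-embedded.
  return priorModelId === undefined || priorModelId === activeModelId;
}
```

A stale `gte-modernbert-base/fp32` stamp against an active
`f2llm-v2-80m/fp32` embedder returns false, which corresponds to the
suppressed-cache, full re-embed path.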
---
 .claude/skills/opencodehub-guide/SKILL.md     |   2 +-
 README.md                                     | 102 +++++------
 SPECS.md                                      |   9 +-
 docs/adr/0001-storage-backend.md              |  13 +-
 docs/adr/0011-graph-db-backend.md             |  13 +-
 .../0013-m7-default-flip-and-abstraction.md   |  14 +-
 ...cip-references-and-embedder-fingerprint.md |  13 +-
 docs/adr/0016-duckdb-graph-rip.md             |  12 +-
 lefthook.yml                                  |   5 +-
 packages/cli/src/commands/analyze.test.ts     |   4 +-
 packages/cli/src/commands/analyze.ts          |  34 +++-
 packages/cli/src/commands/doctor.test.ts      |   6 +-
 packages/cli/src/commands/doctor.ts           |   4 +-
 packages/cli/src/commands/query.test.ts       |  11 +-
 packages/cli/src/commands/query.ts            |   4 +-
 .../cli/src/commands/setup-embeddings.test.ts |   8 +-
 packages/cli/src/commands/setup.ts            |   9 +-
 packages/cli/src/embedder-downloader.test.ts  |   8 +-
 packages/cli/src/embedder-downloader.ts       |  10 +-
 packages/cli/src/index.ts                     |   6 +-
 .../content/docs/architecture/embeddings.md   |  24 ++-
 .../content/docs/architecture/monorepo-map.md |   2 +-
 .../docs/src/content/docs/reference/cli.md    |   4 +-
 .../content/docs/reference/configuration.md   |   4 +-
 packages/embedder/CHANGELOG.md                |  14 ++
 packages/embedder/README.md                   |  15 +-
 packages/embedder/package.json                |   6 +-
 packages/embedder/src/factory.test.ts         |  13 +-
 packages/embedder/src/factory.ts              |   2 +-
 packages/embedder/src/fingerprint.test.ts     |  20 +--
 packages/embedder/src/fingerprint.ts          |   6 +-
 packages/embedder/src/http-embedder.test.ts   |  58 +++----
 packages/embedder/src/http-embedder.ts        |  11 +-
 packages/embedder/src/index.ts                |   9 +-
 packages/embedder/src/model-pins.test.ts      |  72 +++++---
 packages/embedder/src/model-pins.ts           |  91 +++++-----
 packages/embedder/src/onnx-embedder.test.ts   |  47 +++--
 packages/embedder/src/onnx-embedder.ts        | 163 ++++++++----------
 packages/embedder/src/paths.test.ts           |  13 +-
 packages/embedder/src/paths.ts                |  24 ++-
 packages/embedder/src/query-prefix.ts         |  25 +++
 .../sagemaker-embedder.integration.test.ts    |  14 +-
 .../src/sagemaker-embedder.parity.test.ts     |   9 +-
 .../embedder/src/sagemaker-embedder.test.ts   |  36 ++--
 packages/embedder/src/sagemaker-embedder.ts   |  21 ++-
 packages/embedder/src/types.ts                |  36 ++--
 packages/ingestion/CHANGELOG.md               |   7 +
 .../src/pipeline/phases/embedder-pool.ts      |   9 +-
 .../src/pipeline/phases/embeddings.test.ts    |   2 +-
 .../src/pipeline/phases/embeddings.ts         |   9 +-
 packages/mcp/CHANGELOG.md                     |   7 +
 packages/mcp/src/server.ts                    |   4 +-
 packages/mcp/src/tools/query.test.ts          |   9 +-
 packages/mcp/src/tools/query.ts               |   7 +-
 packages/mcp/src/tools/shared.ts              |   2 +-
 packages/search/CHANGELOG.md                  |   7 +
 packages/search/src/embedder.ts               |   7 +-
 packages/search/src/hybrid.test.ts            |   5 +
 packages/search/src/hybrid.ts                 |   4 +-
 packages/search/src/types.ts                  |   6 +
 packages/storage/CHANGELOG.md                 |   7 +
 packages/storage/src/sqlite-adapter.test.ts   |   4 +-
 packages/storage/src/sqlite-adapter.ts        |   4 +-
 .../opencodehub/skills/codehub-guide/SKILL.md |   2 +-
 64 files changed, 680 insertions(+), 448 deletions(-)
 create mode 100644 packages/embedder/src/query-prefix.ts

diff --git a/.claude/skills/opencodehub-guide/SKILL.md b/.claude/skills/opencodehub-guide/SKILL.md
index ddbdc71b..29803824 100644
--- a/.claude/skills/opencodehub-guide/SKILL.md
+++ b/.claude/skills/opencodehub-guide/SKILL.md
@@ -15,7 +15,7 @@ For any task that touches code understanding, debugging, impact analysis, refact
 2. Read `codehub://repo/{name}/context` — codebase stats and a staleness envelope.
 3. Match the task to a skill below and follow that skill's checklist.
 
-> If the context envelope reports the index is stale, run `codehub analyze` in the terminal first. If it says weights are missing, run `codehub setup --embeddings` to fetch the 768d gte-modernbert-base ONNX weights.
+> If the context envelope reports the index is stale, run `codehub analyze` in the terminal first. If it says weights are missing, run `codehub setup --embeddings` to fetch the 320d F2LLM-v2-80M ONNX weights.
 
 ## Skills · analysis
 
diff --git a/README.md b/README.md
index d14037bd..d3544665 100644
--- a/README.md
+++ b/README.md
@@ -78,31 +78,31 @@ flowchart LR
 | **Local-first, offline-capable** | `codehub analyze --offline` opens zero sockets. Your code never leaves your machine. No telemetry. |
 | **Deterministic indexing** | Identical inputs produce a byte-identical graph hash. Reproducible. Auditable. Cacheable in CI. |
 | **MCP-native** | Works out-of-the-box with Claude Code, Cursor, Codex, Windsurf, OpenCode. The MCP server is the primary interface; CLI exists for scripts and CI. |
-| **Embedded storage, two-tier** | `@ladybugdb/core` holds the structural store: symbols, edges, embeddings, BM25 + HNSW. A dedicated DuckDB sibling holds the temporal views: cochanges and summaries. Embedded files. No daemon. No database to operate. Both tiers are always present, with no backend knob (ADR 0016). |
+| **Single-file embedded storage** | One `store.sqlite` file holds everything — symbols, edges, embeddings, BM25 (FTS5) + HNSW traversal, and the temporal views (cochanges, summaries) — via Node's built-in `node:sqlite`. No daemon, no database to operate, and **zero native storage bindings** (ADR 0019 removed both `@ladybugdb/core` and `@duckdb/node-api`). |
 | **15 languages at GA** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, C, C++, Ruby, Kotlin, Swift, PHP, Dart, COBOL — tree-sitter for the first 14 plus a regex provider for fixed-format COBOL. |
-| **WASM-only parse runtime** | `web-tree-sitter` WASM is the only parse runtime. The 15 grammar `.wasm` blobs are vendored at `packages/ingestion/vendor/wasms/`, so parsing does **zero grammar/native builds and zero GitHub fetches** at install time — there is no native parser opt-in. Storage and embeddings still load prebuilt native bindings (see Platform support). |
+| **WASM-only parse runtime** | `web-tree-sitter` WASM is the only parse runtime. The 15 grammar `.wasm` blobs are vendored at `packages/ingestion/vendor/wasms/`, so parsing does **zero grammar/native builds and zero GitHub fetches** at install time — there is no native parser opt-in. Storage is pure `node:sqlite`; the only optional native dep is the local embedder (see Platform support). |
 
 ## Platform support
 
-Parsing is WASM and runs anywhere Node does. The storage and embedding
-tiers, however, depend on **prebuilt native bindings** — `@ladybugdb/core`
-(graph store), `@duckdb/node-api` (temporal store), and `onnxruntime-node`
-(local embeddings) — so OpenCodeHub runs on the platforms those bindings
-ship a prebuild for:
+Parsing is WASM and storage is pure `node:sqlite`, so the core runs anywhere
+Node ≥ 24.15 does — no prebuilt native storage bindings, no Docker, no
+postinstall compile (ADR 0019). There is exactly **one** optional native
+dependency: `onnxruntime-web`, the WASM ONNX runtime that powers
+`--embeddings`. It ships prebuilt WebAssembly (no node-gyp, no native
+binding) and runs single-threaded under Node, so it too is platform-agnostic;
+a BM25-only install never loads it.
 
 | Platform | Supported |
 |---|---|
-| `darwin-arm64`, `darwin-x64` | ✅ prebuilt |
-| `linux-x64`, `linux-arm64` (glibc) | ✅ prebuilt |
-| `win32-x64` | ✅ prebuilt |
-| `win32-arm64` | ❌ no prebuild — `codehub analyze` fails at store open |
-| Alpine / musl, 32-bit Linux ARM | ❌ no prebuild — needs a source build of `@ladybugdb/core` |
-
-On an unsupported platform the lbug binding fails to load and `open()`
-throws `GraphDbBindingError` (there is no DuckDB-graph fallback — see
-[ADR 0016](./docs/adr/0016-duckdb-graph-rip.md)). The five-target prebuilt
-matrix mirrors `@ladybugdb/core`'s release artifacts; track its upstream
-for musl / `win32-arm64` coverage.
+| `darwin-arm64`, `darwin-x64` | ✅ |
+| `linux-x64`, `linux-arm64` (glibc **and** musl/Alpine) | ✅ |
+| `win32-x64`, `win32-arm64` | ✅ |
+| anywhere else Node ≥ 24.15 runs | ✅ |
+
+Because storage no longer depends on a platform-specific prebuild, the
+earlier `GraphDbBindingError` / unsupported-platform failure mode is gone —
+see [ADR 0019](./docs/adr/0019-single-file-sqlite-storage.md) (which
+superseded the native-binding storage of [ADR 0016](./docs/adr/0016-duckdb-graph-rip.md)).
 
 ## Quick start
 
@@ -187,7 +187,7 @@ The monorepo is organised as 18 workspace packages under `packages/`:
 | `scanners` | Subprocess wrappers for 19 scanners — OSV, Semgrep, hadolint, tflint, betterleaks, and the rest |
 | `scip-ingest` | SCIP indexer runners (TS, Python, Go, Rust, Java) — emits CALLS, REFERENCES, IMPLEMENTS, TYPE_OF |
 | `search` | Hybrid BM25 + HNSW (ACORN-1 + RaBitQ) query layer |
-| `storage` | `IGraphStore` (`@ladybugdb/core`) + `ITemporalStore` (DuckDB) adapters; deterministic `graphHash` |
+| `storage` | One `SqliteStore` (`node:sqlite`) implementing both `IGraphStore` + `ITemporalStore` over a single `store.sqlite`; deterministic `graphHash` |
 | `summarizer` | Process + cluster summaries for MCP responses |
 | `wiki` | LLM-narrated module pages emitted by `codehub wiki --llm` |
 
@@ -199,12 +199,13 @@ production package set ships free of test-time dependencies.
 ## Embedding backends
 
 OpenCodeHub ships with three embedding backends — all serve the same
-`gte-modernbert-base` 768-dim space, all use CLS pooling + L2 norm — and
-picks one at runtime based on environment variables:
+`codefuse-ai/F2LLM-v2-80M` 320-dim space (last-token pooling + L2 norm
+baked into the ONNX graph) — and picks one at runtime based on
+environment variables:
 
 | Precedence | Env | Backend |
 |---|---|---|
-| 1 | `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` | **SageMaker** — invokes an AWS SageMaker Runtime endpoint (e.g. a TEI-served `gte-modernbert-embed`). Auth via the default AWS credential chain (profile, env vars, IMDS). No local weights needed. |
+| 1 | `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` | **SageMaker** — invokes an AWS SageMaker Runtime endpoint (e.g. a TEI-served `F2LLM-v2-80M`). Auth via the default AWS credential chain (profile, env vars, IMDS). No local weights needed. |
 | 2 | `CODEHUB_EMBEDDING_URL` + `CODEHUB_EMBEDDING_MODEL` | **HTTP (OpenAI-compatible)** — POSTs to a `/v1/embeddings` server (Infinity, vLLM, TEI, Ollama, LM Studio, OpenAI). Bearer auth optional via `CODEHUB_EMBEDDING_API_KEY`. |
 | 3 | *(nothing set)* | **Local ONNX** — deterministic, offline-safe. Requires `codehub setup --embeddings` to download the weights. |
 
@@ -212,13 +213,13 @@ picks one at runtime based on environment variables:
 
 | Var | Default | Purpose |
 |---|---|---|
-| `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` | *(required to select)* | Endpoint name (e.g. `gte-modernbert-embed`). |
+| `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` | *(required to select)* | Endpoint name (e.g. `F2LLM-v2-80M`). |
 | `CODEHUB_EMBEDDING_SAGEMAKER_REGION` | `us-east-1` | AWS region. |
-| `CODEHUB_EMBEDDING_DIMS` | `768` | Expected vector dimension — asserted on every response to catch model-swap drift. |
-| `CODEHUB_EMBEDDING_MODEL` | `gte-modernbert-base/sagemaker:<endpoint-name>` | Stable modelId stamp recorded in index metadata. Override only when bridging a non-gte endpoint. |
+| `CODEHUB_EMBEDDING_DIMS` | `320` | Expected vector dimension — asserted on every response to catch model-swap drift. |
+| `CODEHUB_EMBEDDING_MODEL` | `F2LLM-v2-80M/sagemaker:<endpoint-name>` | Stable modelId stamp recorded in index metadata. Override only when bridging a non-F2LLM endpoint. |
 
 IAM: the caller needs `sagemaker:InvokeEndpoint` on the endpoint ARN —
-e.g. `arn:aws:sagemaker:us-east-1:<account>:endpoint/gte-modernbert-embed`.
+e.g. `arn:aws:sagemaker:us-east-1:<account>:endpoint/F2LLM-v2-80M`.
 
 **Do not mix backends against the same index.** Backends are pinned to a
 single model identity via the `modelId` stamp in the `embeddings` table;
@@ -226,27 +227,29 @@ switching mid-project requires `codehub analyze --rebuild-embeddings`.
 `--offline` refuses SageMaker and HTTP backends, so offline mode is
 compatible only with the local ONNX path.
 
-## Storage backend — lbug graph + DuckDB temporal
-
-The graph tier is always `@ladybugdb/core` (`<repo>/.codehub/graph.lbug`);
-the temporal tier — cochanges, structured symbol summaries, and the
-`codehub query --sql` escape hatch — is always DuckDB
-(`<repo>/.codehub/temporal.duckdb`). Both files are written on every
-`analyze`. There is no `CODEHUB_STORE` env var, no backend probe, no
-single-file `graph.duckdb` layout, and no mtime arbitration; if the lbug
-binding fails to load, `open()` throws `GraphDbBindingError` and the
-operation aborts.
-
-`IGraphStore` lives only on `GraphDbStore`; `DuckDbStore` implements
-`ITemporalStore` only. The segregated interfaces stay because they are
-the v1.0 contract for community-fork adapters (AGE / Memgraph / Neo4j /
-Neptune target `IGraphStore`; DuckDB owns `ITemporalStore`). Embeddings
-live in `graph.lbug` and stream into a per-call DuckDB temp table at
-pack time so the byte-identical Parquet sidecar still works.
-
-See [`docs/adr/0016-duckdb-graph-rip.md`](./docs/adr/0016-duckdb-graph-rip.md)
-for the rationale behind ripping out the DuckDB graph backend; it
-supersedes ADR 0013 and the DuckDB-as-graph passages of ADR 0011.
+## Storage backend — single-file SQLite
+
+The entire index lives in ONE `<repo>/.codehub/store.sqlite` file (WAL),
+via Node's built-in `node:sqlite` — graph nodes, edges, embeddings, the
+FTS5 BM25 table, and the temporal tables (cochanges, symbol summaries, the
+`codehub query --sql` escape hatch). One `SqliteStore` class implements
+**both** `IGraphStore` and `ITemporalStore`; `openStore()` returns that
+single instance as both the `graph` and `temporal` views, so call sites use
+`store.graph.X()` / `store.temporal.Y()` unchanged. **Zero native storage
+bindings** — `@ladybugdb/core` and `@duckdb/node-api` are both gone, so
+there is no `GraphDbBindingError`, no backend probe, and no platform-prebuild
+matrix.
+
+The segregated `IGraphStore` / `ITemporalStore` interfaces stay as the
+community-fork escape hatch (AGE / Memgraph / Neo4j / Neptune) — a fork
+implements both, on one class or split. Install is zero-native-dep:
+`npm i -g @opencodehub/cli` + Node ≥ 24.15, no Docker, no postinstall
+compile. (`onnxruntime-web`, the optional WASM embedder, is the only native
+dependency — lazy-loaded under `--embeddings`.)
+
+See [`docs/adr/0019-single-file-sqlite-storage.md`](./docs/adr/0019-single-file-sqlite-storage.md)
+for the rationale; it supersedes [ADR 0016](./docs/adr/0016-duckdb-graph-rip.md)
+(and, transitively, the native-binding storage of ADRs 0011 / 0013 / 0001).
 
 ## Parse runtime — WASM-only, vendored grammars
 
@@ -254,8 +257,9 @@ supersedes ADR 0013 and the DuckDB-as-graph passages of ADR 0011.
 runtime on the supported Node range (22 and 24). There is no native opt-in:
 the native `tree-sitter` N-API addon and all 14 `tree-sitter-<lang>` npm
 packages are gone from the install graph, so parsing pulls **zero native
-builds and zero GitHub fetches** at install time. (Storage and embeddings
-load prebuilt native bindings — see Platform support.)
+builds and zero GitHub fetches** at install time. (Storage is pure
+`node:sqlite`; the only optional native dep is the WASM embedder — see
+Platform support.)
 
 All 15 grammar `.wasm` blobs are vendored at
 `packages/ingestion/vendor/wasms/`, built from the grammar sources
diff --git a/SPECS.md b/SPECS.md
index d26ccf7e..bbba8926 100644
--- a/SPECS.md
+++ b/SPECS.md
@@ -17,7 +17,7 @@ first 14 plus a regex provider for fixed-format COBOL, runs SCIP indexers
 for TypeScript/JavaScript, Python, Go, Rust, and Java to upgrade tree-sitter
 heuristic edges to compiler-grade edges, clusters the graph into
 Communities and Processes, and optionally populates embeddings from a
-pinned gte-modernbert-base ONNX model (fp32 ~596 MB or int8 ~150 MB) or
+pinned F2LLM-v2-80M ONNX model (320-dim; fp32 ~321 MB or int8 ~81 MB) or
 an OpenAI-compatible HTTP endpoint.
 
 At query time it exposes an MCP server with 28 tools (`query`, `context`,
@@ -171,7 +171,7 @@ last-analyzed commit) atomically and expose it via `getMeta`.
 BM25 + ANN search, fuse results with reciprocal rank fusion (`DEFAULT_RRF_K`),
 and return symbols grouped by their participating `Process`.
 
-4.2 Where gte-modernbert-base weights are absent and no HTTP embedder is
+4.2 Where F2LLM-v2-80M weights are absent and no HTTP embedder is
 configured, the system shall fall back to BM25-only search and log a
 one-shot `[mcp] hybrid:` warning to stderr.
 
@@ -264,8 +264,9 @@ and `sql`.
 claude-code, cursor, codex, windsurf, and opencode; pass `--undo` to
 restore the most recent `.bak`.
 
-7.4 The `setup --embeddings` command shall download gte-modernbert-base
-weights (fp32 or int8) with SHA256 pins validated against
+7.4 The `setup --embeddings` command shall download the F2LLM-v2-80M
+ONNX export (fp32 or int8) — a custom-exported artifact hosted as a
+GitHub release asset — with SHA256 pins validated against
 `model-pins.ts`.
 
 7.5 The `setup --plugin` command shall copy the bundled plugin into
diff --git a/docs/adr/0001-storage-backend.md b/docs/adr/0001-storage-backend.md
index b456471d..64f10131 100644
--- a/docs/adr/0001-storage-backend.md
+++ b/docs/adr/0001-storage-backend.md
@@ -1,6 +1,17 @@
 # ADR 0001 — Storage backend selection
 
-Status: **Accepted (superseded prior SQLite recommendation)** — 2026-04-18
+Status: **Superseded** — current storage is [ADR 0019 — Single-file SQLite
+storage](./0019-single-file-sqlite-storage.md) (2026-06-22). This ADR
+selected **DuckDB** as the embedded backend; that decision was unwound over
+[ADR 0011](./0011-graph-db-backend.md) → [ADR 0013-m7](./0013-m7-default-flip-and-abstraction.md)
+→ [ADR 0016](./0016-duckdb-graph-rip.md) → ADR 0019, which lands on one
+`store.sqlite` file (Node built-in `node:sqlite`, **zero** native storage
+bindings — DuckDB included). Ironically ADR 0019 returns to the SQLite
+recommendation this ADR originally rejected. Read this ADR for the original
+license/determinism/binding-availability criteria only; the chosen engine is
+obsolete.
+
+> Originally: **Accepted (superseded prior SQLite recommendation)** — 2026-04-18
 
 ## Context
 
diff --git a/docs/adr/0011-graph-db-backend.md b/docs/adr/0011-graph-db-backend.md
index 4d48ade9..e3365085 100644
--- a/docs/adr/0011-graph-db-backend.md
+++ b/docs/adr/0011-graph-db-backend.md
@@ -1,10 +1,13 @@
 # ADR 0011 — Graph-DB backend (LadybugDB phase-1)
 
-- Status: **Partially superseded** by [ADR 0016](./0016-duckdb-graph-rip.md)
-  on 2026-05-16. The "DuckDB-default plus LadybugDB opt-in" framing is
-  obsolete; lbug is the unconditional graph backend after the rip. The
-  LadybugDB integration shape and `IGraphStore` design introduced here
-  are unchanged.
+- Status: **Superseded** — current storage is [ADR 0019 — Single-file
+  SQLite storage](./0019-single-file-sqlite-storage.md) (2026-06-22).
+  Chain: this ADR (LadybugDB phase-1) → [ADR 0016](./0016-duckdb-graph-rip.md)
+  (lbug-only graph, made the "DuckDB-default + LadybugDB opt-in" framing
+  obsolete, 2026-05-16) → ADR 0019 (one `store.sqlite`, NO native bindings —
+  `@ladybugdb/core` itself is now gone). The `IGraphStore` design introduced
+  here survives ADR 0019 as a community-fork escape hatch; the LadybugDB
+  binding does not. Read this ADR for historical rationale only.
 - Was: **Accepted** on 2026-05-05 and flipped on the M3 merge.
 - Authors: Laith Al-Saadoon + Claude.
 - Branch: `feat/v1-m3-m4`.
diff --git a/docs/adr/0013-m7-default-flip-and-abstraction.md b/docs/adr/0013-m7-default-flip-and-abstraction.md
index 76230e2e..44825f99 100644
--- a/docs/adr/0013-m7-default-flip-and-abstraction.md
+++ b/docs/adr/0013-m7-default-flip-and-abstraction.md
@@ -5,12 +5,14 @@
 > in-tree because they were authored in parallel branches and accepted
 > on the same release. The next ADR uses 0014.
 
-- Status: **Superseded** by [ADR 0016](./0016-duckdb-graph-rip.md)
-  on 2026-05-16. The auto-probe, dual-artifact arbitration, and
-  `CODEHUB_STORE` resolver introduced here are gone. lbug is the only
-  graph backend; DuckDB serves the temporal tier. The
-  IGraphStore/ITemporalStore segregation survives because community
-  adapters (AGE, Memgraph, Neo4j, Neptune) target it.
+- Status: **Superseded** — current storage is [ADR 0019 — Single-file
+  SQLite storage](./0019-single-file-sqlite-storage.md) (2026-06-22).
+  Chain: this ADR → [ADR 0016](./0016-duckdb-graph-rip.md) (2026-05-16,
+  removed the auto-probe / dual-artifact arbitration / `CODEHUB_STORE`
+  resolver introduced here) → ADR 0019 (one `store.sqlite`, no native
+  bindings). The IGraphStore/ITemporalStore segregation introduced here
+  survives all the way to ADR 0019 as the community-fork escape hatch
+  (AGE, Memgraph, Neo4j, Neptune); everything else here is historical.
 - Was: **Accepted** on 2026-05-09 and flipped on the
   `feat/v1-finalize-track-a` merge (PR #71).
 - Authors: Laith Al-Saadoon + Claude.
diff --git a/docs/adr/0014-scip-references-and-embedder-fingerprint.md b/docs/adr/0014-scip-references-and-embedder-fingerprint.md
index 869e3e3a..df601ff9 100644
--- a/docs/adr/0014-scip-references-and-embedder-fingerprint.md
+++ b/docs/adr/0014-scip-references-and-embedder-fingerprint.md
@@ -1,10 +1,21 @@
 # ADR 0014 — SCIP REFERENCES + TYPE_OF emission and embedder-fingerprint refusal
 
-**Status**: Accepted
+**Status**: Accepted (still in force)
 **Date**: 2026-05-09
 **Supersedes**: none
 **Superseded by**: none
 
+> Note (2026-06-26): the embedder-fingerprint mechanism this ADR introduced
+> — persist `embedder_model_id`, refuse mismatched queries via
+> `assertEmbedderCompatible` — is unchanged and is precisely what guards the
+> later embedding-model swap from `gte-modernbert-base` (768-dim) to
+> `F2LLM-v2-80M` (320-dim). The `gte-modernbert-base` / `768` references
+> below are the contemporaneous examples; the dim/model are now 320 /
+> `f2llm-v2-80m/*` but the decision and the comparator are identical. The
+> `store_meta` storage substrate referenced here (DuckDB) was later replaced
+> per [ADR 0019](./0019-single-file-sqlite-storage.md); the column and
+> semantics carried over to `store.sqlite` verbatim.
+
 ## Context
 
 Two unrelated holes in v1.0 finalize, both routing through a shared one-time graphHash content delta. They land in a single ADR per spec.md§Q7 because the fixture-regeneration cost is paid once.
diff --git a/docs/adr/0016-duckdb-graph-rip.md b/docs/adr/0016-duckdb-graph-rip.md
index f766be6d..5fb3ac21 100644
--- a/docs/adr/0016-duckdb-graph-rip.md
+++ b/docs/adr/0016-duckdb-graph-rip.md
@@ -1,6 +1,16 @@
 # ADR 0016 — Rip out the DuckDB graph backend; lbug-only graph, DuckDB temporal-only
 
-- Status: **Accepted** — 2026-05-16.
+- Status: **Superseded** by [ADR 0019 — Single-file SQLite storage](./0019-single-file-sqlite-storage.md)
+  on 2026-06-22, **in its entirety**. ADR 0019 removed BOTH native bindings
+  this ADR settled on (`@ladybugdb/core` for the graph tier and
+  `@duckdb/node-api` for the temporal tier) and replaced the pair with one
+  `store.sqlite` file via Node's built-in `node:sqlite`. The segregated
+  `IGraphStore` / `ITemporalStore` interfaces this ADR preserved for
+  community forks survive — both are now implemented by a single
+  `SqliteStore` class. Read this ADR only for the historical rationale of
+  the lbug-graph / DuckDB-temporal split; **do not** treat its decision as
+  current.
+- Was: **Accepted** — 2026-05-16.
 - Authors: Laith Al-Saadoon + Claude.
 - Branch: `feat/duckdb-graph-rip`.
 - Supersedes: [ADR 0013 — M7 default flip and storage abstraction](./0013-m7-default-flip-and-abstraction.md)
diff --git a/lefthook.yml b/lefthook.yml
index 3a12f81f..44e15322 100644
--- a/lefthook.yml
+++ b/lefthook.yml
@@ -73,6 +73,7 @@ pre-push:
     # Guard the verdict gate on a present index so the hook degrades
     # gracefully on dev boxes that haven't run `codehub analyze` yet —
     # mirrors the SKIP behaviour of scripts/pack-determinism-audit.sh.
+    # Index path is the single-file `store.sqlite` (ADR 0019).
     #
     # The verdict CLI exit ladder is 0=auto_merge, 1=single_review,
     # 2=dual_review/expert_review, 3=block. Those tiers are review-routing
@@ -82,8 +83,8 @@ pre-push:
     # surface the verdict output and gate solely on exit code 3.
     - name: verdict
       run: |
-        if [ ! -f .codehub/graph.lbug ]; then
-          echo "verdict skipped: no .codehub/graph.lbug (run 'mise run och:self-analyze' first)"
+        if [ ! -f .codehub/store.sqlite ]; then
+          echo "verdict skipped: no .codehub/store.sqlite (run 'mise run och:self-analyze' first)"
           exit 0
         fi
         set +e
diff --git a/packages/cli/src/commands/analyze.test.ts b/packages/cli/src/commands/analyze.test.ts
index f86aa807..98226567 100644
--- a/packages/cli/src/commands/analyze.test.ts
+++ b/packages/cli/src/commands/analyze.test.ts
@@ -409,11 +409,11 @@ test("buildStoreMeta: stamps embedderModelId when the embedder ran with a model
     edgeCount: 200,
     stats: {},
     cacheSizeBytes: 0,
-    embeddings: { ranEmbedder: true, embeddingsModelId: "gte-modernbert-base/fp32" },
+    embeddings: { ranEmbedder: true, embeddingsModelId: "f2llm-v2-80m/fp32" },
   });
   assert.equal(
     meta.embedderModelId,
-    "gte-modernbert-base/fp32",
+    "f2llm-v2-80m/fp32",
     "the embedder tag must round-trip into StoreMeta so the fingerprint guard can fire",
   );
 });
diff --git a/packages/cli/src/commands/analyze.ts b/packages/cli/src/commands/analyze.ts
index 3e0cd1c9..3d1ec906 100644
--- a/packages/cli/src/commands/analyze.ts
+++ b/packages/cli/src/commands/analyze.ts
@@ -31,6 +31,7 @@ import {
   type RelationType,
   SCHEMA_VERSION,
 } from "@opencodehub/core-types";
+import { embedderModelId } from "@opencodehub/embedder";
 import { pipeline } from "@opencodehub/ingestion";
 import {
   type BulkLoadProgressEvent,
@@ -260,9 +261,20 @@ export async function runAnalyze(path: string, opts: AnalyzeOptions = {}): Promi
   // re-embeds everything, so the adapter would do no useful work. When the
   // prior DB is absent the adapter returns undefined and the phase
   // degrades to "every chunk is new".
+  //
+  // Migration safety: the content-hash skip keys on TEXT only, so swapping
+  // the embedder (e.g. gte-modernbert-base/768-dim → f2llm-v2-80m/320-dim)
+  // would otherwise skip every unchanged node and leave stale-dimension
+  // vectors mixed with the new ones. Gate the cache on a model-id match —
+  // when the prior store's `embedderModelId` differs from the active
+  // embedder, the adapter is suppressed (full re-embed; INSERT OR REPLACE
+  // overwrites every row at the new dim).
+  const activeEmbedderModelId = embedderModelId(
+    opts.embeddingsVariant === "int8" ? "int8" : "fp32",
+  );
   const embeddingHashAdapter =
     opts.embeddings === true && opts.force !== true
-      ? await openEmbeddingHashCacheAdapter(repoPath)
+      ? await openEmbeddingHashCacheAdapter(repoPath, activeEmbedderModelId)
       : undefined;
 
   // Resolve `--max-summaries auto` against the prior run's callable count,
@@ -936,6 +948,7 @@ async function openSummaryCacheAdapter(
  */
 async function openEmbeddingHashCacheAdapter(
   repoPath: string,
+  activeModelId: string,
 ): Promise<
   { adapter: pipeline.EmbeddingHashCacheAdapter; close: () => Promise<void> } | undefined
 > {
@@ -948,6 +961,25 @@ async function openEmbeddingHashCacheAdapter(
     await store.close().catch(() => {});
     return undefined;
   }
+  // Migration guard: if the prior index was built by a different embedder,
+  // its content_hashes describe vectors of the wrong model/dimension.
+  // Suppress the cache so every node is re-embedded (full overwrite) rather
+  // than skipped — preventing a silent mixed-dimension store.
+  try {
+    const meta = await store.graph.getMeta();
+    const priorModelId = meta?.embedderModelId;
+    if (priorModelId !== undefined && priorModelId !== activeModelId) {
+      log(
+        `codehub analyze: embedder changed (${priorModelId} → ${activeModelId}); ` +
+          "re-embedding all symbols (content-hash cache suppressed).",
+      );
+      await store.close().catch(() => {});
+      return undefined;
+    }
+  } catch {
+    // Meta unreadable (fresh/legacy store) — fall through; the cache list()
+    // below already tolerates an empty/erroring store.
+  }
   return {
     adapter: {
       // listEmbeddingHashes is on the graph-tier interface — embeddings
diff --git a/packages/cli/src/commands/doctor.test.ts b/packages/cli/src/commands/doctor.test.ts
index 8ad254d8..947c73a5 100644
--- a/packages/cli/src/commands/doctor.test.ts
+++ b/packages/cli/src/commands/doctor.test.ts
@@ -120,7 +120,7 @@ test("embedder weights check reports warn when no model present", async () => {
 test("embedder weights check reports ok when fp32 weights present", async () => {
   const home = await mkdtemp(join(tmpdir(), "codehub-doctor-emb-ok-"));
   try {
-    const base = join(home, ".codehub", "models", "gte-modernbert-base", "fp32");
+    const base = join(home, ".codehub", "models", "f2llm-v2-80m", "fp32");
     await mkdir(base, { recursive: true });
     await writeFile(join(base, "model.onnx"), "fake weights");
     const checks = buildChecks({ home, skipNative: true });
@@ -141,7 +141,7 @@ test("embedder weights check reports ok when fp32 weights present", async () =>
 test("embedder weights check reports ok when int8 weights present (underscore filename)", async () => {
   const home = await mkdtemp(join(tmpdir(), "codehub-doctor-emb-int8-"));
   try {
-    const base = join(home, ".codehub", "models", "gte-modernbert-base", "int8");
+    const base = join(home, ".codehub", "models", "f2llm-v2-80m", "int8");
     await mkdir(base, { recursive: true });
     // Canonical filename from embedder/src/paths.ts:modelFileName("int8").
     await writeFile(join(base, "model_int8.onnx"), "fake int8 weights");
@@ -162,7 +162,7 @@ test("embedder weights check reports ok when int8 weights present (underscore fi
 test("embedder weights check reports warn when only hyphenated int8 file is present", async () => {
   const home = await mkdtemp(join(tmpdir(), "codehub-doctor-emb-hyphen-"));
   try {
-    const base = join(home, ".codehub", "models", "gte-modernbert-base", "int8");
+    const base = join(home, ".codehub", "models", "f2llm-v2-80m", "int8");
     await mkdir(base, { recursive: true });
     await writeFile(join(base, "model-int8.onnx"), "wrong filename");
     const checks = buildChecks({ home, skipNative: true });
diff --git a/packages/cli/src/commands/doctor.ts b/packages/cli/src/commands/doctor.ts
index 8621459d..b36388d8 100644
--- a/packages/cli/src/commands/doctor.ts
+++ b/packages/cli/src/commands/doctor.ts
@@ -583,7 +583,9 @@ function embedderWeightsCheck(home: string): Check {
       // NOT hyphen). A historical hyphenated path name lingered here and
       // caused false-negative `warn`s for users who had int8 weights on
       // disk.
-      const base = join(home, ".codehub", "models", "gte-modernbert-base");
+      // Subdir must match `embedder/src/paths.ts:MODEL_SUBDIR`
+      // (`models/f2llm-v2-80m`); on a mismatch this check silently always warns.
+      const base = join(home, ".codehub", "models", "f2llm-v2-80m");
       const fp32 = join(base, "fp32", "model.onnx");
       const int8 = join(base, "int8", "model_int8.onnx");
       const fp32Ok = await fileExists(fp32);
diff --git a/packages/cli/src/commands/query.test.ts b/packages/cli/src/commands/query.test.ts
index 7adba79a..14ec185b 100644
--- a/packages/cli/src/commands/query.test.ts
+++ b/packages/cli/src/commands/query.test.ts
@@ -201,6 +201,11 @@ class FakeEmbedder implements Embedder {
   async embed(_text: string): Promise<Float32Array> {
     return new Float32Array([0.1, 0.2, 0.3, 0.4]);
   }
+  // The Embedder interface gained a query-only `embedQuery` path for F2LLM;
+  // the fake aliases it to `embed` since the hybrid path only needs a stable Float32Array back.
+  async embedQuery(text: string): Promise<Float32Array> {
+    return this.embed(text);
+  }
   async embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
     return texts.map(() => new Float32Array([0.1, 0.2, 0.3, 0.4]));
   }
@@ -457,7 +462,7 @@ test("cli query: embeddings populated + embedder fails → warn + BM25 fallback,
           ...hooksFor(handle, "/tmp/fake"),
           openEmbedder: async () => {
             const err = new Error(
-              "gte-modernbert-base weights not found. Run `codehub setup --embeddings`.",
+              "F2LLM-v2-80M weights not found. Run `codehub setup --embeddings`.",
             );
             (err as unknown as { code: string }).code = "EMBEDDER_NOT_SETUP";
             throw err;
@@ -665,7 +670,7 @@ test("cli query: embedder mismatch sets exit code 2 and still closes embedder +
     ],
     vectorRows: [{ nodeId: "F:foo", distance: 0.1 }],
     // Persisted model id differs from the active embedder's "fake-embedder/test".
-    metaModelId: "gte-modernbert-base/fp32",
+    metaModelId: "f2llm-v2-80m/fp32",
   });
   const fake = new FakeEmbedder();
   const prevExitCode = process.exitCode;
@@ -707,7 +712,7 @@ test("cli query: --force-backend-mismatch bypasses the refusal and runs hybrid",
     ],
     vectorRows: [{ nodeId: "F:foo", distance: 0.1 }],
     nodes,
-    metaModelId: "gte-modernbert-base/fp32",
+    metaModelId: "f2llm-v2-80m/fp32",
   });
   const fake = new FakeEmbedder();
   const prevExitCode = process.exitCode;
diff --git a/packages/cli/src/commands/query.ts b/packages/cli/src/commands/query.ts
index 356a78bc..38f955eb 100644
--- a/packages/cli/src/commands/query.ts
+++ b/packages/cli/src/commands/query.ts
@@ -21,7 +21,7 @@
  *
  * Hybrid ranking priority matches the MCP tool:
  *   1. `CODEHUB_EMBEDDING_URL` + `CODEHUB_EMBEDDING_MODEL` → HTTP embedder.
- *   2. Otherwise local ONNX gte-modernbert-base weights.
+ *   2. Otherwise local ONNX F2LLM-v2-80M weights.
  *   3. On failure to open (missing weights, unreachable HTTP) → warn + BM25.
  */
 
@@ -59,7 +59,7 @@ export interface QueryRuntimeHooks {
   readonly openStore?: (opts: QueryOptions) => Promise<OpenStoreResult>;
   /**
    * Embedder factory — production uses the default lazy-import path; tests
-   * inject a fake so they don't need gte-modernbert-base weights on disk. Any
+   * inject a fake so they don't need F2LLM-v2-80M weights on disk. Any
    * throw is caught by {@link tryOpenEmbedder} and collapses to BM25.
    */
   readonly openEmbedder?: () => Promise<Embedder>;
diff --git a/packages/cli/src/commands/setup-embeddings.test.ts b/packages/cli/src/commands/setup-embeddings.test.ts
index 37232759..4fd34fc1 100644
--- a/packages/cli/src/commands/setup-embeddings.test.ts
+++ b/packages/cli/src/commands/setup-embeddings.test.ts
@@ -2,7 +2,7 @@
  * Happy-path test for `codehub setup --embeddings` wiring.
  *
  * Uses the public `runSetupEmbeddings` entry and a stub fetch + an override
- * pin manifest so we never hit the real HuggingFace CDN.
+ * pin manifest so we never hit the real GitHub release-asset CDN.
  */
 
 import { strict as assert } from "node:assert";
@@ -13,7 +13,7 @@ import { join } from "node:path";
 import { ReadableStream } from "node:stream/web";
 import { describe, it } from "node:test";
 
-import { GTE_MODERNBERT_BASE_PINS } from "@opencodehub/embedder";
+import { F2LLM_V2_80M_PINS } from "@opencodehub/embedder";
 
 import { runSetupEmbeddings } from "./setup.js";
 
@@ -49,7 +49,7 @@ describe("runSetupEmbeddings", { skip: platformSkip }, () => {
       // Build a tiny per-file body keyed by pin name; substitute our SHAs into
       // the manifest so the downloader's verification passes.
       const bodies = new Map<string, Uint8Array>();
-      const originals = GTE_MODERNBERT_BASE_PINS.fp32.files;
+      const originals = F2LLM_V2_80M_PINS.fp32.files;
       const replaced = originals.map((f, idx) => {
         const body = new TextEncoder().encode(`pin-${idx}-${f.name}`);
         bodies.set(f.url, body);
@@ -61,7 +61,7 @@ describe("runSetupEmbeddings", { skip: platformSkip }, () => {
         };
       });
 
-      const mutable = GTE_MODERNBERT_BASE_PINS as unknown as {
+      const mutable = F2LLM_V2_80M_PINS as unknown as {
         fp32: { variant: "fp32"; files: readonly (typeof replaced)[number][] };
       };
       const saved = mutable.fp32;
diff --git a/packages/cli/src/commands/setup.ts b/packages/cli/src/commands/setup.ts
index c8048f17..2f79d507 100644
--- a/packages/cli/src/commands/setup.ts
+++ b/packages/cli/src/commands/setup.ts
@@ -246,9 +246,9 @@ async function writeSingle(
  * allows the CLI `log`/`warn` sinks to be overridden for tests.
  */
 export interface SetupEmbeddingsOptions {
-  /** Variant to install. Defaults to `fp32` (~596 MB). */
+  /** Variant to install. Defaults to `fp32` (~332 MB). */
   readonly variant?: "fp32" | "int8";
-  /** Custom model directory. Defaults to `~/.codehub/models/gte-modernbert-base/<variant>/`. */
+  /** Custom model directory. Defaults to `~/.codehub/models/f2llm-v2-80m/<variant>/`. */
   readonly modelDir?: string;
   /** Re-download even if files already match their SHA256 pin. */
   readonly force?: boolean;
@@ -264,7 +264,8 @@ export interface SetupEmbeddingsOptions {
 /**
  * Public entry point for `codehub setup --embeddings`.
  *
- * Downloads the five pinned gte-modernbert-base files into the target dir with
+ * Downloads the three pinned F2LLM-v2-80M files (ONNX weights +
+ * tokenizer.json + tokenizer_config.json) into the target dir with
  * streaming SHA256 verification and atomic rename. Returns the downloader
  * summary so programmatic callers can assert on byte counts and locations.
  */
@@ -277,7 +278,7 @@ export async function runSetupEmbeddings(
 
   log(
     `codehub setup --embeddings: starting ${variant} download ` +
-      `(${variant === "fp32" ? "~90 MB" : "~23 MB"})`,
+      `(${variant === "fp32" ? "~332 MB" : "~92 MB"})`,
   );
 
   const downloaderOpts: DownloadEmbedderOptions = {
diff --git a/packages/cli/src/embedder-downloader.test.ts b/packages/cli/src/embedder-downloader.test.ts
index 515c9dd7..dede18b2 100644
--- a/packages/cli/src/embedder-downloader.test.ts
+++ b/packages/cli/src/embedder-downloader.test.ts
@@ -17,7 +17,7 @@ import { join } from "node:path";
 import { ReadableStream } from "node:stream/web";
 import { describe, it } from "node:test";
 
-import { GTE_MODERNBERT_BASE_PINS } from "@opencodehub/embedder";
+import { F2LLM_V2_80M_PINS } from "@opencodehub/embedder";
 
 import {
   downloadEmbedderWeights,
@@ -98,7 +98,7 @@ function makeFetchWith(
 }
 
 /**
- * Monkeypatch GTE_MODERNBERT_BASE_PINS[variant] for a single test. Because the
+ * Monkeypatch F2LLM_V2_80M_PINS[variant] for a single test. Because the
  * pins are `readonly`, we rebuild the structure by casting into a mutable
  * shape. The test restores on completion.
  */
@@ -107,8 +107,8 @@ function withOverridePins<T>(
   newFiles: readonly { name: string; url: string; sizeBytes: number; sha256: string }[],
   fn: () => Promise<T>,
 ): Promise<T> {
-  const original = GTE_MODERNBERT_BASE_PINS[variant];
-  const mutable = GTE_MODERNBERT_BASE_PINS as unknown as {
+  const original = F2LLM_V2_80M_PINS[variant];
+  const mutable = F2LLM_V2_80M_PINS as unknown as {
     [k in "fp32" | "int8"]: {
       variant: "fp32" | "int8";
       files: readonly { name: string; url: string; sizeBytes: number; sha256: string }[];
diff --git a/packages/cli/src/embedder-downloader.ts b/packages/cli/src/embedder-downloader.ts
index b3078e83..e29a7953 100644
--- a/packages/cli/src/embedder-downloader.ts
+++ b/packages/cli/src/embedder-downloader.ts
@@ -1,8 +1,8 @@
 /**
- * SHA256-pinned downloader for gte-modernbert-base weights.
+ * SHA256-pinned downloader for F2LLM-v2-80M weights.
  *
  * Resolves the target directory via {@link resolveModelDir}, then for each
- * pinned file in {@link GTE_MODERNBERT_BASE_PINS}:
+ * pinned file in {@link F2LLM_V2_80M_PINS}:
  *   1. Skip when the file already exists and its SHA256 matches the pin.
  *   2. Otherwise stream-download to `<target>.tmp`, hash during write, verify
  *      hash, and atomically rename to the final path.
@@ -12,7 +12,7 @@
  * error — the `.tmp` file is deleted and the error thrown. We never ship
  * weights that don't match the pin.
  *
- * All disk access is streaming; we never buffer a 596 MB file in memory.
+ * All disk access is streaming; we never buffer a 321 MB file in memory.
  */
 
 import { createHash } from "node:crypto";
@@ -24,7 +24,7 @@ import { pipeline as streamPipeline } from "node:stream/promises";
 import type { ReadableStream as NodeReadableStream } from "node:stream/web";
 import { setTimeout as delay } from "node:timers/promises";
 
-import { GTE_MODERNBERT_BASE_PINS, type PinnedFile, resolveModelDir } from "@opencodehub/embedder";
+import { F2LLM_V2_80M_PINS, type PinnedFile, resolveModelDir } from "@opencodehub/embedder";
 
 /** Fetch function signature for dependency injection (tests mock this). */
 export type FetchFn = typeof fetch;
@@ -311,7 +311,7 @@ export async function downloadEmbedderWeights(
   const modelDir = resolveModelDir(opts.modelDir, opts.variant);
   await mkdir(modelDir, { recursive: true });
 
-  const files = GTE_MODERNBERT_BASE_PINS[opts.variant].files;
+  const files = F2LLM_V2_80M_PINS[opts.variant].files;
   let downloaded = 0;
   let skipped = 0;
   let totalBytes = 0;
diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts
index cb9ddeb5..a205605b 100644
--- a/packages/cli/src/index.ts
+++ b/packages/cli/src/index.ts
@@ -49,7 +49,7 @@ program
   .description("Index a repository at [path] (default: current directory)")
   .option("--force", "Ignore registry cache and re-run the pipeline")
   .option("--embeddings", "Embed symbols and populate the embeddings table in store.sqlite")
-  .option("--embeddings-int8", "Use the int8 embedder variant (~23 MB) instead of fp32")
+  .option("--embeddings-int8", "Use the int8 embedder variant (~81 MB) instead of fp32 (~321 MB)")
   .option(
     "--granularity <csv>",
     "Hierarchical embedding tiers to emit, comma-separated. Values: symbol, file, community. Default: symbol. Example: --granularity symbol,file,community",
@@ -241,8 +241,8 @@ program
   )
   .option("--force", "Overwrite an existing codehub entry without prompting; re-download weights")
   .option("--undo", "Restore the most recent .bak next to each config")
-  .option("--embeddings", "Download gte-modernbert-base ONNX weights (SHA256-pinned)")
-  .option("--int8", "Use the int8 weight variant (~150 MB) instead of fp32 (~596 MB)")
+  .option("--embeddings", "Download F2LLM-v2-80M ONNX weights (SHA256-pinned)")
+  .option("--int8", "Use the int8 weight variant (~92 MB) instead of fp32 (~332 MB)")
   .option("--model-dir <path>", "Override the target directory for embedder weights")
   .option("--plugin", "Install the Claude Code plugin to ~/.claude/plugins/opencodehub/")
   .option(
diff --git a/packages/docs/src/content/docs/architecture/embeddings.md b/packages/docs/src/content/docs/architecture/embeddings.md
index e3cf8111..5ec0841e 100644
--- a/packages/docs/src/content/docs/architecture/embeddings.md
+++ b/packages/docs/src/content/docs/architecture/embeddings.md
@@ -56,11 +56,21 @@ flowchart LR
 
 ### ONNX local
 
-The default. Deterministic 768-dim embeddings from
-`Alibaba-NLP/gte-modernbert-base`. Weights live in the directory
-managed by `@opencodehub/embedder/paths`; missing weights throw
+The default. Deterministic 320-dim embeddings from
+`codefuse-ai/F2LLM-v2-80M` (a Qwen3-0.6B-Base derivative, 80.1M params).
+Last-token pooling and L2 normalization are baked into the ONNX graph,
+which emits a single already-unit-length output `embedding` of shape
+`[B, 320]`. The custom ONNX export is hosted as a SHA256-pinned GitHub
+release asset; weights live in the directory managed by
+`@opencodehub/embedder/paths`; missing weights throw
 `EmbedderNotSetupError`, which `codehub setup --embeddings` fixes.
 
+**Query/document asymmetry.** Documents are embedded raw. Queries get an
+`Instruct: {instruction}\nQuery: {query}` prefix (instruction: "Given a
+code search query, retrieve the most relevant code snippet.") via the
+`embedQuery()` method on the Embedder interface, applied only at the
+hybrid-search query seam.
+
 A Piscina worker pool (`embedder-pool.ts`) spins up when
 `embeddingsWorkers >= 2`, running ONNX inference across worker
 threads. Single-worker mode is the default and is good enough for
@@ -73,7 +83,7 @@ wire format:
 
 - `CODEHUB_EMBEDDING_URL` — base URL (`/embeddings` is appended).
 - `CODEHUB_EMBEDDING_MODEL` — model id passed through verbatim.
-- `CODEHUB_EMBEDDING_DIMS` — dimensions (default 768).
+- `CODEHUB_EMBEDDING_DIMS` — dimensions (default 320).
 - `CODEHUB_EMBEDDING_API_KEY` — bearer token.
 
 30 s timeout, 2 retries with 1 s backoff.
@@ -89,8 +99,8 @@ carries on.
 
 ModelId stamping is explicit to prevent silent cross-backend
 pollution of the `embeddings.model` column: SageMaker rows carry
-`gte-modernbert-base/sagemaker:<endpointName>`, ONNX rows carry
-`gte-modernbert-base/fp32`, HTTP rows pass the configured model id
+`f2llm-v2-80m/sagemaker:<endpointName>`, ONNX rows carry
+`f2llm-v2-80m/fp32`, HTTP rows pass the configured model id
 through. See the durable lesson linked below for the full pattern
 (dynamic import, structural-typing seam, 413 split-retry).
 
@@ -156,7 +166,7 @@ enabling `hnsw_acorn` enables it.
   `["symbol"]`).
 - `PipelineOptions.embeddingsWorkers` — Piscina pool size for ONNX.
 - `PipelineOptions.embeddingsBatchSize` — default 32.
-- `DuckDbStoreOptions.embeddingDim` — default 768.
+- `SqliteStoreOptions.embeddingDim` — default 320.
 - Env vars: `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` / `_REGION` /
   `_MODEL` / `_DIMS`; `CODEHUB_EMBEDDING_URL` / `_MODEL` / `_DIMS` /
   `_API_KEY`.
diff --git a/packages/docs/src/content/docs/architecture/monorepo-map.md b/packages/docs/src/content/docs/architecture/monorepo-map.md
index d41330c9..1bce8010 100644
--- a/packages/docs/src/content/docs/architecture/monorepo-map.md
+++ b/packages/docs/src/content/docs/architecture/monorepo-map.md
@@ -19,7 +19,7 @@ package is a library imported by `cli`, `mcp`, `ingestion`, or
 | `@opencodehub/cli` | `packages/cli` | The `codehub` binary (analyze, setup, mcp, query, context, impact, sql, group, scan, verdict, code-pack, ...). |
 | `@opencodehub/cobol-proleap` | `packages/cobol-proleap` | Optional JVM ProLeap deep-parse bridge for COBOL — gated behind `--allow-build-scripts=proleap`. |
 | `@opencodehub/core-types` | `packages/core-types` | Shared graph schema, `LanguageId`, `RelationType`, determinism primitives. |
-| `@opencodehub/embedder` | `packages/embedder` | Deterministic ONNX embedder (`gte-modernbert-base`), modelId fingerprint, three-backend cascade. |
+| `@opencodehub/embedder` | `packages/embedder` | Deterministic ONNX embedder (`F2LLM-v2-80M`, 320-dim), modelId fingerprint, three-backend cascade. |
 | `@opencodehub/frameworks` | `packages/frameworks` | Five-stage framework detector (manifest → lockfile → config-AST → folder → import/SCIP) over a curated registry. |
 | `@opencodehub/ingestion` | `packages/ingestion` | The indexing pipeline (parse, resolve, scip-index, embeddings, communities, processes, summaries, ...). |
 | `@opencodehub/mcp` | `packages/mcp` | The stdio MCP server, 28 tool registrations (all read-only with respect to user source), 7 resources, the error envelope, the staleness `_meta` block. |
diff --git a/packages/docs/src/content/docs/reference/cli.md b/packages/docs/src/content/docs/reference/cli.md
index d89928de..0d267671 100644
--- a/packages/docs/src/content/docs/reference/cli.md
+++ b/packages/docs/src/content/docs/reference/cli.md
@@ -88,8 +88,8 @@ codehub setup
 | `--editors <list>` | all | `claude-code,cursor,codex,windsurf,opencode`. |
 | `--force` | off | Overwrite existing entries; re-download weights. |
 | `--undo` | off | Restore the most recent `.bak` next to each config. |
-| `--embeddings` | off | Download `gte-modernbert-base` ONNX weights (SHA256-pinned). |
-| `--int8` | off | Use the int8 weight variant (~150 MB) instead of fp32 (~596 MB). |
+| `--embeddings` | off | Download `F2LLM-v2-80M` ONNX weights (SHA256-pinned GitHub release asset). |
+| `--int8` | off | Use the int8 weight variant (~92 MB) instead of fp32 (~332 MB). |
 | `--model-dir <path>` | — | Override the target directory for embedder weights. |
 | `--plugin` | off | Install the Claude Code plugin to `~/.claude/plugins/opencodehub/`. |
 
diff --git a/packages/docs/src/content/docs/reference/configuration.md b/packages/docs/src/content/docs/reference/configuration.md
index 3537fcea..50fe8e82 100644
--- a/packages/docs/src/content/docs/reference/configuration.md
+++ b/packages/docs/src/content/docs/reference/configuration.md
@@ -51,11 +51,11 @@ that resolves wins; the others are ignored.
 | `CODEHUB_EMBEDDING_SAGEMAKER_REGION` | Override the AWS region for the SageMaker call. |
 | `CODEHUB_EMBEDDING_URL` | Base URL for an OpenAI-compatible HTTP endpoint (Infinity, vLLM, TEI, Ollama, LM Studio, OpenAI). `/embeddings` is appended. |
 | `CODEHUB_EMBEDDING_MODEL` | Model id passed through to the HTTP endpoint verbatim. |
-| `CODEHUB_EMBEDDING_DIMS` | Dimensionality of the embedding model. Default 768. |
+| `CODEHUB_EMBEDDING_DIMS` | Dimensionality of the embedding model. Default 320. |
 | `CODEHUB_EMBEDDING_API_KEY` | Bearer token sent as `Authorization: Bearer ...`. |
 
 When none of the above are set, the local ONNX backend
-(`gte-modernbert-base`, deterministic, offline-safe) is used.
+(`F2LLM-v2-80M`, 320-dim, deterministic, offline-safe) is used.
 
 ### Other toggles
 
diff --git a/packages/embedder/CHANGELOG.md b/packages/embedder/CHANGELOG.md
index e4c5c80a..32859387 100644
--- a/packages/embedder/CHANGELOG.md
+++ b/packages/embedder/CHANGELOG.md
@@ -1,5 +1,19 @@
 # Changelog
 
+## [0.2.0](https://github.com/theagenticguy/opencodehub/compare/embedder-v0.1.3...embedder-v0.2.0) (2026-06-26)
+
+
+### ⚠ BREAKING CHANGES
+
+* **embedder:** swap the local ONNX model from `gte-modernbert-base` (768-dim) to `codefuse-ai/F2LLM-v2-80M` (320-dim). The dimension change is incompatible with existing stores — re-index with `codehub analyze --embeddings`. The fingerprint guard already refuses queries against a stale store on a `modelId` mismatch.
+
+
+### Features
+
+* **embedder:** replace gte-modernbert-base with `codefuse-ai/F2LLM-v2-80M` (Qwen3-0.6B-Base derivative, 80.1M params, 320-dim). Last-token pooling + L2 normalization are baked into the ONNX graph — the graph emits a single already-unit-length `embedding` output of shape `[B, 320]`.
+* **embedder:** add `embedQuery()` to the Embedder interface for query/document asymmetry — queries get an `Instruct: {instruction}\nQuery: {query}` prefix (instruction: "Given a code search query, retrieve the most relevant code snippet."), documents are embedded raw. Applied only at the hybrid-search query seam.
+* **embedder:** ship the model as a custom ONNX export hosted as a GitHub release asset (`github.com/theagenticguy/opencodehub/releases/download/embed-v1/...`), SHA256-pinned in `model-pins.ts` (`F2LLM_V2_80M_PINS`, renamed from `GTE_MODERNBERT_BASE_PINS`). fp32 ~321 MB / int8 ~81 MB. Tokenizer is Qwen2 BPE (`tokenizer.json` + `tokenizer_config.json`). Runtime unchanged: `onnxruntime-web` (WASM), single-threaded deterministic. License: Apache-2.0.
+
 ## [0.1.3](https://github.com/theagenticguy/opencodehub/compare/embedder-v0.1.2...embedder-v0.1.3) (2026-06-01)
 
 
diff --git a/packages/embedder/README.md b/packages/embedder/README.md
index a335a5a4..696cfc67 100644
--- a/packages/embedder/README.md
+++ b/packages/embedder/README.md
@@ -1,8 +1,8 @@
 # @opencodehub/embedder
 
 Deterministic text embedder for OpenCodeHub. Uses the
-`gte-modernbert-base` model via ONNX Runtime (CPU) locally or
-Amazon SageMaker for larger deployments.
+`codefuse-ai/F2LLM-v2-80M` model (320-dim) via ONNX Runtime (WASM, CPU)
+locally or Amazon SageMaker for larger deployments.
 
 ## Surface
 
@@ -16,8 +16,9 @@ const vectors = await embed(["function foo(): void {}", "class Bar {}"]);
 const vectors = await embed(texts, { backend: EmbedderBackend.SageMaker });
 ```
 
-- **Local backend** — runs `gte-modernbert-base` via `onnxruntime-node`
-  (CPU only; CUDA postinstall is suppressed via `.npmrc`).
+- **Local backend** — runs `F2LLM-v2-80M` via `onnxruntime-web`
+  (WASM, single-threaded, deterministic; no native bindings). Last-token
+  pooling + L2 normalization are baked into the ONNX graph.
 - **SageMaker backend** — sends batches to an endpoint via
   `@aws-sdk/client-sagemaker-runtime`; endpoint URL read from
   `OCH_SAGEMAKER_ENDPOINT`.
@@ -30,7 +31,7 @@ const vectors = await embed(texts, { backend: EmbedderBackend.SageMaker });
 |---|---|---|
 | `OCH_EMBED_BACKEND` | `onnx` | `onnx` or `sagemaker` |
 | `OCH_SAGEMAKER_ENDPOINT` | — | SageMaker real-time endpoint URL |
-| `OCH_EMBED_DIM` | `768` | Expected embedding dimension (validation) |
+| `OCH_EMBED_DIM` | `320` | Expected embedding dimension (validation) |
 
 ## Design
 
@@ -39,5 +40,5 @@ const vectors = await embed(texts, { backend: EmbedderBackend.SageMaker });
   fully offline.
 - The SageMaker path is the recommended backend for CI and cloud
   deployments; the ONNX path is the default for local dev.
-- `onnxruntime_node_install_cuda=skip` in `.npmrc` prevents the ~400 MB
-  CUDA EP postinstall download.
+- `onnxruntime-web` runs the model as WASM with no native postinstall —
+  the local backend ships zero native bindings.
diff --git a/packages/embedder/package.json b/packages/embedder/package.json
index 961872c2..6d633256 100644
--- a/packages/embedder/package.json
+++ b/packages/embedder/package.json
@@ -1,8 +1,8 @@
 {
   "name": "@opencodehub/embedder",
-  "version": "0.1.3",
+  "version": "0.2.0",
   "private": true,
-  "description": "OpenCodeHub — ONNX-based deterministic text embedder (gte-modernbert-base)",
+  "description": "OpenCodeHub — ONNX-based deterministic text embedder (F2LLM-v2-80M)",
   "license": "Apache-2.0",
   "repository": {
     "type": "git",
@@ -63,7 +63,7 @@
     "embeddings",
     "onnx",
     "sagemaker",
-    "gte-modernbert",
+    "f2llm",
     "semantic-search"
   ],
   "engines": {
diff --git a/packages/embedder/src/factory.test.ts b/packages/embedder/src/factory.test.ts
index 6147bb86..d2fd03d1 100644
--- a/packages/embedder/src/factory.test.ts
+++ b/packages/embedder/src/factory.test.ts
@@ -24,10 +24,11 @@ import { type Embedder, EmbedderNotSetupError } from "./types.js";
 /** Build a sentinel Embedder whose identity we can assert against. */
 function makeSentinelEmbedder(modelId: string): Embedder {
   return {
-    dim: 768,
+    dim: 320,
     modelId,
-    embed: async () => new Float32Array(768),
-    embedBatch: async (texts) => texts.map(() => new Float32Array(768)),
+    embed: async () => new Float32Array(320),
+    embedQuery: async () => new Float32Array(320),
+    embedBatch: async (texts) => texts.map(() => new Float32Array(320)),
     close: async () => {},
   };
 }
@@ -49,7 +50,7 @@ describe("openDefaultEmbedder", () => {
   });
 
   it("falls back to ONNX when no HTTP env vars and allowOnnxFallback defaults to true", async () => {
-    const onnxSentinel = makeSentinelEmbedder("gte-modernbert-base/fp32");
+    const onnxSentinel = makeSentinelEmbedder("f2llm-v2-80m/fp32");
     const result = await openDefaultEmbedder(
       {},
       {
@@ -58,7 +59,7 @@ describe("openDefaultEmbedder", () => {
       },
     );
     strictEqual(result, onnxSentinel, "factory should return the ONNX embedder reference");
-    equal(result.modelId, "gte-modernbert-base/fp32");
+    equal(result.modelId, "f2llm-v2-80m/fp32");
   });
 
   it("throws EmbedderNotSetupError when HTTP env vars absent and allowOnnxFallback=false", async () => {
@@ -90,7 +91,7 @@ describe("openDefaultEmbedder", () => {
 
   it("propagates the underlying error when ONNX setup fails", async () => {
     const onnxFailure = new EmbedderNotSetupError(
-      "Run `codehub setup --embeddings` to install gte-modernbert-base",
+      "Run `codehub setup --embeddings` to install f2llm-v2-80m",
     );
     await rejects(
       openDefaultEmbedder(
diff --git a/packages/embedder/src/factory.ts b/packages/embedder/src/factory.ts
index 174f60f8..4bf265f2 100644
--- a/packages/embedder/src/factory.ts
+++ b/packages/embedder/src/factory.ts
@@ -6,7 +6,7 @@
  *   1. {@link tryOpenHttpEmbedder} reads SageMaker / OpenAI-HTTP env vars
  *      first and returns a remote-backed embedder when configured.
  *   2. Otherwise — and only when `allowOnnxFallback === true` (the default) —
- *      fall back to {@link openOnnxEmbedder}, which loads gte-modernbert-base
+ *      fall back to {@link openOnnxEmbedder}, which loads F2LLM-v2-80M
  *      weights from disk (the lazy-load side effect).
  *   3. With `allowOnnxFallback: false` and no HTTP/SageMaker env, throw
  *      {@link EmbedderNotSetupError} — the ONNX binding is never loaded.
diff --git a/packages/embedder/src/fingerprint.test.ts b/packages/embedder/src/fingerprint.test.ts
index b31bceab..88a2baff 100644
--- a/packages/embedder/src/fingerprint.test.ts
+++ b/packages/embedder/src/fingerprint.test.ts
@@ -8,23 +8,19 @@ import { assertEmbedderCompatible, EMBEDDER_MISMATCH_HINT } from "./fingerprint.
 
 describe("assertEmbedderCompatible", () => {
   test("ok when persisted is undefined (legacy store, never tagged)", () => {
-    const result = assertEmbedderCompatible(undefined, "gte-modernbert-base/fp32", false);
+    const result = assertEmbedderCompatible(undefined, "f2llm-v2-80m/fp32", false);
     ok(result.ok);
   });
 
   test("ok when persisted equals current", () => {
-    const result = assertEmbedderCompatible(
-      "gte-modernbert-base/fp32",
-      "gte-modernbert-base/fp32",
-      false,
-    );
+    const result = assertEmbedderCompatible("f2llm-v2-80m/fp32", "f2llm-v2-80m/fp32", false);
     ok(result.ok);
   });
 
   test("ok when persisted differs from current but force is true", () => {
     const result = assertEmbedderCompatible(
-      "gte-modernbert-base/fp32",
-      "sagemaker:gte-modernbert-base@my-endpoint",
+      "f2llm-v2-80m/fp32",
+      "f2llm-v2-80m/sagemaker:my-endpoint",
       true,
     );
     ok(result.ok);
@@ -32,14 +28,14 @@ describe("assertEmbedderCompatible", () => {
 
   test("not ok when persisted differs from current and force is false", () => {
     const result = assertEmbedderCompatible(
-      "gte-modernbert-base/fp32",
-      "sagemaker:gte-modernbert-base@my-endpoint",
+      "f2llm-v2-80m/fp32",
+      "f2llm-v2-80m/sagemaker:my-endpoint",
       false,
     );
     ok(!result.ok);
     if (!result.ok) {
-      equal(result.persistedModelId, "gte-modernbert-base/fp32");
-      equal(result.currentModelId, "sagemaker:gte-modernbert-base@my-endpoint");
+      equal(result.persistedModelId, "f2llm-v2-80m/fp32");
+      equal(result.currentModelId, "f2llm-v2-80m/sagemaker:my-endpoint");
       equal(result.hint, EMBEDDER_MISMATCH_HINT);
     }
   });
diff --git a/packages/embedder/src/fingerprint.ts b/packages/embedder/src/fingerprint.ts
index 91d932b8..9444b2a1 100644
--- a/packages/embedder/src/fingerprint.ts
+++ b/packages/embedder/src/fingerprint.ts
@@ -3,10 +3,10 @@
  *
  * The `embeddings` table on disk was populated by ONE specific embedder
  * — usually identified by its {@link Embedder.modelId} (e.g.
- * `gte-modernbert-base/fp32`, `sagemaker:gte-modernbert-base@<endpoint>`).
+ * `f2llm-v2-80m/fp32`, `f2llm-v2-80m/sagemaker:<endpoint>`).
  * If the operator switches the active embedder between index runs (ONNX
- * → SageMaker, fp32 → int8) the dim might still match by coincidence
- * (768 = 768) but the vector subspace is different — hybrid search
+ * → SageMaker, fp32 → int8, or a different model entirely) the vector
+ * subspace differs even when the dim coincides — hybrid search
  * silently corrupts ranking with no error.
  *
  * `assertEmbedderCompatible` makes the mismatch loud:
diff --git a/packages/embedder/src/http-embedder.test.ts b/packages/embedder/src/http-embedder.test.ts
index 31f502aa..473721ae 100644
--- a/packages/embedder/src/http-embedder.test.ts
+++ b/packages/embedder/src/http-embedder.test.ts
@@ -3,7 +3,7 @@
  * {@link openEmbedder} factory.
  *
  * Coverage:
- *   - happy path: mock fetch returns a 768-d vector → Float32Array of 768
+ *   - happy path: mock fetch returns a 320-d vector → Float32Array of 320
  *   - retry on 5xx × 2, then succeed
  *   - retry on network error × 2, then succeed
  *   - empty endpointUrl → ONNX path chosen (factory falls through;
@@ -112,23 +112,23 @@ function makeFetchMockNetErrThenOk(
 
 describe("openHttpEmbedder: happy path", () => {
   it("returns a Float32Array of the expected dim on a 200 response", async () => {
-    const vec768 = Array.from({ length: 768 }, (_, i) => (i + 1) / 400);
-    const fetchImpl = makeFetchMockOk(vec768);
+    const vec320 = Array.from({ length: 320 }, (_, i) => (i + 1) / 400);
+    const fetchImpl = makeFetchMockOk(vec320);
     const embedder = openHttpEmbedder({
       endpointUrl: "https://embed.example/v1",
-      modelId: "gte-modernbert-base",
+      modelId: "f2llm-v2-80m",
       fetchImpl,
     });
     const out = await embedder.embed("hello world");
-    equal(out.length, 768);
-    equal(embedder.dim, 768);
-    equal(embedder.modelId, "gte-modernbert-base");
+    equal(out.length, 320);
+    equal(embedder.dim, 320);
+    equal(embedder.modelId, "f2llm-v2-80m");
     // Values round-trip as Float32 (so small precision loss is acceptable).
-    ok(Math.abs((out[0] ?? 0) - (vec768[0] ?? 0)) < 1e-6);
+    ok(Math.abs((out[0] ?? 0) - (vec320[0] ?? 0)) < 1e-6);
     await embedder.close();
   });
 
-  it("honours a caller-supplied `dims` value (non-768 remote)", async () => {
+  it("honours a caller-supplied `dims` value (non-320 remote)", async () => {
     const vec1024 = new Array<number>(1024).fill(0.125);
     const fetchImpl = makeFetchMockOk(vec1024);
     const embedder = openHttpEmbedder({
@@ -147,7 +147,7 @@ describe("openHttpEmbedder: happy path", () => {
     const fetchImpl: typeof fetch = async (_url, _init) => {
       call += 1;
       // Distinct vector per call so we can verify order.
-      const embedding = new Array<number>(768).fill(call / 100);
+      const embedding = new Array<number>(320).fill(call / 100);
       return new Response(JSON.stringify({ data: [{ embedding }] }), {
         status: 200,
         headers: { "content-type": "application/json" },
@@ -170,7 +170,7 @@ describe("openHttpEmbedder: happy path", () => {
     const fetchImpl: typeof fetch = async (url, _init) => {
       seen.push(String(url));
       return new Response(
-        JSON.stringify({ data: [{ embedding: new Array<number>(768).fill(0) }] }),
+        JSON.stringify({ data: [{ embedding: new Array<number>(320).fill(0) }] }),
         { status: 200, headers: { "content-type": "application/json" } },
       );
     };
@@ -189,7 +189,7 @@ describe("openHttpEmbedder: happy path", () => {
     const fetchImpl: typeof fetch = async (url, _init) => {
       seen.push(String(url));
       return new Response(
-        JSON.stringify({ data: [{ embedding: new Array<number>(768).fill(0) }] }),
+        JSON.stringify({ data: [{ embedding: new Array<number>(320).fill(0) }] }),
         { status: 200, headers: { "content-type": "application/json" } },
       );
     };
@@ -205,7 +205,7 @@ describe("openHttpEmbedder: happy path", () => {
 
 describe("openHttpEmbedder: retries", () => {
   it("retries on 5xx and succeeds on the third attempt", async () => {
-    const embedding = new Array<number>(768).fill(0.1);
+    const embedding = new Array<number>(320).fill(0.1);
     const seq = makeFetchMockSeq([
       { status: 500, body: { error: "bad" } },
       { status: 503, body: { error: "busy" } },
@@ -217,12 +217,12 @@ describe("openHttpEmbedder: retries", () => {
       fetchImpl: seq.fetchImpl,
     });
     const out = await embedder.embed("x");
-    equal(out.length, 768);
+    equal(out.length, 320);
     equal(seq.calls(), 3, "must have retried twice before succeeding");
   });
 
   it("retries on 429 (rate limit) and succeeds on the third attempt", async () => {
-    const embedding = new Array<number>(768).fill(0.2);
+    const embedding = new Array<number>(320).fill(0.2);
     const seq = makeFetchMockSeq([
       { status: 429, body: { error: "rate" } },
       { status: 429, body: { error: "rate" } },
@@ -234,7 +234,7 @@ describe("openHttpEmbedder: retries", () => {
       fetchImpl: seq.fetchImpl,
     });
     const out = await embedder.embed("x");
-    equal(out.length, 768);
+    equal(out.length, 320);
     equal(seq.calls(), 3);
   });
 
@@ -250,14 +250,14 @@ describe("openHttpEmbedder: retries", () => {
   });
 
   it("retries on a thrown network error and succeeds on the third attempt", async () => {
-    const seq = makeFetchMockNetErrThenOk(2, new Array<number>(768).fill(0));
+    const seq = makeFetchMockNetErrThenOk(2, new Array<number>(320).fill(0));
     const embedder = openHttpEmbedder({
       endpointUrl: "https://embed.example/v1",
       modelId: "m",
       fetchImpl: seq.fetchImpl,
     });
     const out = await embedder.embed("x");
-    equal(out.length, 768);
+    equal(out.length, 320);
     equal(seq.calls(), 3);
   });
 
@@ -284,27 +284,27 @@ describe("openHttpEmbedder: dim mismatch guard", () => {
     const embedder = openHttpEmbedder({
       endpointUrl: "https://embed.example/v1",
       modelId: "m",
-      dims: 768,
+      dims: 320,
       fetchImpl: makeFetchMockOk(wrong),
     });
     await rejects(embedder.embed("x"), (err: unknown) => {
       ok(err instanceof Error);
       match(err.message, /Embedding dimension mismatch/);
       match(err.message, /1024d vector/);
-      match(err.message, /expected 768d/);
+      match(err.message, /expected 320d/);
       match(err.message, /CODEHUB_EMBEDDING_DIMS/);
       return true;
     });
   });
 
-  it("uses 768 as the default expected dim when `dims` is omitted", async () => {
+  it("uses 320 as the default expected dim when `dims` is omitted", async () => {
     const wrong = new Array<number>(1024).fill(0);
     const embedder = openHttpEmbedder({
       endpointUrl: "https://embed.example/v1",
       modelId: "m",
       fetchImpl: makeFetchMockOk(wrong),
     });
-    await rejects(embedder.embed("x"), /expected 768d/);
+    await rejects(embedder.embed("x"), /expected 320d/);
   });
 });
 
@@ -315,7 +315,7 @@ describe("openHttpEmbedder: auth header", () => {
       const headers = new Headers(init?.headers);
       seenAuth = headers.get("authorization") ?? undefined;
       return new Response(
-        JSON.stringify({ data: [{ embedding: new Array<number>(768).fill(0) }] }),
+        JSON.stringify({ data: [{ embedding: new Array<number>(320).fill(0) }] }),
         { status: 200, headers: { "content-type": "application/json" } },
       );
     };
@@ -335,7 +335,7 @@ describe("openHttpEmbedder: auth header", () => {
       const headers = new Headers(init?.headers);
       seenAuth = headers.get("authorization") ?? undefined;
       return new Response(
-        JSON.stringify({ data: [{ embedding: new Array<number>(768).fill(0) }] }),
+        JSON.stringify({ data: [{ embedding: new Array<number>(320).fill(0) }] }),
         { status: 200, headers: { "content-type": "application/json" } },
       );
     };
@@ -461,10 +461,10 @@ describe("openEmbedder factory", () => {
     const embedder = await openEmbedder({
       endpointUrl: "https://embed.example/v1",
       modelId: "m",
-      fetchImpl: makeFetchMockOk(new Array<number>(768).fill(0.5)),
+      fetchImpl: makeFetchMockOk(new Array<number>(320).fill(0.5)),
     });
     const out = await embedder.embed("x");
-    equal(out.length, 768);
+    equal(out.length, 320);
   });
 
   it("throws when offline=true AND endpointUrl is set", async () => {
@@ -526,11 +526,11 @@ describe("tryOpenHttpEmbedder", () => {
   it("returns an Embedder when env is configured", async () => {
     process.env["CODEHUB_EMBEDDING_URL"] = "https://embed.example/v1";
     process.env["CODEHUB_EMBEDDING_MODEL"] = "m";
-    const fetchImpl = makeFetchMockOk(new Array<number>(768).fill(0));
+    const fetchImpl = makeFetchMockOk(new Array<number>(320).fill(0));
     const embedder = await tryOpenHttpEmbedder({ fetchImpl });
     ok(embedder !== null);
     const out = await embedder.embed("x");
-    equal(out.length, 768);
+    equal(out.length, 320);
   });
 
   it("throws when offline AND env is configured", () => {
@@ -550,7 +550,7 @@ describe("Embedder contract via HTTP", () => {
     const embedder = openHttpEmbedder({
       endpointUrl: "https://embed.example/v1",
       modelId: "m",
-      fetchImpl: makeFetchMockOk(new Array<number>(768).fill(0)),
+      fetchImpl: makeFetchMockOk(new Array<number>(320).fill(0)),
     });
     await embedder.close();
     await embedder.close();
diff --git a/packages/embedder/src/http-embedder.ts b/packages/embedder/src/http-embedder.ts
index 2fba772b..3a6c278a 100644
--- a/packages/embedder/src/http-embedder.ts
+++ b/packages/embedder/src/http-embedder.ts
@@ -42,7 +42,7 @@ export interface HttpEmbedderConfig {
   /** Model id sent in the `model` field of the request body. */
   readonly modelId: string;
   /**
-   * Expected response-vector dimension. Defaults to 768 (gte-modernbert-base).
+   * Expected response-vector dimension. Defaults to 320 (F2LLM-v2-80M).
    * Every response is asserted against this so a remote model swap can
    * never silently pollute downstream vector indexes.
    */
@@ -60,8 +60,8 @@ export interface HttpEmbedderConfig {
   readonly fetchImpl?: typeof fetch;
 }
 
-/** Default dim for gte-modernbert-base (the fallback when env doesn't set it). */
-const DEFAULT_DIMS = 768;
+/** Default dim for F2LLM-v2-80M (the fallback when env doesn't set it). */
+const DEFAULT_DIMS = 320;
 
 /**
  * Read HTTP embedder config from the process environment. Returns `null`
@@ -228,6 +228,11 @@ export function openHttpEmbedder(cfg: HttpEmbedderConfig): Embedder {
     dim: dims,
     modelId,
     embed: embedOne,
+    // Remote backends are text-in/vector-out and own their pooling/prefix
+    // server-side; the local F2LLM query-prefix asymmetry is not applied
+    // here (a remote F2LLM endpoint must handle it itself). Alias query to
+    // document so the interface contract holds.
+    embedQuery: embedOne,
     async embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
       if (texts.length === 0) return [];
       // One request per text. The HTTP surface supports batched `input`, but
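
Reviewer sketch (not part of the patch): the `dims` doc comment in this file says every response is asserted against the expected dimension. A minimal version of such a check might look like the following. `toCheckedVector` is a hypothetical helper, and the error wording is only modeled on the fragments the tests match (`Embedding dimension mismatch`, `1024d vector`, `expected 320d`, `CODEHUB_EMBEDDING_DIMS`); the real message may read differently.

```typescript
// Mirrors the new default in this patch (F2LLM-v2-80M).
const DEFAULT_DIMS = 320;

// Hypothetical helper: validate a raw JSON number array from the remote
// endpoint and convert it to the Float32Array the Embedder contract returns.
function toCheckedVector(
  raw: readonly number[],
  expectedDim: number = DEFAULT_DIMS,
): Float32Array {
  if (raw.length !== expectedDim) {
    // Assumed wording, shaped after the patterns asserted in the test file.
    throw new Error(
      `Embedding dimension mismatch: endpoint returned a ${raw.length}d vector, ` +
        `expected ${expectedDim}d. Set CODEHUB_EMBEDDING_DIMS if the remote model differs.`,
    );
  }
  return Float32Array.from(raw);
}
```

Failing on the first wrong-sized vector keeps a remote model swap from writing mixed-dimension rows into the store.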
diff --git a/packages/embedder/src/index.ts b/packages/embedder/src/index.ts
index bc198a61..4bd9c54b 100644
--- a/packages/embedder/src/index.ts
+++ b/packages/embedder/src/index.ts
@@ -9,7 +9,7 @@
  *     that POSTs to an OpenAI-compatible `/v1/embeddings` server (Infinity,
  *     vLLM, TEI, Ollama, LM Studio, OpenAI).
  *   - When neither is set, {@link openEmbedder} falls back to the local
- *     ONNX gte-modernbert-base path (deterministic embedder).
+ *     ONNX F2LLM-v2-80M path (deterministic embedder).
  *
  * Offline invariant: when `offline === true` and any remote option
  * (SageMaker or `endpointUrl`) is set, {@link openEmbedder} throws. Remote
@@ -47,8 +47,8 @@ export {
 } from "./http-embedder.js";
 export {
   embedderModelId,
-  GTE_MODERNBERT_BASE_PINS,
-  GTE_MODERNBERT_BASE_REPO,
+  F2LLM_V2_80M_PINS,
+  F2LLM_V2_80M_REPO,
   type PinnedFile,
   type VariantPins,
 } from "./model-pins.js";
@@ -59,6 +59,7 @@ export {
   resolveModelDir,
   TOKENIZER_FILES,
 } from "./paths.js";
+export { buildQueryText, F2LLM_QUERY_INSTRUCTION } from "./query-prefix.js";
 export {
   openSagemakerEmbedder,
   readSagemakerEmbedderConfigFromEnv,
@@ -103,7 +104,7 @@ export interface OpenEmbedderOptions {
   readonly modelId?: string;
   /** Bearer token for the HTTP request. Optional; sent as `unused` when absent. */
   readonly apiKey?: string;
-  /** Expected response-vector dimension. Defaults to 768 for HTTP/SageMaker. */
+  /** Expected response-vector dimension. Defaults to 320 for HTTP/SageMaker. */
   readonly dims?: number;
   /**
    * Pass-through options for the ONNX backend when a remote backend is
diff --git a/packages/embedder/src/model-pins.test.ts b/packages/embedder/src/model-pins.test.ts
index a6f4e44a..dbaddeaf 100644
--- a/packages/embedder/src/model-pins.test.ts
+++ b/packages/embedder/src/model-pins.test.ts
@@ -7,52 +7,80 @@
 import { equal, match, ok } from "node:assert/strict";
 import { describe, it } from "node:test";
 
-import {
-  embedderModelId,
-  GTE_MODERNBERT_BASE_PINS,
-  GTE_MODERNBERT_BASE_REPO,
-} from "./model-pins.js";
+import { embedderModelId, F2LLM_V2_80M_PINS, F2LLM_V2_80M_REPO } from "./model-pins.js";
 
 const SHA256_RE = /^[0-9a-f]{64}$/;
-const HF_URL_RE = new RegExp(
-  `^https://huggingface\\.co/Alibaba-NLP/gte-modernbert-base/resolve/${GTE_MODERNBERT_BASE_REPO.commit}/`,
+// The exported ONNX + tokenizer artifacts are hosted as GitHub release assets
+// on the opencodehub repo (NOT upstream Hugging Face — the export bakes
+// pooling + L2 norm into the graph and does not exist upstream).
+const RELEASE_URL_RE = new RegExp(
+  `^https://github\\.com/theagenticguy/opencodehub/releases/download/${F2LLM_V2_80M_REPO.release}/`,
 );
 
 describe("model-pins", () => {
-  it("repo metadata is Apache-2.0 and pins a commit SHA", () => {
-    equal(GTE_MODERNBERT_BASE_REPO.license, "Apache-2.0");
-    equal(GTE_MODERNBERT_BASE_REPO.hfRepo, "Alibaba-NLP/gte-modernbert-base");
-    match(GTE_MODERNBERT_BASE_REPO.commit, /^[0-9a-f]{40}$/);
+  it("repo metadata is Apache-2.0 and attributes the upstream + release", () => {
+    equal(F2LLM_V2_80M_REPO.license, "Apache-2.0");
+    equal(F2LLM_V2_80M_REPO.upstream, "codefuse-ai/F2LLM-v2-80M");
+    equal(F2LLM_V2_80M_REPO.release, "embed-v1");
   });
 
-  it("fp32 variant ships one ONNX + four tokenizer files", () => {
-    const names = GTE_MODERNBERT_BASE_PINS.fp32.files.map((f) => f.name);
-    equal(GTE_MODERNBERT_BASE_PINS.fp32.files.length, 5);
+  it("fp32 variant ships one ONNX + two tokenizer files", () => {
+    const names = F2LLM_V2_80M_PINS.fp32.files.map((f) => f.name);
+    equal(F2LLM_V2_80M_PINS.fp32.files.length, 3);
     ok(names.includes("model.onnx"));
     ok(names.includes("tokenizer.json"));
     ok(names.includes("tokenizer_config.json"));
-    ok(names.includes("config.json"));
-    ok(names.includes("special_tokens_map.json"));
+    // The export omits config.json / special_tokens_map.json — pooling + norm
+    // are in-graph, so they are not fetched.
+    ok(!names.includes("config.json"));
+    ok(!names.includes("special_tokens_map.json"));
   });
 
   it("int8 variant swaps the ONNX file and reuses tokenizer pins", () => {
-    const names = GTE_MODERNBERT_BASE_PINS.int8.files.map((f) => f.name);
+    const names = F2LLM_V2_80M_PINS.int8.files.map((f) => f.name);
+    equal(F2LLM_V2_80M_PINS.int8.files.length, 3);
     ok(names.includes("model_int8.onnx"));
     ok(!names.includes("model.onnx"));
+    ok(names.includes("tokenizer.json"));
+    ok(names.includes("tokenizer_config.json"));
   });
 
-  it("every pinned file has a 64-char sha256 and HF resolve URL", () => {
+  it("every pinned file has a 64-char sha256 and GitHub release URL", () => {
     for (const variant of ["fp32", "int8"] as const) {
-      for (const f of GTE_MODERNBERT_BASE_PINS[variant].files) {
+      for (const f of F2LLM_V2_80M_PINS[variant].files) {
         match(f.sha256, SHA256_RE, `${variant}/${f.name} sha256`);
-        match(f.url, HF_URL_RE, `${variant}/${f.name} url`);
+        match(f.url, RELEASE_URL_RE, `${variant}/${f.name} url`);
         ok(f.sizeBytes > 0, `${variant}/${f.name} sizeBytes`);
       }
     }
   });
 
+  it("pins the exact fp32 model + tokenizer sizes and hashes", () => {
+    const model = F2LLM_V2_80M_PINS.fp32.files.find((f) => f.name === "model.onnx");
+    ok(model !== undefined);
+    equal(model.sizeBytes, 320708733);
+    equal(model.sha256, "9347f761e1420e61c477b56616b3b4f2d2ee80d94747fd6cdde9a03b4c9176bc");
+
+    const tok = F2LLM_V2_80M_PINS.fp32.files.find((f) => f.name === "tokenizer.json");
+    ok(tok !== undefined);
+    equal(tok.sizeBytes, 11423359);
+    equal(tok.sha256, "7dd49a6a008054ecbf11f1568ea9244e99ca8a44fe47e883d1bb9915c3042705");
+
+    const tokCfg = F2LLM_V2_80M_PINS.fp32.files.find((f) => f.name === "tokenizer_config.json");
+    ok(tokCfg !== undefined);
+    equal(tokCfg.sizeBytes, 378);
+    equal(tokCfg.sha256, "3dbc087db36f09c0c359618cbfcebb4b3aed6d8438951c037789b5a0fdc099af");
+  });
+
+  it("pins the exact int8 model size and hash", () => {
+    const model = F2LLM_V2_80M_PINS.int8.files.find((f) => f.name === "model_int8.onnx");
+    ok(model !== undefined);
+    equal(model.sizeBytes, 80699171);
+    equal(model.sha256, "302845905e9273a1dd0fb4c670dcd12d16ad35e9522f518aa45a74da4d6ec5b8");
+  });
+
   it("embedderModelId produces the string used by the storage layer", () => {
-    equal(embedderModelId("fp32"), "gte-modernbert-base/fp32");
-    equal(embedderModelId("int8"), "gte-modernbert-base/int8");
+    equal(embedderModelId("fp32"), "f2llm-v2-80m/fp32");
+    equal(embedderModelId("int8"), "f2llm-v2-80m/int8");
   });
 });
diff --git a/packages/embedder/src/model-pins.ts b/packages/embedder/src/model-pins.ts
index fa2474af..56bab6bb 100644
--- a/packages/embedder/src/model-pins.ts
+++ b/packages/embedder/src/model-pins.ts
@@ -1,18 +1,23 @@
 /**
- * SHA256 and source-URL pins for every gte-modernbert-base weight file we ship.
+ * SHA256 and source-URL pins for every F2LLM-v2-80M weight file we ship.
  *
  * These pins are the authoritative contract consumed by `codehub setup
  * --embeddings` and by `codehub doctor` at runtime. SHA256 values were
- * computed locally against the Hugging Face model repo at commit
- * `e7f32e3c00f91d699e8c43b53106206bcc72bb22` on 2026-04-25.
+ * computed locally against the ONNX export produced from
+ * `codefuse-ai/F2LLM-v2-80M` (a Qwen3-0.6B-Base derivative) — the export
+ * bakes last-token pooling + L2 normalization into the graph, so it is NOT
+ * the upstream Hugging Face repo's own files. We host the exported
+ * artifacts as GitHub release assets and pin them by URL + SHA256.
  *
  * This module does NOT download anything on its own. It is pure data.
  */
 
-/** HF repo + commit the pins are anchored to. */
-export const GTE_MODERNBERT_BASE_REPO = {
-  hfRepo: "Alibaba-NLP/gte-modernbert-base",
-  commit: "e7f32e3c00f91d699e8c43b53106206bcc72bb22",
+/** Source repo + release the pins are anchored to. */
+export const F2LLM_V2_80M_REPO = {
+  /** Upstream model the ONNX export is derived from (attribution). */
+  upstream: "codefuse-ai/F2LLM-v2-80M",
+  /** GitHub release tag hosting the exported ONNX + tokenizer artifacts. */
+  release: "embed-v1",
   license: "Apache-2.0",
 } as const;
 
@@ -30,46 +35,44 @@ export interface VariantPins {
   readonly files: readonly PinnedFile[];
 }
 
-function hfUrl(path: string): string {
-  return `https://huggingface.co/${GTE_MODERNBERT_BASE_REPO.hfRepo}/resolve/${GTE_MODERNBERT_BASE_REPO.commit}/${path}`;
+/**
+ * Build the download URL for a release asset. The exported ONNX files do
+ * not exist upstream on Hugging Face — they are attached to a GitHub
+ * release on the opencodehub repo. Asset names are flat (no `onnx/`
+ * directory), so the int8 weights are uploaded as `model_int8.onnx`.
+ */
+function releaseUrl(asset: string): string {
+  return `https://github.com/theagenticguy/opencodehub/releases/download/${F2LLM_V2_80M_REPO.release}/${asset}`;
 }
 
 // Tokenizer + config files are identical across variants — hashes computed
-// once from the model repo.
+// once from the exported artifacts.
 const TOKENIZER_JSON: PinnedFile = {
   name: "tokenizer.json",
-  url: hfUrl("tokenizer.json"),
-  sizeBytes: 3583228,
-  sha256: "6c8aaa9a542084f2457eab775d4eeb51f92a70c0fd9de28d5edb0ddec3c08d30",
+  url: releaseUrl("tokenizer.json"),
+  sizeBytes: 11423359,
+  sha256: "7dd49a6a008054ecbf11f1568ea9244e99ca8a44fe47e883d1bb9915c3042705",
 };
 
 const TOKENIZER_CONFIG_JSON: PinnedFile = {
   name: "tokenizer_config.json",
-  url: hfUrl("tokenizer_config.json"),
-  sizeBytes: 20867,
-  sha256: "9654072f7c873161814043cf08cb5ed72f71d0b935abcd4e267935cb34352c21",
-};
-
-const CONFIG_JSON: PinnedFile = {
-  name: "config.json",
-  url: hfUrl("config.json"),
-  sizeBytes: 1184,
-  sha256: "8ba54dc3d35d7194f5178a4194b649f146753e02dabd22bdca5c5cbac15069ed",
-};
-
-const SPECIAL_TOKENS_MAP_JSON: PinnedFile = {
-  name: "special_tokens_map.json",
-  url: hfUrl("special_tokens_map.json"),
-  sizeBytes: 694,
-  sha256: "ea97ecdbcc73713039d8d64dbb05e3689495c96657fbd9a18f5bed381be81049",
+  url: releaseUrl("tokenizer_config.json"),
+  sizeBytes: 378,
+  sha256: "3dbc087db36f09c0c359618cbfcebb4b3aed6d8438951c037789b5a0fdc099af",
 };
 
 /**
- * Per-variant manifests. The fp32 variant is the default (596 MB, highest
- * precision); int8 is 4× smaller (150 MB) with near-identical retrieval
- * quality for size-constrained installs.
+ * Per-variant manifests. The fp32 variant is the default (321 MB,
+ * cosine-exact 1.0 vs the PyTorch reference, byte-deterministic under the
+ * single-thread WASM gate); int8 is 4× smaller (81 MB) with 4/4 top-1
+ * ranking agreement for size-constrained installs.
+ *
+ * F2LLM emits a single graph output named `embedding` of shape
+ * `[batch, 320]` — pooling + L2 norm are in-graph, so only the ONNX file
+ * + the two tokenizer files are required (no config.json /
+ * special_tokens_map.json, which the export omits).
  */
-export const GTE_MODERNBERT_BASE_PINS: {
+export const F2LLM_V2_80M_PINS: {
   readonly fp32: VariantPins;
   readonly int8: VariantPins;
 } = {
@@ -78,14 +81,12 @@ export const GTE_MODERNBERT_BASE_PINS: {
     files: [
       {
         name: "model.onnx",
-        url: hfUrl("onnx/model.onnx"),
-        sizeBytes: 596392315,
-        sha256: "947f31df7effaeec4edb57c50e4ed7e0f2034d9336063f92615b92e3e0d24d78",
+        url: releaseUrl("model.onnx"),
+        sizeBytes: 320708733,
+        sha256: "9347f761e1420e61c477b56616b3b4f2d2ee80d94747fd6cdde9a03b4c9176bc",
       },
       TOKENIZER_JSON,
       TOKENIZER_CONFIG_JSON,
-      CONFIG_JSON,
-      SPECIAL_TOKENS_MAP_JSON,
     ],
   },
   int8: {
@@ -93,19 +94,17 @@ export const GTE_MODERNBERT_BASE_PINS: {
     files: [
       {
         name: "model_int8.onnx",
-        url: hfUrl("onnx/model_int8.onnx"),
-        sizeBytes: 150218016,
-        sha256: "bae96b276d342bf86eeee07c1bdbc0c75bb82bf4033941aab7fabc1e33ee3b44",
+        url: releaseUrl("model_int8.onnx"),
+        sizeBytes: 80699171,
+        sha256: "302845905e9273a1dd0fb4c670dcd12d16ad35e9522f518aa45a74da4d6ec5b8",
       },
       TOKENIZER_JSON,
       TOKENIZER_CONFIG_JSON,
-      CONFIG_JSON,
-      SPECIAL_TOKENS_MAP_JSON,
     ],
   },
 } as const;
 
-/** Model id tag written into `embeddings.model` (keeps HNSW indexes separable). */
+/** Model id tag written into `embeddings.model` (keeps vector indexes separable). */
 export function embedderModelId(variant: "fp32" | "int8"): string {
-  return `gte-modernbert-base/${variant}`;
+  return `f2llm-v2-80m/${variant}`;
 }
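
Reviewer sketch (not part of the patch): since `model-pins.ts` is pure data, whatever downloads the assets has to check them against the pins. One way that check could look is shown below, assuming the bytes are already in memory. `verifyPinnedBytes` is a hypothetical name; the `PinnedFile` shape is re-declared here from the fields the tests read (`name`, `url`, `sizeBytes`, `sha256`), and the real setup command may stream the file instead of buffering it.

```typescript
import { createHash } from "node:crypto";

// Local mirror of the pin shape used in model-pins.ts (fields as read by the tests).
interface PinnedFile {
  readonly name: string;
  readonly url: string;
  readonly sizeBytes: number;
  readonly sha256: string;
}

// Hypothetical verifier: throws if the downloaded bytes do not match the pin.
function verifyPinnedBytes(pin: PinnedFile, bytes: Uint8Array): void {
  // Cheap check first: a size mismatch avoids hashing a truncated download.
  if (bytes.byteLength !== pin.sizeBytes) {
    throw new Error(
      `${pin.name}: size ${bytes.byteLength} does not match pinned ${pin.sizeBytes}`,
    );
  }
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== pin.sha256) {
    throw new Error(`${pin.name}: sha256 ${actual} does not match pinned ${pin.sha256}`);
  }
}
```

Pinning both size and hash matters more here than with the previous Hugging Face URLs, because a release asset can be re-uploaded under the same name.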
diff --git a/packages/embedder/src/onnx-embedder.test.ts b/packages/embedder/src/onnx-embedder.test.ts
index d54ca721..d9b3e53f 100644
--- a/packages/embedder/src/onnx-embedder.test.ts
+++ b/packages/embedder/src/onnx-embedder.test.ts
@@ -6,11 +6,17 @@
  *      `code` literal. Guarantees the CLI and search layer can pattern-match
  *      the error to degrade to BM25-only.
  *   2. Real weights present → byte-identical output across three repeat
- *      calls + L2 norm ≈ 1 + dim === 768. Only runs when the cache dir is
- *      populated. CI does NOT populate this dir.
+ *      calls + unit L2 norm + dim === 320. Only runs when the cache dir is
+ *      populated. CI does NOT populate this dir.
+ *
+ * F2LLM-v2-80M's ONNX graph bakes last-token pooling + L2 normalization in,
+ * emitting a single 320-dim output named `embedding` already unit-length —
+ * so the embedder does NO JS-side pooling or normalization (unlike the prior
+ * gte-modernbert CLS-pool path). The unit-norm assertion below therefore
+ * checks the model contract, not a JS post-step.
  *
  * When weights are absent we also run a mock-based check of the Embedder
- * contract (dim=768, embedBatch preserves input order, close() is
+ * contract (dim=320, embedBatch preserves input order, close() is
  * idempotent) so the interface is covered unconditionally.
  */
 
@@ -74,11 +80,13 @@ describe("openOnnxEmbedder: missing weights", () => {
 /**
  * A hand-rolled `Embedder` used when real weights are unavailable. Its
  * `embed` produces a deterministic fake vector (index-based) so we can still
- * exercise the downstream contract: L2 norm ≈ 1, dim=768, embedBatch
- * preserves order, close() is idempotent.
+ * exercise the downstream contract: dim=320, embedBatch preserves order,
+ * close() is idempotent, repeat calls are byte-identical. The fake happens to
+ * return unit vectors, but that is a property of the mock — the real embedder
+ * gets unit length from the in-graph L2 norm, not from any JS step.
  */
 class MockEmbedder implements Embedder {
-  readonly dim = 768;
+  readonly dim = 320;
   readonly modelId = embedderModelId("fp32");
   #closed = false;
 
@@ -107,6 +115,12 @@ class MockEmbedder implements Embedder {
     return vec;
   }
 
+  // F2LLM is asymmetric (query gets an Instruct: prefix) but the mock has no
+  // real model, so it aliases the query path to the document path.
+  async embedQuery(text: string): Promise<Float32Array> {
+    return this.embed(text);
+  }
+
   async embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
     return Promise.all(texts.map((t) => this.embed(t)));
   }
@@ -130,17 +144,18 @@ describe("Embedder contract (mocked)", () => {
     const m = new MockEmbedder();
     // Static type check: `m satisfies Embedder` is enforced by the class
     // declaration. Here we re-check at runtime.
-    equal(m.dim, 768);
-    equal(m.modelId, "gte-modernbert-base/fp32");
+    equal(m.dim, 320);
+    equal(m.modelId, "f2llm-v2-80m/fp32");
     equal(typeof m.embed, "function");
+    equal(typeof m.embedQuery, "function");
     equal(typeof m.embedBatch, "function");
     equal(typeof m.close, "function");
   });
 
-  it("dim === 768", async () => {
+  it("dim === 320", async () => {
     const m = new MockEmbedder();
     const v = await m.embed("hello world");
-    equal(v.length, 768);
+    equal(v.length, 320);
   });
 
   it("L2 norm is ~1 (within 1e-6)", async () => {
@@ -199,9 +214,9 @@ async function hasRealWeights(): Promise<boolean> {
 }
 
 describe("OnnxEmbedder: real weights (optional)", () => {
-  it("produces byte-identical vectors across 3 calls and has dim=768", async (t) => {
+  it("produces byte-identical vectors across 3 calls and has dim=320", async (t) => {
     if (!(await hasRealWeights())) {
-      t.skip("gte-modernbert-base weights not installed — run `codehub setup --embeddings`");
+      t.skip("f2llm-v2-80m weights not installed — run `codehub setup --embeddings`");
       return;
     }
     let embedder: Embedder | undefined;
@@ -211,12 +226,14 @@ describe("OnnxEmbedder: real weights (optional)", () => {
       const a = await embedder.embed(text);
       const b = await embedder.embed(text);
       const c = await embedder.embed(text);
-      equal(a.length, 768);
-      equal(embedder.dim, 768);
-      equal(embedder.modelId, "gte-modernbert-base/fp32");
+      equal(a.length, 320);
+      equal(embedder.dim, 320);
+      equal(embedder.modelId, "f2llm-v2-80m/fp32");
       deepEqual(new Uint8Array(a.buffer), new Uint8Array(b.buffer));
       deepEqual(new Uint8Array(a.buffer), new Uint8Array(c.buffer));
 
+      // F2LLM's graph L2-normalizes its output, so the vector is unit-length
+      // straight from `embedding` — no JS normalize step is involved.
       const n = l2Norm(a);
       ok(Math.abs(n - 1) < 1e-4, `expected unit norm, got ${n}`);
     } finally {
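
Reviewer sketch (not part of the patch): the comments in this test describe last-token pooling plus L2 normalization as baked into the ONNX graph. For readers who want to see what that computes, here is an equivalent in plain TypeScript. It is an illustration only: the embedder in this patch does none of this in JS, `lastTokenPoolAndNormalize` is a hypothetical name, and the sketch assumes right-padded input so that `attention_mask.sum() - 1` indexes the last real token.

```typescript
// hidden: one row of hidden-state values per token position (seqLen x hiddenSize).
// attentionMask: 1 for real tokens, 0 for padding; padding assumed on the right.
function lastTokenPoolAndNormalize(
  hidden: readonly (readonly number[])[],
  attentionMask: readonly number[],
): Float32Array {
  // Count of real tokens; the last real token sits at index count - 1.
  const realTokens = attentionMask.reduce((n, m) => n + m, 0);
  if (realTokens === 0) throw new Error("sequence has no real tokens");
  const row = hidden[realTokens - 1];
  if (row === undefined) throw new Error("mask is longer than the hidden states");

  // L2 norm of the selected row.
  let sumSq = 0;
  for (const v of row) sumSq += v * v;
  const norm = Math.sqrt(sumSq);

  // Scale to unit length; an all-zero row is left as zeros.
  const out = new Float32Array(row.length);
  for (let i = 0; i < row.length; i++) {
    out[i] = norm === 0 ? 0 : (row[i] ?? 0) / norm;
  }
  return out;
}
```

Because the index depends only on the mask, the value stored in padded positions never reaches the output, which is why the pad id does not affect the result.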
diff --git a/packages/embedder/src/onnx-embedder.ts b/packages/embedder/src/onnx-embedder.ts
index 30248786..04685401 100644
--- a/packages/embedder/src/onnx-embedder.ts
+++ b/packages/embedder/src/onnx-embedder.ts
@@ -1,12 +1,25 @@
 /**
- * Deterministic ONNX-based embedder for Alibaba gte-modernbert-base.
+ * Deterministic ONNX-based embedder for codefuse-ai F2LLM-v2-80M.
  *
  * Loads weights from disk (populated by `codehub setup --embeddings`), runs
- * inference with every nondeterminism knob disabled, and emits a 768-dim
+ * inference with every nondeterminism knob disabled, and emits a 320-dim
  * Float32Array per input. The same input MUST produce byte-identical output
  * across repeat calls; this is the contract the graphHash CI gate relies
  * on.
  *
+ * F2LLM-v2-80M is a Qwen3-0.6B-Base derivative (8 layers, hidden 320, 16
+ * heads / 8 KV heads). The ONNX export bakes last-token pooling
+ * (`attention_mask.sum()-1`) AND L2 normalization INTO the graph, emitting
+ * a single output named `embedding` of shape `[batch, 320]` already
+ * unit-length — so this module does NO JS-side pooling or normalization,
+ * unlike the previous gte-modernbert (CLS-pool) path.
+ *
+ * Query/document asymmetry: F2LLM expects an `Instruct:`-wrapped prefix on
+ * QUERY text only; documents are embedded raw. {@link OnnxEmbedder.embed}
+ * and `embedBatch` embed raw text (the document path);
+ * {@link OnnxEmbedder.embedQuery} applies the prefix (the query path). See
+ * {@link buildQueryText}.
+ *
  * The weights themselves are NOT downloaded here — `codehub setup
  * --embeddings` owns that code path. If the weights are absent we throw
  * {@link EmbedderNotSetupError} so callers can degrade to BM25-only search
@@ -28,13 +41,17 @@ import type { InferenceSession, Tensor } from "onnxruntime-web";
 
 import { embedderModelId } from "./model-pins.js";
 import { modelFileName, resolveModelDir, TOKENIZER_FILES } from "./paths.js";
+import { buildQueryText } from "./query-prefix.js";
 import { type Embedder, type EmbedderConfig, EmbedderNotSetupError } from "./types.js";
 
-// gte-modernbert-base is a ModernBERT-base encoder (22 layers, 12 heads,
-// 768 hidden). These numbers are part of the model contract, not a config
-// knob — do not expose to callers.
-const EMBED_DIM = 768;
-const MODEL_MAX_POSITION = 8192; // ModernBERT's position embedding table
+// F2LLM-v2-80M emits a single graph output named `embedding`, shape
+// `[batch, 320]`, already L2-normalized. These numbers are part of the
+// model contract, not a config knob — do not expose to callers.
+const EMBED_DIM = 320;
+// Practical truncation cap in tokens. F2LLM's model_max_length is 131072,
+// but code symbols are short and a large cap wastes memory/latency; 8192 is
+// the operative ceiling.
+const MODEL_MAX_LENGTH = 8192;
 
 async function fileExists(path: string): Promise<boolean> {
   try {
@@ -61,7 +78,7 @@ async function assertModelFiles(
   }
   if (missing.length > 0) {
     throw new EmbedderNotSetupError(
-      `gte-modernbert-base weights not found in ${modelDir}: ` +
+      `F2LLM-v2-80M weights not found in ${modelDir}: ` +
         `missing ${missing.join(", ")}. ` +
         `Run \`codehub setup --embeddings\` while online.`,
     );
@@ -110,7 +127,10 @@ function buildSessionOptions(): InferenceSession.SessionOptions {
 /**
  * Encode `text` using the supplied Tokenizer and produce padded/truncated
  * input_ids and attention_mask arrays. BigInt64Array matches the model's
- * int64 input type. ModernBERT has no token_type_ids input.
+ * int64 input type. Qwen3/F2LLM has no token_type_ids input.
+ *
+ * `add_special_tokens: true` is REQUIRED — the tokenizer's TemplateProcessing
+ * appends the EOS (`<|im_end|>`) that the in-graph last-token pooling reads.
  */
 function encodeForModel(
   tokenizer: Tokenizer,
@@ -124,7 +144,9 @@ function encodeForModel(
   const enc = tokenizer.encode(text, {
     add_special_tokens: true,
   });
-  // Truncate to the model's max_position_embeddings.
+  // Truncate to the practical max length. On truncation the trailing EOS is
+  // dropped and last-token pooling reads the final retained token — a valid
+  // (if degraded) representation of the truncated prefix.
   const ids = enc.ids.slice(0, maxModelLength);
   const mask = enc.attention_mask.slice(0, maxModelLength);
 
@@ -139,11 +161,12 @@ function encodeForModel(
 }
 
 /**
- * Pad two parallel BigInt64Arrays (ids, mask) up to `padTo`. ModernBERT's
- * pad_token_id is 50283 (not 0 as in BERT); the attention mask is 0 for
- * padding positions so the model ignores them regardless.
+ * Pad two parallel BigInt64Arrays (ids, mask) up to `padTo`. F2LLM's
+ * tokenizer pad_token is `<|endoftext|>` (id 151643); the attention mask is
+ * 0 for padding positions so the in-graph last-token pooling
+ * (`attention_mask.sum()-1`) skips them regardless of the pad id used.
  */
-const MODERNBERT_PAD_ID = 50283n;
+const F2LLM_PAD_ID = 151643n;
 
 function padToLength(
   ids: BigInt64Array,
@@ -156,57 +179,13 @@ function padToLength(
   if (ids.length === padTo) {
     return { ids, mask };
   }
-  const outIds = new BigInt64Array(padTo).fill(MODERNBERT_PAD_ID);
+  const outIds = new BigInt64Array(padTo).fill(F2LLM_PAD_ID);
   const outMask = new BigInt64Array(padTo);
   outIds.set(ids);
   outMask.set(mask);
   return { ids: outIds, mask: outMask };
 }
 
-/**
- * Extract the [CLS] vector (index 0 of last_hidden_state) for batch item
- * `rowIdx`. gte-modernbert-base ships `1_Pooling/config.json` with
- * `pooling_mode_cls_token: true`, so we grab the first-token hidden state and
- * L2-normalize it downstream.
- */
-function clsPool(
-  lastHidden: Float32Array,
-  rowIdx: number,
-  seqLen: number,
-  hiddenSize: number,
-): Float32Array {
-  const rowStart = rowIdx * seqLen * hiddenSize;
-  const out = new Float32Array(hiddenSize);
-  for (let i = 0; i < hiddenSize; i++) {
-    out[i] = lastHidden[rowStart + i] ?? 0;
-  }
-  return out;
-}
-
-/**
- * In-place L2 normalization with Kahan-summed squared norm for 2-ULP tighter
- * precision than naive sum. Single division by `sqrt(norm)` keeps the op
- * deterministic across x86_64 + aarch64 (IEEE-754 round-to-nearest-even).
- */
-function l2NormalizeInPlace(vec: Float32Array): void {
-  let sum = 0;
-  let comp = 0; // Kahan compensator
-  for (let i = 0; i < vec.length; i++) {
-    const v = vec[i] ?? 0;
-    const term = v * v - comp;
-    const t = sum + term;
-    comp = t - sum - term;
-    sum = t;
-  }
-  if (sum <= 0) {
-    return;
-  }
-  const inv = 1 / Math.sqrt(sum);
-  for (let i = 0; i < vec.length; i++) {
-    vec[i] = (vec[i] ?? 0) * inv;
-  }
-}
-
 /** Internal implementation — exported only via the {@link Embedder} seam. */
 class OnnxEmbedder implements Embedder {
   readonly dim = EMBED_DIM;
@@ -214,10 +193,9 @@ class OnnxEmbedder implements Embedder {
 
   readonly #session: InferenceSession;
   readonly #tokenizer: Tokenizer;
-  readonly #normalize: boolean;
   readonly #maxModelLength: number;
   // Runtime `Tensor` constructor, threaded in from the dynamic
-  // `import("onnxruntime-node")` so this module never statically loads the
+  // `import("onnxruntime-web")` so this module never statically loads the
   // native binding.
   readonly #Tensor: typeof Tensor;
   #closed = false;
@@ -226,18 +204,17 @@ class OnnxEmbedder implements Embedder {
     readonly session: InferenceSession;
     readonly tokenizer: Tokenizer;
     readonly variant: "fp32" | "int8";
-    readonly normalize: boolean;
     readonly maxModelLength: number;
     readonly Tensor: typeof Tensor;
   }) {
     this.#session = params.session;
     this.#tokenizer = params.tokenizer;
     this.modelId = embedderModelId(params.variant);
-    this.#normalize = params.normalize;
     this.#maxModelLength = params.maxModelLength;
     this.#Tensor = params.Tensor;
   }
 
+  /** Embed a single DOCUMENT (no query prefix). */
   async embed(text: string): Promise<Float32Array> {
     this.#ensureOpen();
     const [vec] = await this.embedBatch([text]);
@@ -247,6 +224,17 @@ class OnnxEmbedder implements Embedder {
     return vec;
   }
 
+  /**
+   * Embed a QUERY. F2LLM expects the `Instruct:`-wrapped prefix on query
+   * text only; documents (`embed`/`embedBatch`) get none. Keeping this on
+   * the embedder localizes the model-specific instruction string and keeps
+   * the asymmetry explicit + unit-testable.
+   */
+  async embedQuery(text: string): Promise<Float32Array> {
+    this.#ensureOpen();
+    return this.embed(buildQueryText(text));
+  }
+
   async embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
     this.#ensureOpen();
     if (texts.length === 0) {
@@ -263,13 +251,13 @@ class OnnxEmbedder implements Embedder {
     }
     if (batchMax === 0) {
       // Degenerate case: every input tokenized to zero tokens. Return zero
-      // vectors (still dim=768) so callers downstream get a stable shape.
+      // vectors (still dim=320) so callers downstream get a stable shape.
       return texts.map(() => new Float32Array(EMBED_DIM));
     }
 
     // Build flat [B, seqLen] buffers.
     const batchSize = encoded.length;
-    const flatIds = new BigInt64Array(batchSize * batchMax).fill(MODERNBERT_PAD_ID);
+    const flatIds = new BigInt64Array(batchSize * batchMax).fill(F2LLM_PAD_ID);
     const flatMask = new BigInt64Array(batchSize * batchMax);
     for (let b = 0; b < batchSize; b++) {
       const e = encoded[b];
@@ -285,29 +273,30 @@ class OnnxEmbedder implements Embedder {
       input_ids: new Tensor("int64", flatIds, dims),
       attention_mask: new Tensor("int64", flatMask, dims),
     };
-    const results = await this.#session.run(feeds, ["last_hidden_state"]);
-    const hidden = results["last_hidden_state"];
-    if (hidden === undefined || hidden.type !== "float32") {
+    // F2LLM's graph pools (last-token) + L2-normalizes internally and emits a
+    // single output named `embedding`, shape [B, EMBED_DIM] — already
+    // unit-length. We do NO JS-side pooling/normalization here.
+    const results = await this.#session.run(feeds, ["embedding"]);
+    const embedding = results["embedding"];
+    if (embedding === undefined || embedding.type !== "float32") {
       throw new Error(
-        `ONNX session did not return float32 last_hidden_state (got ${String(hidden?.type)})`,
+        `ONNX session did not return a float32 'embedding' tensor (got ${String(embedding?.type)})`,
       );
     }
-    // Shape is [B, seqLen, hiddenSize]. hiddenSize derived from data length
-    // so we don't hard-fail if a checkpoint ever surprises us with a
-    // different dim — we just assert it matches EMBED_DIM at the boundary.
-    const data = hidden.data as Float32Array;
-    const hiddenSize = data.length / (batchSize * batchMax);
-    if (hiddenSize !== EMBED_DIM) {
-      throw new Error(`Expected hidden size ${EMBED_DIM}, got ${hiddenSize}. Wrong model loaded?`);
+    // Shape is [B, EMBED_DIM] (NOT [B, seqLen, H]). Derive the per-row width
+    // from the flat buffer length and assert it matches EMBED_DIM at the
+    // boundary so a wrong model loaded surfaces loudly.
+    const data = embedding.data as Float32Array;
+    const rowDim = data.length / batchSize;
+    if (rowDim !== EMBED_DIM) {
+      throw new Error(`Expected embedding dim ${EMBED_DIM}, got ${rowDim}. Wrong model loaded?`);
     }
 
     const out: Float32Array[] = [];
     for (let b = 0; b < batchSize; b++) {
-      const vec = clsPool(data, b, batchMax, hiddenSize);
-      if (this.#normalize) {
-        l2NormalizeInPlace(vec);
-      }
-      out.push(vec);
+      // Copy each row out of the shared buffer so callers own an independent
+      // Float32Array (the graph already normalized it).
+      out.push(data.slice(b * EMBED_DIM, (b + 1) * EMBED_DIM));
     }
     return out;
   }
@@ -328,7 +317,7 @@ class OnnxEmbedder implements Embedder {
 }
 
 /**
- * Open a deterministic gte-modernbert-base embedder.
+ * Open a deterministic F2LLM-v2-80M embedder.
  *
  * Throws {@link EmbedderNotSetupError} if the weight files are not present —
  * callers in the CLI use this to surface `codehub setup --embeddings`
@@ -337,12 +326,11 @@ class OnnxEmbedder implements Embedder {
 export async function openOnnxEmbedder(cfg: EmbedderConfig = {}): Promise<Embedder> {
   const variant = cfg.variant ?? "fp32";
   const modelDir = resolveModelDir(cfg.modelDir, variant);
-  const normalize = cfg.normalize ?? true;
   // `maxSequenceLength` is the caller-facing budget in user tokens; the
-  // actual model input adds 2 slots for [CLS]/[SEP], capped at
-  // MODEL_MAX_POSITION (8192) to fit the position embedding table.
-  const userMax = cfg.maxSequenceLength ?? MODEL_MAX_POSITION - 2;
-  const maxModelLength = Math.min(userMax + 2, MODEL_MAX_POSITION);
+  // tokenizer appends a single EOS token, so the model input is at most
+  // userMax + 1, capped at MODEL_MAX_LENGTH.
+  const userMax = cfg.maxSequenceLength ?? MODEL_MAX_LENGTH - 1;
+  const maxModelLength = Math.min(userMax + 1, MODEL_MAX_LENGTH);
 
   const { modelPath, tokenizerDir } = await assertModelFiles(modelDir, variant);
 
@@ -387,7 +375,6 @@ export async function openOnnxEmbedder(cfg: EmbedderConfig = {}): Promise<Embedd
     session,
     tokenizer,
     variant,
-    normalize,
     maxModelLength,
     Tensor: ort.Tensor,
   });
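
Reviewer note on the onnx-embedder.ts change above: the new output handling treats the `embedding` tensor as a flat `[B, EMBED_DIM]` buffer and copies each row out. The sketch below isolates that step under stated assumptions. `splitRows` is a hypothetical helper name, not something the patch defines, and the length check is one possible way to express the "wrong model loaded" guard.

```typescript
// Illustrative sketch only: `splitRows` is a hypothetical helper, not part of the patch.
// It mirrors the row-copy loop in embedBatch: a flat [batchSize, dim] buffer is
// validated and then cut into independent per-row Float32Arrays.
function splitRows(data: Float32Array, batchSize: number, dim: number): Float32Array[] {
  if (batchSize <= 0 || data.length !== batchSize * dim) {
    throw new Error(
      `Expected ${batchSize} rows of ${dim} floats, got ${data.length} floats. Wrong model loaded?`,
    );
  }
  const rows: Float32Array[] = [];
  for (let b = 0; b < batchSize; b++) {
    // TypedArray.prototype.slice copies, so each row owns its own storage.
    rows.push(data.slice(b * dim, (b + 1) * dim));
  }
  return rows;
}
```

Because `slice` copies (unlike `subarray`, which returns a view), callers can mutate a returned row without touching the session's output buffer.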
diff --git a/packages/embedder/src/paths.test.ts b/packages/embedder/src/paths.test.ts
index 786e0aab..2b77c4cc 100644
--- a/packages/embedder/src/paths.test.ts
+++ b/packages/embedder/src/paths.test.ts
@@ -38,12 +38,12 @@ describe("paths", () => {
 
   it("resolveModelDir builds fp32 path by default", () => {
     const dir = resolveModelDir();
-    equal(dir, join(homedir(), ".codehub", "models", "gte-modernbert-base", "fp32"));
+    equal(dir, join(homedir(), ".codehub", "models", "f2llm-v2-80m", "fp32"));
   });
 
   it("resolveModelDir respects int8 variant", () => {
     const dir = resolveModelDir(undefined, "int8");
-    equal(dir, join(homedir(), ".codehub", "models", "gte-modernbert-base", "int8"));
+    equal(dir, join(homedir(), ".codehub", "models", "f2llm-v2-80m", "int8"));
   });
 
   it("resolveModelDir returns override unchanged when provided", () => {
@@ -59,11 +59,8 @@ describe("paths", () => {
     equal(modelFileName("int8"), "model_int8.onnx");
   });
 
-  it("TOKENIZER_FILES enumerates the four required JSON files", () => {
-    deepEqual(
-      [...TOKENIZER_FILES],
-      ["tokenizer.json", "tokenizer_config.json", "config.json", "special_tokens_map.json"],
-    );
-    ok(TOKENIZER_FILES.length === 4);
+  it("TOKENIZER_FILES enumerates the two required JSON files", () => {
+    deepEqual([...TOKENIZER_FILES], ["tokenizer.json", "tokenizer_config.json"]);
+    ok(TOKENIZER_FILES.length === 2);
   });
 });
diff --git a/packages/embedder/src/paths.ts b/packages/embedder/src/paths.ts
index dbd91105..9aebf786 100644
--- a/packages/embedder/src/paths.ts
+++ b/packages/embedder/src/paths.ts
@@ -1,13 +1,11 @@
 /**
- * Resolves the on-disk location of gte-modernbert-base weight files.
+ * Resolves the on-disk location of F2LLM-v2-80M weight files.
  *
  * Layout convention:
- *   ${CODEHUB_HOME:-~/.codehub}/models/gte-modernbert-base/${variant}/
+ *   ${CODEHUB_HOME:-~/.codehub}/models/f2llm-v2-80m/${variant}/
  *     ├── model.onnx          (or model_int8.onnx)
  *     ├── tokenizer.json
- *     ├── tokenizer_config.json
- *     ├── config.json
- *     └── special_tokens_map.json
+ *     └── tokenizer_config.json
  *
  * `codehub setup --embeddings` is the code path that populates this
  * directory; this module just resolves paths and never touches the network.
@@ -16,7 +14,7 @@
 import { homedir } from "node:os";
 import { join, resolve } from "node:path";
 
-const MODEL_SUBDIR = "models/gte-modernbert-base";
+const MODEL_SUBDIR = "models/f2llm-v2-80m";
 
 /**
  * Root directory that holds every OpenCodeHub-managed artefact (model weights,
@@ -49,10 +47,10 @@ export function modelFileName(variant: "fp32" | "int8"): string {
   return variant === "fp32" ? "model.onnx" : "model_int8.onnx";
 }
 
-/** All tokenizer-related files we require alongside the ONNX weights. */
-export const TOKENIZER_FILES = [
-  "tokenizer.json",
-  "tokenizer_config.json",
-  "config.json",
-  "special_tokens_map.json",
-] as const;
+/**
+ * All tokenizer-related files we require alongside the ONNX weights. The
+ * F2LLM ONNX export ships only these two — pooling + normalization are
+ * baked into the graph, so there is no separate `config.json` /
+ * `special_tokens_map.json` to fetch.
+ */
+export const TOKENIZER_FILES = ["tokenizer.json", "tokenizer_config.json"] as const;
diff --git a/packages/embedder/src/query-prefix.ts b/packages/embedder/src/query-prefix.ts
new file mode 100644
index 00000000..64be412f
--- /dev/null
+++ b/packages/embedder/src/query-prefix.ts
@@ -0,0 +1,25 @@
+/**
+ * F2LLM query-prefix helper.
+ *
+ * F2LLM-v2-80M is an asymmetric retrieval model: QUERY text is wrapped in an
+ * `Instruct: {instruction}\nQuery: {query}` template, while DOCUMENT text is
+ * embedded raw. Applying the prefix to documents (or omitting it on queries)
+ * degrades retrieval. This module is the single source of truth for the
+ * instruction string and the wrapping format so the query path
+ * (`embedQuery`) and any backend that prefixes caller-side stay in lockstep.
+ *
+ * The instruction string is the one validated in the POC ranking parity
+ * harness (`export/verify_ranking.py`).
+ */
+
+/** The retrieval instruction prepended to every query (F2LLM contract). */
+export const F2LLM_QUERY_INSTRUCTION =
+  "Given a code search query, retrieve the most relevant code snippet.";
+
+/**
+ * Wrap raw query text in the F2LLM `Instruct:`/`Query:` template. Documents
+ * must NOT be passed through this — embed them raw.
+ */
+export function buildQueryText(query: string): string {
+  return `Instruct: ${F2LLM_QUERY_INSTRUCTION}\nQuery: ${query}`;
+}
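
Reviewer note on query-prefix.ts: the asymmetry described in the header comment can be sketched as a single dispatch point. The constant and `buildQueryText` are copied from the file above; `textForEmbedding` and the `"query" | "document"` tag are hypothetical names introduced only for illustration.

```typescript
// Copied from query-prefix.ts in this patch.
const F2LLM_QUERY_INSTRUCTION =
  "Given a code search query, retrieve the most relevant code snippet.";

function buildQueryText(query: string): string {
  return `Instruct: ${F2LLM_QUERY_INSTRUCTION}\nQuery: ${query}`;
}

// Hypothetical helper (not in the patch): queries get the Instruct/Query
// template, documents pass through unchanged.
function textForEmbedding(kind: "query" | "document", text: string): string {
  return kind === "query" ? buildQueryText(text) : text;
}
```

A unit test on this seam only needs string equality, which is what makes the asymmetry cheap to pin down without loading the model.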
diff --git a/packages/embedder/src/sagemaker-embedder.integration.test.ts b/packages/embedder/src/sagemaker-embedder.integration.test.ts
index f72ef53d..4dc71090 100644
--- a/packages/embedder/src/sagemaker-embedder.integration.test.ts
+++ b/packages/embedder/src/sagemaker-embedder.integration.test.ts
@@ -11,7 +11,7 @@
  *   AWS_PROFILE=lalsaado-handson \
  *   AWS_REGION=us-east-1 \
  *   CODEHUB_INTEGRATION=1 \
- *   CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT=gte-modernbert-embed \
+ *   CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT=f2llm-embed \
  *   pnpm --filter @opencodehub/embedder test
  */
 
@@ -33,7 +33,7 @@ const skipReason = !INTEGRATION_GATE
 describe("openSagemakerEmbedder — live SageMaker endpoint", {
   skip: skipReason ?? undefined,
 }, () => {
-  it("single text returns a 768-d Float32Array with unit L2 norm (≈1.0)", async () => {
+  it("single text returns a 320-d Float32Array with unit L2 norm (≈1.0)", async () => {
     const embedder = await openSagemakerEmbedder({
       endpointName: ENDPOINT as string,
       region: REGION,
@@ -42,9 +42,9 @@ describe("openSagemakerEmbedder — live SageMaker endpoint", {
       const vec = await embedder.embed(
         "function add(a: number, b: number): number { return a + b; }",
       );
-      equal(vec.length, 768);
-      // TEI with the gte-modernbert-base bundled Normalize module returns
-      // L2-normalized vectors; assert norm is close to 1.
+      equal(vec.length, 320);
+      // F2LLM bakes last-token pooling + L2 normalization into its graph, so
+      // the endpoint returns L2-normalized vectors; assert norm is close to 1.
       let norm = 0;
       for (let i = 0; i < vec.length; i++) {
         const v = vec[i] ?? 0;
@@ -66,7 +66,7 @@ describe("openSagemakerEmbedder — live SageMaker endpoint", {
       const texts = Array.from({ length: 64 }, (_, i) => `const value${i} = ${i};`);
       const out = await embedder.embedBatch(texts);
       equal(out.length, 64);
-      for (const v of out) equal(v.length, 768);
+      for (const v of out) equal(v.length, 320);
     } finally {
       await embedder.close();
     }
@@ -81,7 +81,7 @@ describe("openSagemakerEmbedder — live SageMaker endpoint", {
       const texts = Array.from({ length: 100 }, (_, i) => `let x${i} = ${i};`);
       const out = await embedder.embedBatch(texts);
       equal(out.length, 100);
-      for (const v of out) equal(v.length, 768);
+      for (const v of out) equal(v.length, 320);
     } finally {
       await embedder.close();
     }
diff --git a/packages/embedder/src/sagemaker-embedder.parity.test.ts b/packages/embedder/src/sagemaker-embedder.parity.test.ts
index 89644e58..85cf7f98 100644
--- a/packages/embedder/src/sagemaker-embedder.parity.test.ts
+++ b/packages/embedder/src/sagemaker-embedder.parity.test.ts
@@ -9,9 +9,12 @@
  *     `CODEHUB_HOME`. Weight-missing is detected lazily — `openOnnxEmbedder`
  *     throws `EmbedderNotSetupError` and we skip the rest of the suite.
  *
- * Acceptance threshold: per-pair cosine similarity ≥ 0.99. Both backends
- * use CLS pooling + L2 normalization, so cosine should be ≳ 0.999 on the
- * happy path — the 0.99 floor absorbs fp16-vs-fp32 drift on the GPU side.
+ * Acceptance threshold: per-pair cosine similarity ≥ 0.99. Both backends are
+ * F2LLM-v2-80M (last-token pooling + L2 normalization baked into the graph),
+ * so cosine should be ≳ 0.999 on the happy path — the 0.99 floor absorbs
+ * fp16-vs-fp32 drift on the GPU side. The SageMaker endpoint pointed at by
+ * `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` must serve the same F2LLM model for
+ * the parity assertion to hold.
  */
 
 import { ok } from "node:assert/strict";
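
Reviewer note on the parity threshold: the acceptance check described in the header comment is a per-pair cosine similarity with a 0.99 floor. A minimal sketch of that comparison follows. `cosineSimilarity`, `PARITY_FLOOR`, and `meetsParity` are hypothetical names, and the actual test file may compute this differently.

```typescript
// Illustrative sketch only; names below are assumptions, not the test's helpers.
const PARITY_FLOOR = 0.99;

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

function meetsParity(local: Float32Array, remote: Float32Array): boolean {
  return cosineSimilarity(local, remote) >= PARITY_FLOOR;
}
```

When both backends return unit-length vectors the denominator is close to 1 and the cosine reduces to the dot product; dividing anyway keeps the check valid if one side is slightly off-norm.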
diff --git a/packages/embedder/src/sagemaker-embedder.test.ts b/packages/embedder/src/sagemaker-embedder.test.ts
index a5098cc2..3bb0693d 100644
--- a/packages/embedder/src/sagemaker-embedder.test.ts
+++ b/packages/embedder/src/sagemaker-embedder.test.ts
@@ -2,7 +2,7 @@
  * Tests for the SageMaker embedder backend.
  *
  * Coverage:
- *   - happy path: single input + small batch returns 768-d Float32Array
+ *   - happy path: single input + small batch returns 320-d Float32Array
  *   - large batch (>64) splits into multiple InvokeEndpointCommand calls
  *   - dim mismatch throws with clear message
  *   - row-count mismatch (endpoint returned fewer rows than inputs) throws
@@ -96,10 +96,10 @@ describe("readSagemakerEmbedderConfigFromEnv", () => {
   });
 
   it("reads the endpoint name when set", () => {
-    process.env["CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT"] = "gte-modernbert-embed";
+    process.env["CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT"] = "f2llm-embed";
     const cfg = readSagemakerEmbedderConfigFromEnv();
     ok(cfg !== null);
-    equal(cfg.endpointName, "gte-modernbert-embed");
+    equal(cfg.endpointName, "f2llm-embed");
     equal(cfg.region, undefined); // default applied at factory
   });
 
@@ -130,8 +130,8 @@ describe("readSagemakerEmbedderConfigFromEnv", () => {
 });
 
 describe("openSagemakerEmbedder — happy path", () => {
-  it("embeds a single text and returns a 768-d Float32Array", async () => {
-    const row = vec(768, 0.1);
+  it("embeds a single text and returns a 320-d Float32Array", async () => {
+    const row = vec(320, 0.1);
     const { runtime, calls, lastBatch } = makeRuntime(() => [row]);
 
     const embedder = await openSagemakerEmbedder({
@@ -140,7 +140,7 @@ describe("openSagemakerEmbedder — happy path", () => {
     });
 
     const out = await embedder.embed("hello");
-    equal(out.length, 768);
+    equal(out.length, 320);
     equal(out[0], Math.fround(0.1));
     equal(calls(), 1);
     equal(lastBatch(), 1);
@@ -148,18 +148,18 @@ describe("openSagemakerEmbedder — happy path", () => {
   });
 
   it("reports modelId with endpoint-name stamp by default", async () => {
-    const { runtime } = makeRuntime(() => [vec(768, 0)]);
+    const { runtime } = makeRuntime(() => [vec(320, 0)]);
     const embedder = await openSagemakerEmbedder({
-      endpointName: "gte-modernbert-embed",
+      endpointName: "f2llm-embed",
       runtime,
     });
-    equal(embedder.dim, 768);
-    match(embedder.modelId, /^gte-modernbert-base\/sagemaker:gte-modernbert-embed$/);
+    equal(embedder.dim, 320);
+    match(embedder.modelId, /^f2llm-v2-80m\/sagemaker:f2llm-embed$/);
     await embedder.close();
   });
 
   it("honors an explicit modelId override", async () => {
-    const { runtime } = makeRuntime(() => [vec(768, 0)]);
+    const { runtime } = makeRuntime(() => [vec(320, 0)]);
     const embedder = await openSagemakerEmbedder({
       endpointName: "anything",
       modelId: "custom/model:v1",
@@ -171,7 +171,7 @@ describe("openSagemakerEmbedder — happy path", () => {
 
   it("batches ≤64 inputs in a single InvokeEndpoint call", async () => {
     const { runtime, calls, lastBatch } = makeRuntime((n) =>
-      Array.from({ length: n }, (_, i) => vec(768, i * 0.01)),
+      Array.from({ length: n }, (_, i) => vec(320, i * 0.01)),
     );
 
     const embedder = await openSagemakerEmbedder({
@@ -196,7 +196,7 @@ describe("openSagemakerEmbedder — happy path", () => {
         };
         sizes.push(parsed.inputs.length);
         return {
-          Body: responseBody(parsed.inputs.map((_, i) => vec(768, i * 0.001))),
+          Body: responseBody(parsed.inputs.map((_, i) => vec(320, i * 0.001))),
         };
       },
     };
@@ -215,7 +215,7 @@ describe("openSagemakerEmbedder — happy path", () => {
   });
 
   it("returns an empty array for an empty batch without calling the endpoint", async () => {
-    const { runtime, calls } = makeRuntime(() => [vec(768, 0)]);
+    const { runtime, calls } = makeRuntime(() => [vec(320, 0)]);
     const embedder = await openSagemakerEmbedder({
       endpointName: "test-endpoint",
       runtime,
@@ -234,14 +234,14 @@ describe("openSagemakerEmbedder — error cases", () => {
       endpointName: "test-endpoint",
       runtime,
     });
-    await rejects(embedder.embed("hello"), /512d vector at row 0, expected 768d/);
+    await rejects(embedder.embed("hello"), /512d vector at row 0, expected 320d/);
   });
 
   it("throws on row-count mismatch (endpoint returned too few rows)", async () => {
     const runtime: SagemakerRuntimeLike = {
       async send(_command: SendCmd) {
         // Return 1 row for any number of inputs.
-        return { Body: responseBody([vec(768, 0)]) };
+        return { Body: responseBody([vec(320, 0)]) };
       },
     };
     const embedder = await openSagemakerEmbedder({
@@ -290,7 +290,7 @@ describe("openSagemakerEmbedder — error cases", () => {
           (err as { name: string }).name = "ValidationException";
           throw err;
         }
-        return { Body: responseBody([vec(768, parsed.inputs[0] === "a" ? 0.1 : 0.2)]) };
+        return { Body: responseBody([vec(320, parsed.inputs[0] === "a" ? 0.1 : 0.2)]) };
       },
     };
     const embedder = await openSagemakerEmbedder({
@@ -348,7 +348,7 @@ describe("tryOpenHttpEmbedder — SageMaker precedence", () => {
   });
 
   it("throws when offline AND SageMaker env is configured", () => {
-    process.env["CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT"] = "gte-modernbert-embed";
+    process.env["CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT"] = "f2llm-embed";
     throws(
       () => tryOpenHttpEmbedder({ offline: true }),
       /SageMaker embeddings are disabled in offline mode/,
diff --git a/packages/embedder/src/sagemaker-embedder.ts b/packages/embedder/src/sagemaker-embedder.ts
index 7e3dddfc..1b8c5744 100644
--- a/packages/embedder/src/sagemaker-embedder.ts
+++ b/packages/embedder/src/sagemaker-embedder.ts
@@ -1,8 +1,9 @@
 /**
  * SageMaker embedder backend. Invokes a TEI (Text Embeddings Inference)
- * SageMaker endpoint — e.g. the `embed-serve` stack at
- * `/efs/lalsaado/workplace/embed-serve/` which serves
- * `Alibaba-NLP/gte-modernbert-base` as `gte-modernbert-embed` in us-east-1.
+ * SageMaker endpoint. NOTE: the local F2LLM query-prefix asymmetry is NOT
+ * applied here — a remote endpoint must handle query/document pooling +
+ * prefixing server-side. The default `dims` (320) matches F2LLM, but the
+ * caller's endpoint determines the actual model.
  *
  * Selection: {@link readSagemakerEmbedderConfigFromEnv} returns a config
  * when `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` is set; otherwise `null` so
@@ -24,14 +25,14 @@
  *   - SDK retry (`maxAttempts: 5`) handles throttling + 5xx.
  *   - Dims asserted on every response so a remote model swap cannot
  *     silently pollute downstream HNSW indexes.
- *   - `modelId` is stamped as `gte-modernbert-base/sagemaker:<endpoint>`
+ *   - `modelId` is stamped as `f2llm-v2-80m/sagemaker:<endpoint>`
  *     so an index built with this backend is visibly distinct from a
  *     local ONNX index.
  */
 
 import { type Embedder, EmbedderNotSetupError } from "./types.js";
 
-const DEFAULT_DIMS = 768;
+const DEFAULT_DIMS = 320;
 const DEFAULT_REGION = "us-east-1";
 const MAX_BATCH = 64;
 const DEFAULT_MAX_ATTEMPTS = 5;
@@ -55,17 +56,17 @@ export interface SagemakerRuntimeLike {
 
 /** Configuration for {@link openSagemakerEmbedder}. */
 export interface SagemakerEmbedderConfig {
-  /** Name of the SageMaker endpoint (e.g. `gte-modernbert-embed`). */
+  /** Name of the SageMaker endpoint (e.g. `f2llm-embed`). */
   readonly endpointName: string;
   /** AWS region of the endpoint. Defaults to `us-east-1`. */
   readonly region?: string;
   /**
    * Stable model id reported to the index layer. Defaults to
-   * `gte-modernbert-base/sagemaker:<endpointName>` so index metadata
+   * `f2llm-v2-80m/sagemaker:<endpointName>` so index metadata
    * distinguishes this backend from local ONNX.
    */
   readonly modelId?: string;
-  /** Expected response-vector dimension. Defaults to 768. */
+  /** Expected response-vector dimension. Defaults to 320. */
   readonly dims?: number;
   /** SDK `maxAttempts`. Defaults to 5. */
   readonly maxAttempts?: number;
@@ -162,7 +163,7 @@ export async function openSagemakerEmbedder(cfg: SagemakerEmbedderConfig): Promi
   const region = cfg.region ?? DEFAULT_REGION;
   const dims = cfg.dims ?? DEFAULT_DIMS;
   const endpointName = cfg.endpointName;
-  const modelId = cfg.modelId ?? `gte-modernbert-base/sagemaker:${endpointName}`;
+  const modelId = cfg.modelId ?? `f2llm-v2-80m/sagemaker:${endpointName}`;
   const maxAttempts = cfg.maxAttempts ?? DEFAULT_MAX_ATTEMPTS;
 
   let runtime: SagemakerRuntimeLike;
@@ -312,6 +313,8 @@ export async function openSagemakerEmbedder(cfg: SagemakerEmbedderConfig): Promi
     dim: dims,
     modelId,
     embed: embedOne,
+    // Remote endpoint owns pooling/prefix server-side; alias query→document.
+    embedQuery: embedOne,
     embedBatch,
     async close(): Promise<void> {
       // SageMakerRuntimeClient keeps an HTTP agent alive; destroy when
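
Reviewer note on sagemaker-embedder.ts: the header comment states that batches are capped at 64 inputs per call and that dims are asserted on every response. The sketch below shows one way those two guards could look. `chunk` and `assertDims` are hypothetical helpers, and the error text is an assumption modeled on the pattern the unit test matches (`512d vector at row 0, expected 320d`); the real implementation is not shown in this patch.

```typescript
// Illustrative sketch only; helper names and exact error wording are assumptions.
const MAX_BATCH = 64;

// Split inputs into consecutive groups of at most `size` items, preserving order.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error(`chunk size must be a positive integer, got ${size}`);
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Reject any response row whose width differs from the expected dimension.
function assertDims(rows: readonly (readonly number[])[], dims: number): void {
  rows.forEach((row, i) => {
    if (row.length !== dims) {
      throw new Error(`Endpoint returned ${row.length}d vector at row ${i}, expected ${dims}d`);
    }
  });
}
```

Checking width per row rather than per batch means a remote model swap fails on the first response instead of writing mixed-dimension vectors into the index.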
diff --git a/packages/embedder/src/types.ts b/packages/embedder/src/types.ts
index 6e084ebc..f96cfb4b 100644
--- a/packages/embedder/src/types.ts
+++ b/packages/embedder/src/types.ts
@@ -1,8 +1,8 @@
 /**
  * Public types for the @opencodehub/embedder package.
  *
- * The embedder turns a piece of text into a deterministic 768-dim Float32Array
- * using the gte-modernbert-base ONNX model. Callers in @opencodehub/search and
+ * The embedder turns a piece of text into a deterministic 320-dim Float32Array
+ * using the F2LLM-v2-80M ONNX model. Callers in @opencodehub/search and
  * @opencodehub/mcp consume this via the `Embedder` interface; the concrete
  * implementation is opened with `openOnnxEmbedder`.
  */
@@ -13,17 +13,26 @@
  * contract the graphHash CI gate relies on.
  */
 export interface Embedder {
-  /** Output dimension. Always 768 for gte-modernbert-base. */
+  /** Output dimension. 320 for F2LLM-v2-80M (the local ONNX backend). */
   readonly dim: number;
   /**
-   * Stable model identifier, e.g. `gte-modernbert-base/fp32`. Used by
-   * the storage layer to tag `embeddings.model` so incompatible vectors are
-   * never mixed in the same HNSW index.
+   * Stable model identifier, e.g. `f2llm-v2-80m/fp32`. Used by the storage
+   * layer to tag `embeddings.model` so incompatible vectors are never mixed
+   * in the same index.
    */
   readonly modelId: string;
-  /** Embed a single text. */
+  /**
+   * Embed a single DOCUMENT (no query prefix). Use {@link embedQuery} for
+   * search queries.
+   */
   embed(text: string): Promise<Float32Array>;
-  /** Embed a batch of texts. Returned array matches the input order 1:1. */
+  /**
+   * Embed a single QUERY. For asymmetric models (F2LLM) this applies the
+   * model's query instruction prefix; documents are embedded raw via
+   * {@link embed}. Symmetric backends may alias this to {@link embed}.
+   */
+  embedQuery(text: string): Promise<Float32Array>;
+  /** Embed a batch of DOCUMENTS. Returned array matches the input order 1:1. */
   embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]>;
   /** Release native session + tokenizer resources. Idempotent. */
   close(): Promise<void>;
@@ -36,21 +45,18 @@ export interface Embedder {
  */
 export interface EmbedderConfig {
   /**
-   * Directory containing `model.onnx` (or `model_int8.onnx`) and the four
+   * Directory containing `model.onnx` (or `model_int8.onnx`) and the two
    * tokenizer JSON files. Defaults to
-   * `${CODEHUB_HOME:-~/.codehub}/models/gte-modernbert-base/${variant}/`.
+   * `${CODEHUB_HOME:-~/.codehub}/models/f2llm-v2-80m/${variant}/`.
    */
   readonly modelDir?: string;
   /** Which ONNX weight file to load. Defaults to `fp32`. */
   readonly variant?: "fp32" | "int8";
   /**
-   * Max tokens of the user-supplied text, before `[CLS]`/`[SEP]` are added.
-   * Defaults to 8190 so the full sequence fits in ModernBERT's 8192-token
-   * position embedding table.
+   * Max tokens of the user-supplied text, before the EOS token is appended.
+   * Defaults to 8191 so the full sequence fits the 8192-token operative cap.
    */
   readonly maxSequenceLength?: number;
-  /** L2-normalize the output vector. Defaults to `true`. */
-  readonly normalize?: boolean;
   /**
    * Directory containing the onnxruntime-web `.wasm` artifacts
    * (`ort-wasm-simd-threaded.*.wasm`). Sets `ort.env.wasm.wasmPaths`. When
diff --git a/packages/ingestion/CHANGELOG.md b/packages/ingestion/CHANGELOG.md
index c1a2dea7..9b9f79f8 100644
--- a/packages/ingestion/CHANGELOG.md
+++ b/packages/ingestion/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Changelog
 
+## [0.6.0](https://github.com/theagenticguy/opencodehub/compare/ingestion-v0.5.0...ingestion-v0.6.0) (2026-06-26)
+
+
+### ⚠ BREAKING CHANGES
+
+* **embedder:** local embedding model swapped to `codefuse-ai/F2LLM-v2-80M` (320-dim, was gte-modernbert-base 768-dim). The analyze path now suppresses the content-hash cache on a model change so all symbols re-embed (no mixed-dim store); existing stores must be rebuilt with `codehub analyze --embeddings`.
+
 ## [0.5.0](https://github.com/theagenticguy/opencodehub/compare/ingestion-v0.4.5...ingestion-v0.5.0) (2026-06-01)
 
 
diff --git a/packages/ingestion/src/pipeline/phases/embedder-pool.ts b/packages/ingestion/src/pipeline/phases/embedder-pool.ts
index c02ca4ac..8477e6e7 100644
--- a/packages/ingestion/src/pipeline/phases/embedder-pool.ts
+++ b/packages/ingestion/src/pipeline/phases/embedder-pool.ts
@@ -53,7 +53,7 @@ export function openOnnxEmbedderPool(opts: EmbedderPoolOptions): Embedder {
   });
 
   let closed = false;
-  const dim = 768; // gte-modernbert-base — matches OnnxEmbedder's EMBED_DIM.
+  const dim = 320; // F2LLM-v2-80M — matches OnnxEmbedder's EMBED_DIM.
 
   async function embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
     if (closed) throw new Error("Embedder pool is closed");
@@ -78,6 +78,13 @@ export function openOnnxEmbedderPool(opts: EmbedderPoolOptions): Embedder {
       if (vec === undefined) throw new Error("embedBatch returned empty result");
       return vec;
     },
+    // Ingestion only embeds documents; the pool never embeds queries. Alias
+    // to embed() to satisfy the Embedder interface (no query prefix applied).
+    async embedQuery(text: string): Promise<Float32Array> {
+      const [vec] = await embedBatch([text]);
+      if (vec === undefined) throw new Error("embedBatch returned empty result");
+      return vec;
+    },
     embedBatch,
     async close(): Promise<void> {
       if (closed) return;
diff --git a/packages/ingestion/src/pipeline/phases/embeddings.test.ts b/packages/ingestion/src/pipeline/phases/embeddings.test.ts
index 3766dc75..10c90005 100644
--- a/packages/ingestion/src/pipeline/phases/embeddings.test.ts
+++ b/packages/ingestion/src/pipeline/phases/embeddings.test.ts
@@ -157,7 +157,7 @@ describe("embeddingsPhase", () => {
 // across runs produce identical embeddings.
 // ---------------------------------------------------------------------------
 
-const HTTP_DIM = 768;
+const HTTP_DIM = 320;
 
 /**
  * Hash-derived deterministic embedding. Stable across runs given the same
diff --git a/packages/ingestion/src/pipeline/phases/embeddings.ts b/packages/ingestion/src/pipeline/phases/embeddings.ts
index 91a15778..df5cfc72 100644
--- a/packages/ingestion/src/pipeline/phases/embeddings.ts
+++ b/packages/ingestion/src/pipeline/phases/embeddings.ts
@@ -1,5 +1,5 @@
 /**
- * Embeddings phase — generates 768-dim vectors across one or more
+ * Embeddings phase — generates 320-dim vectors across one or more
  * hierarchical tiers and materialises them into the phase output as an
  * array of `EmbeddingRow`s the CLI upserts into the SQLite store.
  *
@@ -188,7 +188,7 @@ export interface EmbedderPhaseOutput {
   readonly chunksTotal: number;
   /**
    * Stable id tag for the embedder that produced these rows — e.g.
-   * `gte-modernbert-base/fp32`. Empty string when the phase was a
+   * `f2llm-v2-80m/fp32`. Empty string when the phase was a
    * no-op (flag off or weights missing).
    */
   readonly embeddingsModelId: string;
@@ -580,8 +580,9 @@ async function runEmbeddings(ctx: PipelineContext): Promise<EmbedderPhaseOutput>
     const priorHashes: Map<string, string> =
       forceFlag || hashCache === undefined ? new Map() : await hashCache.list();
 
-    // Max tokens includes [CLS]/[SEP]; the embedder caps input at 510 user
-    // tokens by default. Keep the chunker slightly conservative.
+    // Per-chunk token budget. F2LLM accepts up to 8192 tokens, but smaller
+    // chunks keep last-token-pooled vectors focused on a single unit of
+    // meaning; 500 mirrors the long-standing chunking granularity.
     const maxUserTokens = 500;
 
     // Lookup summaries by nodeId (the newest `createdAt` wins when multiple
diff --git a/packages/mcp/CHANGELOG.md b/packages/mcp/CHANGELOG.md
index 1825af10..d287b30f 100644
--- a/packages/mcp/CHANGELOG.md
+++ b/packages/mcp/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Changelog
 
+## [0.6.0](https://github.com/theagenticguy/opencodehub/compare/mcp-v0.5.0...mcp-v0.6.0) (2026-06-26)
+
+
+### ⚠ BREAKING CHANGES
+
+* **embedder:** local embedding model swapped to `codefuse-ai/F2LLM-v2-80M` (320-dim, was gte-modernbert-base 768-dim). Existing stores must be rebuilt with `codehub analyze --embeddings`; queries against a stale-dim store are refused by the fingerprint guard.
+
 ## [0.5.0](https://github.com/theagenticguy/opencodehub/compare/mcp-v0.4.5...mcp-v0.5.0) (2026-06-01)
 
 
diff --git a/packages/mcp/src/server.ts b/packages/mcp/src/server.ts
index 09f59baa..450f79ab 100644
--- a/packages/mcp/src/server.ts
+++ b/packages/mcp/src/server.ts
@@ -84,7 +84,7 @@ export interface StartServerOptions {
 }
 
 /**
- * Probe for gte-modernbert-base weights on disk. Runs once at server startup
+ * Probe for F2LLM-v2-80M weights on disk. Runs once at server startup
  * and logs a single structured warning when the weights are absent so
  * agents see the BM25-only fallback reason. Never throws: a missing or
  * unreadable model directory is a supported deployment mode.
@@ -105,7 +105,7 @@ async function probeEmbedderWeights(silent: boolean): Promise<void> {
     }
     const root = getDefaultModelRoot();
     console.warn(
-      `[mcp] hybrid: embeddings weights not found at ${root}/models/gte-modernbert-base/; run \`codehub setup --embeddings\`. Falling back to BM25-only.`,
+      `[mcp] hybrid: embeddings weights not found at ${root}/models/f2llm-v2-80m/; run \`codehub setup --embeddings\`. Falling back to BM25-only.`,
     );
   } catch (err) {
     // Probe failure is non-fatal; surface the reason but keep going.
diff --git a/packages/mcp/src/tools/query.test.ts b/packages/mcp/src/tools/query.test.ts
index 139cf6f8..ca8de4e1 100644
--- a/packages/mcp/src/tools/query.test.ts
+++ b/packages/mcp/src/tools/query.test.ts
@@ -387,6 +387,11 @@ class FakeEmbedder implements Embedder {
   async embed(_text: string): Promise<Float32Array> {
     return new Float32Array([0.1, 0.2, 0.3, 0.4]);
   }
+  // Embedder gained a query-only `embedQuery` path (for F2LLM); the fake
+  // aliases it to `embed` since the query tool only needs a Float32Array back.
+  async embedQuery(text: string): Promise<Float32Array> {
+    return this.embed(text);
+  }
   async embedBatch(texts: readonly string[]): Promise<readonly Float32Array[]> {
     return texts.map(() => new Float32Array([0.1, 0.2, 0.3, 0.4]));
   }
@@ -576,9 +581,7 @@ test("query: populated embeddings + EMBEDDER_NOT_SETUP → warn + BM25 fallback"
   };
   try {
     const opener: EmbedderFactory = async () => {
-      const err = new Error(
-        "gte-modernbert-base weights not found. Run `codehub setup --embeddings`.",
-      );
+      const err = new Error("F2LLM-v2-80M weights not found. Run `codehub setup --embeddings`.");
       // Shape matches EmbedderNotSetupError.code.
       (err as unknown as { code: string }).code = "EMBEDDER_NOT_SETUP";
       throw err;
diff --git a/packages/mcp/src/tools/query.ts b/packages/mcp/src/tools/query.ts
index 86b5ee7c..4d9f6cda 100644
--- a/packages/mcp/src/tools/query.ts
+++ b/packages/mcp/src/tools/query.ts
@@ -7,8 +7,9 @@
  *      corpus extends transparently (see {@link bm25CorpusHasSummaries}) so
  *      summarized prose participates as soon as the ingestion phase lands.
  *   2. HNSW vector search over the `embeddings` table. The query text is
- *      embedded with the same gte-modernbert-base ONNX model the ingestion
- *      pipeline uses, so the vectors live in the same space.
+ *      embedded with the same F2LLM-v2-80M ONNX model the ingestion
+ *      pipeline uses, so the vectors live in the same space. (Queries get
+ *      the F2LLM `Instruct:` prefix; documents are embedded raw.)
  *
  * Graceful fallback:
  *   - If the `embeddings` table is empty, skip the vector leg entirely.
@@ -814,7 +815,7 @@ export function registerQueryTool(server: McpServer, ctx: ToolContext): void {
       description: [
         "True hybrid retrieval over the indexed code graph: BM25 keyword search",
         "(over symbol name + signature + description) fused with HNSW vector",
-        "search (gte-modernbert-base, 768-dim) via Reciprocal Rank Fusion (k=60).",
+        "search (F2LLM-v2-80M, 320-dim) via Reciprocal Rank Fusion (k=60).",
         "Each result carries `rank`, `nodeId`, `name`, `kind`, `filePath`,",
         "`startLine`/`endLine`, a capped `snippet` (~200 chars), the fused",
         "`score`, and `sources` indicating which ranker(s) contributed (`bm25`",
diff --git a/packages/mcp/src/tools/shared.ts b/packages/mcp/src/tools/shared.ts
index d061b603..5554a895 100644
--- a/packages/mcp/src/tools/shared.ts
+++ b/packages/mcp/src/tools/shared.ts
@@ -21,7 +21,7 @@ import { RepoResolveError, type ResolvedRepo, resolveRepo } from "../repo-resolv
 /**
  * Factory for opening an embedder on demand. The default factory imports
  * `@opencodehub/embedder` and calls `openOnnxEmbedder()`; tests inject a
- * fake so they don't need gte-modernbert-base weights on disk. The factory
+ * fake so they don't need F2LLM-v2-80M weights on disk. The factory
  * must throw on failure — the `query` tool treats any throw as
  * "embedder unavailable, warn + fall back to BM25".
  */
diff --git a/packages/search/CHANGELOG.md b/packages/search/CHANGELOG.md
index a77407a5..9b850475 100644
--- a/packages/search/CHANGELOG.md
+++ b/packages/search/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Changelog
 
+## [0.4.0](https://github.com/theagenticguy/opencodehub/compare/search-v0.3.0...search-v0.4.0) (2026-06-26)
+
+
+### ⚠ BREAKING CHANGES
+
+* **embedder:** local embedding model swapped to `codefuse-ai/F2LLM-v2-80M` (320-dim, was gte-modernbert-base 768-dim). Hybrid search now embeds queries through `embedQuery()`, which applies F2LLM's query-only `Instruct:` prefix, while documents stay raw; existing stores must be rebuilt with `codehub analyze --embeddings`.
+
 ## [0.3.0](https://github.com/theagenticguy/opencodehub/compare/search-v0.2.3...search-v0.3.0) (2026-06-01)
 
 
diff --git a/packages/search/src/embedder.ts b/packages/search/src/embedder.ts
index 5c77c3f3..e777e82d 100644
--- a/packages/search/src/embedder.ts
+++ b/packages/search/src/embedder.ts
@@ -13,7 +13,7 @@
 
 import type { Embedder } from "./types.js";
 
-export const DEFAULT_EMBEDDER_DIM = 768;
+export const DEFAULT_EMBEDDER_DIM = 320;
 
 /** Whether the deprecation warning has already fired in this process. */
 let warnedOnce = false;
@@ -44,4 +44,9 @@ export class NullEmbedder implements Embedder {
     }
     return new Float32Array(this.dim);
   }
+
+  /** Query path mirrors {@link embed} — the stand-in has no model prefix. */
+  async embedQuery(text: string): Promise<Float32Array> {
+    return this.embed(text);
+  }
 }
diff --git a/packages/search/src/hybrid.test.ts b/packages/search/src/hybrid.test.ts
index 409e621a..38dc7377 100644
--- a/packages/search/src/hybrid.test.ts
+++ b/packages/search/src/hybrid.test.ts
@@ -149,6 +149,11 @@ class FakeEmbedder implements Embedder {
   async embed(): Promise<Float32Array> {
     return new Float32Array([0.1, 0.2, 0.3, 0.4]);
   }
+  // Symmetric stand-in: the query path mirrors the document path (no prefix).
+  // The fake ignores its input, so delegate to the no-arg `embed`.
+  async embedQuery(_text: string): Promise<Float32Array> {
+    return this.embed();
+  }
 }
 
 describe("hybridSearch", () => {
diff --git a/packages/search/src/hybrid.ts b/packages/search/src/hybrid.ts
index 26911d6c..18c6a5a0 100644
--- a/packages/search/src/hybrid.ts
+++ b/packages/search/src/hybrid.ts
@@ -79,7 +79,9 @@ export async function hybridSearch(
     }));
   }
 
-  const vector = await embedder.embed(q.text);
+  // Query text — embed via embedQuery so asymmetric models (F2LLM) apply
+  // their `Instruct:` query prefix. Documents were indexed raw via embed().
+  const vector = await embedder.embedQuery(q.text);
 
   let annHits: readonly { readonly nodeId: string; readonly distance: number }[];
   if (q.mode === "zoom") {
diff --git a/packages/search/src/types.ts b/packages/search/src/types.ts
index 091d78fd..54980ec3 100644
--- a/packages/search/src/types.ts
+++ b/packages/search/src/types.ts
@@ -66,6 +66,12 @@ export interface FusedHit {
  * zero-vector in production and throws in tests.
  */
 export interface Embedder {
+  /** Embed a DOCUMENT (no query prefix). */
   embed(text: string): Promise<Float32Array>;
+  /**
+   * Embed a QUERY. For asymmetric models (F2LLM) this applies the model's
+   * query instruction prefix; documents are embedded raw via {@link embed}.
+   */
+  embedQuery(text: string): Promise<Float32Array>;
   readonly dim: number;
 }
diff --git a/packages/storage/CHANGELOG.md b/packages/storage/CHANGELOG.md
index 451675ce..33be7fd1 100644
--- a/packages/storage/CHANGELOG.md
+++ b/packages/storage/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Changelog
 
+## [0.4.0](https://github.com/theagenticguy/opencodehub/compare/storage-v0.3.0...storage-v0.4.0) (2026-06-26)
+
+
+### ⚠ BREAKING CHANGES
+
+* **embedder:** embedding dimension changed to 320 (`codefuse-ai/F2LLM-v2-80M`, was gte-modernbert-base 768-dim). The `embeddingDim` store option defaults to 320; existing stores must be rebuilt with `codehub analyze --embeddings`.
+
 ## [0.3.0](https://github.com/theagenticguy/opencodehub/compare/storage-v0.2.3...storage-v0.3.0) (2026-06-01)
 
 
diff --git a/packages/storage/src/sqlite-adapter.test.ts b/packages/storage/src/sqlite-adapter.test.ts
index 795d9345..9d9d6e0c 100644
--- a/packages/storage/src/sqlite-adapter.test.ts
+++ b/packages/storage/src/sqlite-adapter.test.ts
@@ -74,7 +74,7 @@ test("SqliteStore: graph + embeddings round-trip from ONE file across reopen", a
   try {
     const { graph, ids } = fixtureGraph();
 
-    // ── Write phase ── (8-dim embeddings for a readable test; real default 768)
+    // ── Write phase ── (8-dim embeddings for a readable test; real default 320)
     const w = new SqliteStore(dbPath, { embeddingDim: 8 });
     await w.open();
     await w.createSchema();
@@ -82,7 +82,7 @@ test("SqliteStore: graph + embeddings round-trip from ONE file across reopen", a
     assert.equal(stats.nodeCount, 5, "5 nodes loaded");
     assert.equal(stats.edgeCount, 3, "3 edges loaded");
 
-    // 8-dim embeddings so the test is readable; real default is 768.
+    // 8-dim embeddings so the test is readable; real default is 320.
     const vec = (seed: number): Float32Array =>
       Float32Array.from({ length: 8 }, (_, i) => Math.sin(seed + i));
     await w.upsertEmbeddings([
diff --git a/packages/storage/src/sqlite-adapter.ts b/packages/storage/src/sqlite-adapter.ts
index c144a814..17f7731b 100644
--- a/packages/storage/src/sqlite-adapter.ts
+++ b/packages/storage/src/sqlite-adapter.ts
@@ -89,7 +89,7 @@ import { assertReadOnlySql } from "./sql-guard.js";
 export interface SqliteStoreOptions {
   /** Open the file read-only. Query commands pass true; ingestion false. */
   readonly readOnly?: boolean;
-  /** Embedding dimensionality. Defaults to 768 (Bedrock Titan / Cohere tier). */
+  /** Embedding dimensionality. Defaults to 320 (F2LLM-v2-80M, the local ONNX tier). */
   readonly embeddingDim?: number;
   /**
    * Journal mode. Defaults to WAL — the whole point of the spike. Overridable
@@ -100,7 +100,7 @@ export interface SqliteStoreOptions {
   readonly timeoutMs?: number;
 }
 
-const DEFAULT_DIM = 768;
+const DEFAULT_DIM = 320;
 const SCHEMA_VERSION = "spike-sqlite-1";
 const DEFAULT_TIMEOUT_MS = 5_000;
 const DEFAULT_COCHANGE_LOOKUP_LIMIT = 10;
diff --git a/plugins/opencodehub/skills/codehub-guide/SKILL.md b/plugins/opencodehub/skills/codehub-guide/SKILL.md
index e29f4cef..e35df607 100644
--- a/plugins/opencodehub/skills/codehub-guide/SKILL.md
+++ b/plugins/opencodehub/skills/codehub-guide/SKILL.md
@@ -15,7 +15,7 @@ For any task that touches code understanding, debugging, impact analysis, refact
 2. Read `codehub://repo/{name}/context` — codebase stats and a staleness envelope.
 3. Match the task to a skill below and follow that skill's checklist.
 
-> If the context envelope reports the index is stale, run `codehub analyze` in the terminal first. If it says weights are missing, run `codehub setup --embeddings` to fetch the 768d gte-modernbert-base ONNX weights.
+> If the context envelope reports the index is stale, run `codehub analyze` in the terminal first. If it says weights are missing, run `codehub setup --embeddings` to fetch the 320d F2LLM-v2-80M ONNX weights.
 
 ## Skills · analysis
 

From cacf1bdd4e84702a2884591007db095f7f64fe0e Mon Sep 17 00:00:00 2001
From: Laith Al-Saadoon <alsaadoonlaith@gmail.com>
Date: Fri, 26 Jun 2026 01:48:26 +0000
Subject: [PATCH 2/2] ci(release): gate release pipeline to version-shaped tags
 only
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The `release: published` event fires for ANY GitHub release, including
non-package releases that merely host assets, e.g. the `embed-v1`
embedder-weights release this PR introduces (model-pins.ts pins its
URLs). Creating that release triggered the full build → sign →
npm-publish pipeline, which would have published every package to npm
(OCH_NPM_PUBLISH_ENABLED is true). That run was cancelled in time, and
it was building from main (not this branch), so nothing leaked, but the
trigger must be filtered.

Gate the `resolve` job (root of the chain; everything else `needs` it
under a success() gate) to version-shaped tags only: `root-v*`, `cli-v*`,
or bare `v*` (the release-please conventions). `workflow_call` /
`workflow_dispatch` pass an explicit tag input and remain unaffected. A
weights tag like `embed-v1` now skips the pipeline entirely.
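
For illustration only, the gate's `if:` expression can be sketched as a
pure TypeScript predicate. This helper is not part of the change;
`shouldRunReleasePipeline` and `VERSION_TAG_PREFIXES` are hypothetical
names, and the lower-casing assumes GitHub's `startsWith()` expression
function is case-insensitive, as documented.

```typescript
// Hypothetical mirror of the `if:` expression on the `resolve` job.
// The prefixes come from the workflow; the names here are illustrative.
const VERSION_TAG_PREFIXES = ["root-v", "cli-v", "v"] as const;

function shouldRunReleasePipeline(
  eventName: string,
  tagName: string | undefined,
): boolean {
  // workflow_call / workflow_dispatch pass an explicit tag input and are
  // always allowed; only the `release` event is filtered.
  if (eventName !== "release") return true;
  // Assumption: GitHub's startsWith() ignores case, so normalise first.
  const tag = (tagName ?? "").toLowerCase();
  return VERSION_TAG_PREFIXES.some((prefix) => tag.startsWith(prefix));
}
```

Under this sketch `embed-v1` is skipped while `root-v1.2.3` passes.
Because the check is a prefix match, any tag that begins with `v` also
passes, and a tag such as `ingestion-v0.6.0` does not.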
---
 .github/workflows/release.yml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 59e2d132..a681da3b 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -67,6 +67,18 @@ jobs:
   resolve:
     name: Resolve release tag + SHA
     runs-on: ubuntu-latest
+    # Only run the package-release pipeline for version-shaped tags. The
+    # `release: published` event also fires for non-package releases that
+    # merely HOST assets — e.g. the `embed-v1` embedder-weights release
+    # (see packages/embedder/src/model-pins.ts) — which must NOT build,
+    # sign, or npm-publish anything. release-please tags are `root-v*`,
+    # `cli-v*`, or a bare `v*`; `workflow_call`/`workflow_dispatch` pass an
+    # explicit tag input and are always allowed.
+    if: >-
+      github.event_name != 'release'
+      || startsWith(github.event.release.tag_name, 'root-v')
+      || startsWith(github.event.release.tag_name, 'cli-v')
+      || startsWith(github.event.release.tag_name, 'v')
     outputs:
       tag: ${{ steps.t.outputs.tag }}
       sha: ${{ steps.t.outputs.sha }}