Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/agents/Tester.agent.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ Follow the Testing Trophy (Kent C. Dodds), not the Test Pyramid:
- **Static analysis** (linters, sanitizers, warnings-as-errors) is always on
- **Tests that check output must assert expected values**, not just check non-empty — validate correctness

### Real integration tests for public-API surface (non-negotiable)

Any new behavior reachable through the public SDK API (`Model::Load`/`Unload`, `Session`, `ChatSession`, catalog ops, web service endpoints) **must** get a "real" integration test under `test/sdk_api/` (the `sdk_integration_tests` binary) that drives the *production entry point* end-to-end against the real backend (real cached model and/or a real in-process `WebService`). An `internal_api` unit test that calls an internal class directly, or stubs the other side of a contract that also lives in this repo, is **not** a substitute. A one-sided unit test passes while the production path is broken — PR #839 proved it (router unit test called `router->Load(alias)` and passed, while `Model::Load` sending a model_id 404'd against the real service). When a feature spans a client/server or local/external boundary, write the round-trip test through the real server, with the same identifiers production uses.

### Test Patterns for This Project

- Disabled live tests use `DISABLED_` prefix, run via `--gtest_also_run_disabled_tests`
Expand Down
2 changes: 2 additions & 0 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ This is a multi-language SDK monorepo. The primary SDKs live under:
- Test binaries are in `sdk_v2/cpp/build/Windows/RelWithDebInfo/bin/RelWithDebInfo/`
- `sdk_integration_tests.exe` — integration tests (loads models via SharedTestEnv, slow startup)
- `foundry_local_tests.exe` — unit tests (no model loading, fast)
- `cache_only_tests.exe` — cache-only / external-service-URL public-API tests (`test/sdk_api/`, separate binary because `Manager` is a singleton)
- `external_mode_tests.exe` — client/server split tests; re-execs itself with `--serve` to host the web service (`test/sdk_api/`, separate binary for the same singleton reason)

## C++ Code Style

Expand Down
21 changes: 17 additions & 4 deletions .github/instructions/cpp-testing.instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,26 @@ If a model role prints `(none)`, the test fixtures for that role will skip. Chec

## Test Binary Quick Reference

| Binary | Type | Startup | Count |
|--------|------|---------|-------|
| `foundry_local_tests.exe` | Unit | Fast (no models) | ~700 |
| `sdk_integration_tests.exe` | Integration | Slow (model loading) | ~100 |
| Binary | Type | Source dir | Startup | Count |
|--------|------|-----------|---------|-------|
| `foundry_local_tests.exe` | Unit / internal API | `test/internal_api/` (+ pure-logic `test/sdk_api/`) | Fast (no models) | ~700 |
| `sdk_integration_tests.exe` | Integration / public API | `test/sdk_api/` | Slow (model loading, real web service) | ~100 |
| `cache_only_tests.exe` | Integration / public API (cache-only + external service URL) | `test/sdk_api/` | Moderate (own `Manager` configs) | ~10 |
| `external_mode_tests.exe` | Integration / public API (client/server split) | `test/sdk_api/` | Slow (spawns a `--serve` host child, loads a model) | 2 |

`cache_only_tests` and `external_mode_tests` live in `test/sdk_api/` but build into their own binaries because `Manager` is a singleton — each needs `Manager` instances configured differently than the shared `SharedTestEnv` host, so they cannot coexist in `sdk_integration_tests`' process. Source location tracks the API surface (public, black-box); the binary split is an orthogonal CMake concern.

Always use `--gtest_filter` when iterating on a specific test or small group. The full integration suite takes ~10 minutes.

## Real integration tests are mandatory for public-API surface

Any new behavior reachable through the public SDK API — `Model::Load`/`Unload`/`IsLoaded`, `Session`/`ChatSession`, catalog operations, the web service endpoints, etc. — **must** get a "real" integration test under `test/sdk_api/` (the `sdk_integration_tests` binary) that drives the *production entry point* end-to-end against the real backend (real cached model and/or a real in-process `WebService`). An `internal_api` unit test that exercises an internal class directly, or stubs the other side of a contract, is **not** a substitute and does not satisfy this requirement.

Rules:
- When a feature spans a client/server or local/external boundary that both live in this repo, write a **round-trip test that goes through the real server**, not a stub. Stubbed boundary tests are fine *in addition*, never *instead*.
- Drive the same entry point the user calls (`Model::Load()`, not the internal router/manager) so the test exercises identifier translation, serialization, and endpoint resolution exactly as production does.
- Use realistic identifiers. If production passes a model_id, the test must pass a model_id — not an alias that happens to resolve.

## Inner-loop workflow for a single test

When verifying a fix or a new test, never re-run the whole suite via `ctest`. The fast path is:
Expand Down
1 change: 1 addition & 0 deletions sdk_v2/cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ set(FOUNDRY_LOCAL_SOURCES
src/inferencing/model_load_manager.cc
src/inferencing/generative/chat/onnx_chat_generator.cc
src/model.cc
src/model_command_router.cc
src/inferencing/generative/openresponses/response_converter.cc
src/inferencing/generative/openresponses/response_store.cc
src/contracts/responses_json.cc
Expand Down
211 changes: 211 additions & 0 deletions sdk_v2/cpp/docs/ModelCommandRoutingPlan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,211 @@
# Model Command Routing (External Service Mode) — C++ Implementation Plan

## Overview

When the SDK is configured with `Configuration::external_service_url`, the three
**model management commands** must be redirected to that remote Foundry Local
service over HTTP instead of being executed against the in-process model loader:

| Command | External endpoint |
|-------------|--------------------------------------------|
| List loaded | `GET {url}/models/loaded` |
| Load | `GET {url}/models/load/{url-encoded id}` |
| Unload | `GET {url}/models/unload/{url-encoded id}` |

The OpenAI-compatible endpoints (`/v1/models`, `/v1/chat/completions`, …) are
**out of scope** — callers reach those directly, not through the SDK.

In the v1 SDKs this redirection was reimplemented in every language binding
(`ModelLoadManager.cs`, `model_load_manager.py`, `modelLoadManager.ts`,
`model_load_manager.rs`). In v2 it is handled **once** in the C++ core so every
binding inherits it for free.

**v1 reference implementation:**
- `sdk/cs/src/Detail/ModelLoadManager.cs`
- `sdk/python/src/detail/model_load_manager.py`
- `sdk/js/src/detail/modelLoadManager.ts`
- `sdk/rust/src/detail/model_load_manager.rs`

---

## Why a new component (not augmenting `ModelLoadManager`)

The v2 `ModelLoadManager`
([`inferencing/model_load_manager.h`](../src/inferencing/model_load_manager.h)) is
**not** the v1 `ModelLoadManager`. Despite the shared name they have completely
different responsibilities:

- **v1 `ModelLoadManager`** — a thin *router*: three methods, each branching
`if (external_url) HTTP else core`.
- **v2 `ModelLoadManager`** — the *local ORT GenAI lifecycle owner*: holds the
`GenAIModelInstance` map, an `IEpDetector&`, per-model session refcounts,
`RejectNewLoads()`, and `UnloadAll()` drain-on-shutdown.

Bolting `external_service_url` + HTTP into the v2 class would conflate two
unrelated reasons-to-change ("own in-process ORT GenAI instances" vs. "talk to a
remote server") and its signatures fight it: `LoadModel` returns
`LoadResult{ GenAIModelInstance* }` and `GetLoadedModel` returns a live local
instance — neither exists in external mode. `UnloadAll`, `RejectNewLoads`, EP
detection, and session drain are all meaningless remotely.

**Decision:** introduce a higher-level façade, `ModelCommandRouter`, that owns the
local-vs-external decision for only the three management commands. The v2
`ModelLoadManager` stays a pure local lifecycle owner.

---

## Architecture

```
flModel.Load/Unload flCatalog.GetLoadedModels
│ │
▼ ▼
Model::Load/Unload/IsLoaded BaseModelCatalog::GetLoadedModels
│ │
└──────────────┬───────────────┘
ModelCommandRouter ← owns local-vs-external decision
├── local ──► ModelLoadManager (LoadModel / UnloadModel / GetLoadedModelIds)
└── external ─► http_client → {external_service_url}/models/...
```

### Router interface

```cpp
class ModelCommandRouter {
public:
ModelCommandRouter(std::optional<std::string> external_service_url,
ModelLoadManager& load_manager,
std::string app_name,
ILogger& logger);

// Local: ModelLoadManager::LoadModel(local_path, model_id, ep)
// External: GET /models/load/{id}
void Load(std::string_view model_id, std::string_view local_path, ExecutionProvider ep);

// Local: ModelLoadManager::UnloadModel(model_id)
// External: GET /models/unload/{id}
void Unload(std::string_view model_id);

// Local: ModelLoadManager::GetLoadedModel(model_id) != nullptr
// External: membership test against ListLoadedModelIds()
bool IsLoaded(std::string_view model_id);

// Local: ModelLoadManager loaded-map keys
// External: GET /models/loaded (single round-trip, JSON array of ids)
std::vector<std::string> ListLoadedModelIds();
};
```

### Key design principles

1. **`Model` and the catalog become mode-agnostic.** They call the router; they no
longer know external mode exists. `Model` swaps its `ModelLoadManager*` member for
a `ModelCommandRouter*`, wired in the same `Model::Create` / `Manager::CreateModel`
factory that wires it today.
2. **`ModelLoadManager` gains no mode logic.** The router *holds a
`ModelLoadManager&`* and delegates the local branch to it. The only addition to
`ModelLoadManager` is one in-character method, `GetLoadedModelIds()`, that reads
its own loaded map.
3. **List is a single batch call — no N-calls.**
`BaseModelCatalog::GetLoadedModels()` is rewritten to call
`router.ListLoadedModelIds()` **once** and filter `models_` by set membership,
instead of N per-model `IsLoaded()` calls. This also fixes a latent bug: in
external mode the current per-model `IsLoaded()` path always returns empty.
4. **Local inference is untouched.** External mode means "run inference via the
external HTTP endpoints." The inference path fetches `GenAIModelInstance*` directly
in the session layer, never through `Model`/the router, so it needs no change.
5. **Reuse existing infrastructure.**
[`http/http_client.h`](../src/http/http_client.h) (`HttpGetWithResponse`,
`DescribeFailure`) for HTTP + errors, the shared `UrlEncode` helper in
[`utils.h`](../src/utils.h) for the `{id}` path segment, and `Configuration::app_name` for the
`User-Agent`.

---

## Wiring (`Manager`)

`Manager` owns a `std::unique_ptr<ModelCommandRouter>`, declared **after**
`model_load_manager_` so it is destroyed first. Construction order:

```
model_load_manager_ → model_command_router_ → catalog_
```

The router is constructed with `config_.external_service_url`,
`*model_load_manager_`, `config_.app_name`, and `*logger_`. It is injected into the
catalog ctor and forwarded into each `Model` via the existing `CreateModel`
factory (replacing the `ModelLoadManager&` argument).

---

## Error handling

External calls use `http::HttpGetWithResponse` and throw on non-2xx / transport
failure via `FL_THROW(FOUNDRY_LOCAL_ERROR_NETWORK, http::DescribeFailure(resp))`,
matching v1 semantics (which threw `FoundryLocalException` on `!IsSuccessStatusCode`).
`ListLoadedModelIds` parses the response body as a JSON array of strings; an empty
body yields an empty list.

---

## Open questions — decisions

1. **Load of an un-cached model in external mode.**
*Decision: accept the cache-only constraint.* The v2 API is model-object-centric
and the catalog is cache-only in external mode, so callers can only `Load()` a model
already present in their local cache. We ship the model-object path now and **defer**
any `Manager::LoadModelById(id)` escape hatch until there is a concrete need. (v1
took raw id strings and could load anything the remote had; that capability is
intentionally not reproduced yet.)

2. **Session-creation guard in external mode.**
*Decision: add an explicit guard.* [`configuration.h`](../src/configuration.h)
documents that session creation is blocked in external mode, but `Session_CreateImpl`
currently has no explicit check — it only fails indirectly (no local instance). Add
an explicit `FL_THROW(FOUNDRY_LOCAL_ERROR_INVALID_USAGE, …)` guard on the
session-creation path when `external_service_url` is set, mirroring the existing
`StartWebService` guard, so the failure is clear and intentional.

3. **Endpoint timeout.**
*Decision: drop the v1 `timeout` query param; set a deliberate client-side timeout
in the core instead.* Investigation of the v1 bindings showed the `timeout` query
param was **never implemented** — it exists only as a commented-out TODO in C#
(`// { "timeout", ... }`) and Python (`# "timeout": "30"`), and is absent in JS/Rust.
So there is no real capability to preserve. The client-side HTTP timeout was also
inconsistent and accidental rather than designed: Python hard-coded **10s** on all
three calls (effectively a latent bug — a real model load routinely exceeds 10s),
C# used the `HttpClient` **default 100s**, and JS/Rust were unbounded.

Moving this into the C++ core gives every binding a *single, deliberate* value for
the first time. Use a generous `HttpRequestOptions.timeout` for the load call
(target ~5 minutes — model load can be slow) and the default for list/unload. We do
**not** reproduce Python's 10s. A future server-side `timeout` query param can be
added later if the remote service ever supports one.

---

## Work breakdown

**Implementation (`@CppCoder`)**
- New `src/model_command_router.{h,cc}`.
- Add `ModelLoadManager::GetLoadedModelIds()` (reads its own loaded map).
- Swap `Model`'s member `ModelLoadManager* → ModelCommandRouter*`; update
`Model::Create` / `Manager::CreateModel` wiring.
- Rewrite `BaseModelCatalog::GetLoadedModels()` to the single-batch path; thread the
router through the catalog ctor.
- Construct/own the router in `Manager` with correct declaration order.
- Add the external-mode session-creation guard (open question #2).

**Tests (`@Tester`)**
- Router unit tests covering both branches (mock/loopback the HTTP side).
- External-mode integration test mirroring v1 `TestModelLoadManagerExternalService`:
start a real local web service, point a second `Manager` at it via
`external_service_url`, verify load / unload / list round-trip.
- Regression test that `GetLoadedModels()` is correct in external mode.
- Test that session creation throws in external mode.

**Review (`@Reviewer`)**
- Confirm `ModelLoadManager` gained no mode logic.
- Confirm the router holds no inference/EP concerns.
- Confirm error/ownership semantics match SDK conventions.
37 changes: 28 additions & 9 deletions sdk_v2/cpp/run_coverage.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,16 @@ if (-not (Test-Path $opencpp)) {
$buildDir = Join-Path $PSScriptRoot "build\Windows\$Config"
$binDir = Join-Path $buildDir "bin\$Config"
$unitExe = Join-Path $binDir "foundry_local_tests.exe"
$integExe = Join-Path $binDir "sdk_integration_tests.exe"

# Integration coverage comes from every binary built from test/sdk_api/ sources. They all link the
# public foundry_local_cpp surface and drive real models / a real web service. Manager being a
# singleton forces the public-API suite to be split across several binaries (default, cache-only,
# external-mode client/server) rather than one — so iterate over all of them.
$integExes = @(
(Join-Path $binDir "sdk_integration_tests.exe")
(Join-Path $binDir "cache_only_tests.exe")
(Join-Path $binDir "external_mode_tests.exe")
)
$srcDir = Join-Path $PSScriptRoot "src"
$covDir = Join-Path $buildDir "coverage"

Expand Down Expand Up @@ -84,13 +93,23 @@ try {
$mergeArgs = @("--input_coverage", $unitCov)

if (-not $SkipIntegration) {
if (-not (Test-Path $integExe)) {
Write-Warning "Integration test binary not found: $integExe - skipping integration coverage."
} else {
Write-Host "`n=== Collecting integration test coverage ===" -ForegroundColor Cyan
foreach ($integExe in $integExes) {
$name = [System.IO.Path]::GetFileNameWithoutExtension($integExe)

if (-not (Test-Path $integExe)) {
Write-Warning "Integration test binary not found: $integExe - skipping."
continue
}

Write-Host "`n=== Collecting integration test coverage: $name ===" -ForegroundColor Cyan

$integCov = Join-Path $covDir "$name.cov"

$integCov = Join-Path $covDir "integration.cov"
$integArgs = $commonArgs + @("--export_type", "binary:$integCov", "--")
# --cover_children: external_mode_tests re-execs itself with `--serve` to host the web
# service in a child process — the src/ code under test (web_service, manager shutdown)
# runs there, not in the parent. Without this its coverage would be lost. Harmless for
# the binaries that don't spawn children.
$integArgs = $commonArgs + @("--cover_children", "--export_type", "binary:$integCov", "--")
$integArgs += $integExe

# Include DISABLED_ tests (e.g. download tests) for maximum coverage.
Expand All @@ -103,9 +122,9 @@ try {
& $opencpp @integArgs
if (Test-Path $integCov) {
$mergeArgs += @("--input_coverage", $integCov)
Write-Host "Integration coverage saved to $integCov" -ForegroundColor Green
Write-Host "$name coverage saved to $integCov" -ForegroundColor Green
} else {
Write-Warning "Integration coverage file was not created - continuing with unit only."
Write-Warning "$name coverage file was not created - continuing without it."
}
}
}
Expand Down
7 changes: 7 additions & 0 deletions sdk_v2/cpp/src/c_api.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1662,6 +1662,13 @@ FL_API_STATUS_IMPL(Session_CreateImpl, const flModel* model, flSession** out_ses
return MakeStatus(FOUNDRY_LOCAL_ERROR_INVALID_ARGUMENT, "null argument");
}

// External mode runs inference via the remote HTTP endpoints, not through a local session.
// Fail fast with a clear message instead of failing indirectly when no local instance exists.
if (fl::Manager::Instance().GetConfiguration().external_service_url.has_value()) {
return MakeStatus(FOUNDRY_LOCAL_ERROR_INVALID_USAGE,
"cannot create a local inference session when external_service_url is configured");
}

auto session = fl::Session::Create(*AsImpl(model));
*out_session = AsHandle<flSession>(session.release());
return nullptr;
Expand Down
3 changes: 2 additions & 1 deletion sdk_v2/cpp/src/catalog/azure_model_catalog.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,12 +16,13 @@ namespace fl {
AzureModelCatalog::AzureModelCatalog(std::vector<std::pair<std::string, std::optional<std::string>>> catalog_urls,
std::string cache_dir,
ModelFactory model_factory,
ModelCommandRouter& router,
const IEpDetector& ep_detector,
ILogger& logger,
bool cache_only,
std::string catalog_region,
bool disable_region_fallback)
: BaseModelCatalog(catalog_urls.empty() ? kDefaultCatalogUrl : catalog_urls.front().first, logger),
: BaseModelCatalog(catalog_urls.empty() ? kDefaultCatalogUrl : catalog_urls.front().first, router, logger),
catalog_urls_(std::move(catalog_urls)),
cache_dir_(std::move(cache_dir)),
model_factory_(std::move(model_factory)),
Expand Down
Loading