feat: add client-side profiling trigger#326
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces client-side profiling triggers for inference engines (specifically supporting vLLM) around the performance phase of a benchmark run. It adds a --profile CLI option, config schema updates, and logic to trigger start/stop profiling endpoints. The review feedback highlights a critical issue where the profiling stop requests are synchronous blocking network calls executed within the main event loop during the cleanup phase. This can block the event loop or prevent subsequent cleanup steps if interrupted, and should be run asynchronously using an executor.
a5d0534 to
a4fe30b
Compare
8413f09 to
757b5b5
Compare
c3e1b7b to
cd84c53
Compare
| config: Annotated[Path, cyclopts.Parameter(name=["--config", "-c"])], | ||
| timeout: float | None = None, | ||
| mode: TestMode | None = None, | ||
| profile: Annotated[ |
There was a problem hiding this comment.
lets keep from-config cli minimal if possible,
is there special need --profile, --profile_urls in from-config mode? (instead of specifying in config yaml)
There was a problem hiding this comment.
Thanks! I dropped --profile/profile_urls
viraatc
left a comment
There was a problem hiding this comment.
Review Council — Multi-AI Code Review
Reviewed by: Codex (gpt-5.5) + Claude | Depth: standard
3 inline comments below. See the summary comment for the full picture, including findings already covered by existing reviewers.
| # Fire /stop_profile for URLs whose /start_profile succeeded. | ||
| # Unifies the clean phase-end path and the abort path — | ||
| # both reach this block, both fire stops. | ||
| if profile_starts: |
There was a problem hiding this comment.
[Codex] medium (correctness): The profiling window is not bounded to the performance phase when accuracy datasets are configured.
These /stop_profile calls run in the finally, which only executes after await session.run(phases, ...) (line 901) has finished every phase. _build_phases() appends phases as warmup → performance → accuracy (one PhaseConfig per accuracy dataset, execute.py:515-523), all driven by the single session.run. So /start_profile fires at performance-phase start but /stop_profile fires only after the accuracy phases complete — meaning that in submission / --mode both runs with separate accuracy datasets, the profiler trace and the reported Trigger span cover performance plus all accuracy traffic, contradicting the documented “around the performance phase” contract (ProfilingConfig docstring). Pure perf runs are unaffected.
Minimal fix: stop at performance-phase end rather than session end — e.g. add an on_phase_end hook to session.run (it currently exposes only on_phase_start, session.py:369-389) and fire the stops there when phase.phase_type == PhaseType.PERFORMANCE.
There was a problem hiding this comment.
we won't run profiling with --mode both/acc, it's only for --mode perf. Plus there is no existing on_phase_end hook. We don't want to add it merely because we want a profiling functionality.
| if profile_urls is not None: | ||
| profiling_update["urls"] = profile_urls | ||
| if profiling_update: | ||
| new_profiling = resolved.settings.profiling.model_copy(update=profiling_update) |
There was a problem hiding this comment.
[Codex + Claude] medium (data-integrity): this from-config override bypasses the new URL-scheme validation.
In Pydantic v2 model_copy(update=...) does not run validators, so ProfilingConfig._validate_url_scheme (schema.py:666) never fires on this path. A scheme-less value such as benchmark from-config --profile-urls localhost:8000 is silently accepted; _derive_profile_urls then yields localhost:8000/start_profile and urlopen raises unknown url type, which is swallowed into the record so the trigger silently no-ops. The offline/online path constructs ProfilingConfig normally and does reject the same input — so the two CLI paths disagree (verified empirically on pydantic 2.12.0).
Minimal fix: re-validate instead of copying, e.g.
new_profiling = ProfilingConfig.model_validate(
{**resolved.settings.profiling.model_dump(), **profiling_update}
)Related to @viraatc’s note above on keeping the from-config surface minimal — if these overrides are dropped, this code path (and the asymmetry) disappears entirely.
There was a problem hiding this comment.
I was just testing this PR out as well, +1, I saw this issue too, malformed URLs are able to pass through using from-config
There was a problem hiding this comment.
I've dropped --profile/profile_urls from from-config cli. This issue doesn't exist any more.
| } | ||
|
|
||
|
|
||
| def _derive_profile_urls( |
There was a problem hiding this comment.
[Claude] medium (testing): none of the new profiling code is exercised by tests (the only profil* test, tests/unit/test_profiler.py, covers the unrelated line-profiler module). The branches most worth covering — and the ones implicated by the two correctness issues in this review — are: _derive_profile_urls /v1 stripping and the empty-endpoints ValueError (here); the model_validate-vs-model_copy validation asymmetry (cli.py:132); and the start/stop record plumbing into profiling.json. A monkeypatched urlopen (or a stub server) keeps these as fast unit tests; per AGENTS.md the new branches should be covered before merge.
Review Council — Multi-AI Code ReviewReviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: standard | Focus: code & integration quality, minimal changes Posted 3 inline comments. Each was verified against the source at HEAD ( Posted inline
Already covered / not re-posted (to avoid duplicate noise)
Low / non-blocking (noted, not posted inline)
🤖 Two independent AI reviews (Codex + Claude), synthesized and verified against HEAD. Comments are advisory ( |
c2d22ec to
1a951ac
Compare
|
@viraatc do you know why |
|
its an audit failure, we need to update aiohttp on main first (ill make separate MR) |
|
merged now, can u pl. rebase? |
Adds an optional client-side trigger that fires POST /start_profile
at the performance phase start and /stop_profile at run end, so a
profiled run can be driven from a YAML/CLI flag without coupling
endpoints to any vendor harness.
Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig
hung off Settings. URLs are auto-derived per entry in
endpoint_config.endpoints (strip /v1, append engine-specific path).
Default-off; warn-don't-fail throughout.
Report.txt gets a Profiling section and a sibling profiling.json is
written next to result_summary.json when the trigger is enabled.
The profiler-trigger commit (a4fe30b) left the Field( call for metrics_tokenizer_workers unterminated, so config/schema.py raised SyntaxError and the inference-endpoint CLI could not import. Add the missing ) so the module compiles.
Add an optional profiling.endpoints (CLI --profile-endpoints) field so the profiler start/stop triggers can target a different host than the inference endpoint. When unset, URLs are still derived from endpoint_config.endpoints; when set, derivation runs over the override list using the same engine-specific protocol. Adds a scheme validator mirroring EndpointConfig and a matching --profile-endpoints override on the from-config subcommand.
Keep the from-config CLI surface minimal per review feedback: profiling is configured via the YAML settings.profiling block for from-config runs. This also removes the model_copy(update=...) path that bypassed ProfilingConfig URL-scheme validation. offline/online keep the schema-generated --profile/--profile-urls flags, which validate normally.
Adds unit tests for the client-side profiling trigger (review finding #3): - TestProfilingConfig (test_schema.py): defaults, engine enum coercion, and URL-scheme validation on both the direct-construction (offline/online) and model_validate (from-config YAML) paths. - TestProfilingHelpers (test_benchmark.py): _derive_profile_urls /v1 stripping and empty-endpoints ValueError, _post_profile 200/404/connection-failure via mocked urlopen, _render_profile_status, and _write_profiling_section output plus profiling.json serializability.
1a951ac to
6cba0e0
Compare
Summary
Add a client-side trigger that fires
POST /start_profileat the beginning of the performance phase andPOST /stop_profileat run end. Letsinference-endpointcapture a vLLM profile during a measured run without coupling endpoints to any vendor-specific server harness.Motivated by MLPerf Inference v6.1 q3vl, which decouples LoadGen+vLLM into a separate HTTP server (
mlperf-inf-mm-q3vl benchmark nv mpi-dynamo-vllm-serve). The server already supports HTTP-driven profiling — this PR is purely the client-side arm/disarm trigger.What changed
config/schema.py— newProfilerEnginestr-enum (currently justvllm) andProfilingConfigPydantic model with two fields:engine: ProfilerEngine | None = Noneandurls: list[str] | None = None(CLI--profile-urls). Hung offSettingsnext toWarmupConfig.urlscarries a scheme validator mirroringEndpointConfig's (rejects URLs withouthttp:///https://, skips whenNone). Default-off; YAML and existing CLI behavior unchanged.commands/benchmark/execute.py— at performance-phase start, POST/start_profileto each URL derived from the profiling endpoint list (strip trailing/v1, append the engine's path). The list isprofiling.urlswhen set, otherwiseendpoint_config.endpoints. In the run'sfinallyblock, POST/stop_profileto URLs whose start succeeded. Writes aProfilingsection toreport.txtand a siblingprofiling.json. Warn-don't-fail throughout — profiling never fails a benchmark run.commands/benchmark/cli.py—from-configgets--profile <engine>and--profile-urls <url...>override flags, same pattern as the existing--timeout/--modeoverrides.offline/onlinealready pick up both from the schema via cyclopts.How to use
The actual trace window is bounded server-side by
delay_iterations/max_iterations. The client's/stop_profileis a safety belt for themax_iterations=0case and for aborted runs.Caveats
cudaProfilerStop. Bounded well undermax_iterations./stop_profilefires from the session-endfinally, not from a per-phase end hook. Profiled runs are not expected to include the accuracy phase; if that changes, add anon_phase_endcallback toBenchmarkSession.run./v1and assumes a bare host/port underneath. Reverse-proxy-prefixed deployments (e.g.gateway/api/v1) are not supported — but--profile-urlscan supply the correct base directly when the profiler endpoint differs from the inference endpoint.closes #324
Type of change
Checklist