Skip to content

feat: add client-side profiling trigger#326

Open
maxyanghu wants to merge 5 commits into
mlcommons:mainfrom
CentML:feat/maxhu-add-profiler-trigger
Open

feat: add client-side profiling trigger#326
maxyanghu wants to merge 5 commits into
mlcommons:mainfrom
CentML:feat/maxhu-add-profiler-trigger

Conversation

@maxyanghu

@maxyanghu maxyanghu commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add a client-side trigger that fires POST /start_profile at the beginning of the performance phase and POST /stop_profile at run end. Lets inference-endpoint capture a vLLM profile during a measured run without coupling endpoints to any vendor-specific server harness.

Motivated by MLPerf Inference v6.1 q3vl, which decouples LoadGen+vLLM into a separate HTTP server (mlperf-inf-mm-q3vl benchmark nv mpi-dynamo-vllm-serve). The server already supports HTTP-driven profiling — this PR is purely the client-side arm/disarm trigger.

What changed

  • config/schema.py — new ProfilerEngine str-enum (currently just vllm) and ProfilingConfig Pydantic model with two fields: engine: ProfilerEngine | None = None and urls: list[str] | None = None (CLI --profile-urls). Hung off Settings next to WarmupConfig. urls carries a scheme validator mirroring EndpointConfig's (rejects URLs without http:///https://, skips when None). Default-off; YAML and existing CLI behavior unchanged.
  • commands/benchmark/execute.py — at performance-phase start, POST /start_profile to each URL derived from the profiling endpoint list (strip trailing /v1, append the engine's path). The list is profiling.urls when set, otherwise endpoint_config.endpoints. In the run's finally block, POST /stop_profile to URLs whose start succeeded. Writes a Profiling section to report.txt and a sibling profiling.json. Warn-don't-fail throughout — profiling never fails a benchmark run.
  • commands/benchmark/cli.pyfrom-config gets --profile <engine> and --profile-urls <url...> override flags, same pattern as the existing --timeout / --mode overrides. offline / online already pick up both from the schema via cyclopts.

How to use

# Server, with profiling enabled at server start:
vllm serve \
    ... \
    --profiler-config.profiler=cuda \
    --profiler-config.delay_iterations=50 \
    ---profiler-config.max_iterations=100 \
    ...

# Client:
inference-endpoint benchmark from-config -c run.yaml
#  in YAML:
#   settings:
#     profiling:
#       engine: vllm

# When the profiler admin endpoint differs from the inference endpoint,
# override the trigger target (otherwise it derives from endpoint_config.endpoints):
# or in YAML:
#   settings:
#     profiling:
#       engine: vllm
#       urls:
#         - http://host:8001/v1

The actual trace window is bounded server-side by delay_iterations / max_iterations. The client's /stop_profile is a safety belt for the max_iterations=0 case and for aborted runs.

Caveats

  • Multi-replica fan-out is sequential; small skew (sub-iteration) between first and last server's cudaProfilerStop. Bounded well under max_iterations.
  • The client /stop_profile fires from the session-end finally, not from a per-phase end hook. Profiled runs are not expected to include the accuracy phase; if that changes, add an on_phase_end callback to BenchmarkSession.run.
  • URL derivation strips /v1 and assumes a bare host/port underneath. Reverse-proxy-prefixed deployments (e.g. gateway/api/v1) are not supported — but --profile-urls can supply the correct base directly when the profiler endpoint differs from the inference endpoint.

closes #324

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@maxyanghu maxyanghu requested a review from a team May 26, 2026 19:47
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces client-side profiling triggers for inference engines (specifically supporting vLLM) around the performance phase of a benchmark run. It adds a --profile CLI option, config schema updates, and logic to trigger start/stop profiling endpoints. The review feedback highlights a critical issue where the profiling stop requests are synchronous blocking network calls executed within the main event loop during the cleanup phase. This can block the event loop or prevent subsequent cleanup steps if interrupted, and should be run asynchronously using an executor.

Comment thread src/inference_endpoint/commands/benchmark/execute.py
@wangshangsam wangshangsam requested a review from arekay-nv May 27, 2026 05:13
@wangshangsam wangshangsam self-requested a review June 4, 2026 20:34
@maxyanghu maxyanghu force-pushed the feat/maxhu-add-profiler-trigger branch from a5d0534 to a4fe30b Compare June 10, 2026 18:53
Comment thread src/inference_endpoint/config/schema.py Outdated
@maxyanghu maxyanghu force-pushed the feat/maxhu-add-profiler-trigger branch 2 times, most recently from 8413f09 to 757b5b5 Compare June 15, 2026 19:00
@maxyanghu maxyanghu requested a review from wangshangsam June 15, 2026 19:16

@wangshangsam wangshangsam left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but (probably in a separate PR) I have a feeling that _run_benchmark_async() needs refactoring. It's currently 342 lines long; not very readable.

@maxyanghu maxyanghu force-pushed the feat/maxhu-add-profiler-trigger branch from c3e1b7b to cd84c53 Compare June 15, 2026 21:17
config: Annotated[Path, cyclopts.Parameter(name=["--config", "-c"])],
timeout: float | None = None,
mode: TestMode | None = None,
profile: Annotated[

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets keep from-config cli minimal if possible,
is there special need --profile, --profile_urls in from-config mode? (instead of specifying in config yaml)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I dropped --profile/profile_urls

@viraatc viraatc left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

@viraatc viraatc left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Council — Multi-AI Code Review

Reviewed by: Codex (gpt-5.5) + Claude | Depth: standard

3 inline comments below. See the summary comment for the full picture, including findings already covered by existing reviewers.

# Fire /stop_profile for URLs whose /start_profile succeeded.
# Unifies the clean phase-end path and the abort path —
# both reach this block, both fire stops.
if profile_starts:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Codex] medium (correctness): The profiling window is not bounded to the performance phase when accuracy datasets are configured.

These /stop_profile calls run in the finally, which only executes after await session.run(phases, ...) (line 901) has finished every phase. _build_phases() appends phases as warmup → performance → accuracy (one PhaseConfig per accuracy dataset, execute.py:515-523), all driven by the single session.run. So /start_profile fires at performance-phase start but /stop_profile fires only after the accuracy phases complete — meaning that in submission / --mode both runs with separate accuracy datasets, the profiler trace and the reported Trigger span cover performance plus all accuracy traffic, contradicting the documented “around the performance phase” contract (ProfilingConfig docstring). Pure perf runs are unaffected.

Minimal fix: stop at performance-phase end rather than session end — e.g. add an on_phase_end hook to session.run (it currently exposes only on_phase_start, session.py:369-389) and fire the stops there when phase.phase_type == PhaseType.PERFORMANCE.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't run profiling with --mode both/acc, it's only for --mode perf. Plus there is no existing on_phase_end hook. We don't want to add it merely because we want a profiling functionality.

if profile_urls is not None:
profiling_update["urls"] = profile_urls
if profiling_update:
new_profiling = resolved.settings.profiling.model_copy(update=profiling_update)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Codex + Claude] medium (data-integrity): this from-config override bypasses the new URL-scheme validation.

In Pydantic v2 model_copy(update=...) does not run validators, so ProfilingConfig._validate_url_scheme (schema.py:666) never fires on this path. A scheme-less value such as benchmark from-config --profile-urls localhost:8000 is silently accepted; _derive_profile_urls then yields localhost:8000/start_profile and urlopen raises unknown url type, which is swallowed into the record so the trigger silently no-ops. The offline/online path constructs ProfilingConfig normally and does reject the same input — so the two CLI paths disagree (verified empirically on pydantic 2.12.0).

Minimal fix: re-validate instead of copying, e.g.

new_profiling = ProfilingConfig.model_validate(
    {**resolved.settings.profiling.model_dump(), **profiling_update}
)

Related to @viraatc’s note above on keeping the from-config surface minimal — if these overrides are dropped, this code path (and the asymmetry) disappears entirely.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just testing this PR out as well, +1, I saw this issue too, malformed URLs are able to pass through using from-config

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped --profile/profile_urls from from-config cli. This issue doesn't exist any more.

}


def _derive_profile_urls(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] medium (testing): none of the new profiling code is exercised by tests (the only profil* test, tests/unit/test_profiler.py, covers the unrelated line-profiler module). The branches most worth covering — and the ones implicated by the two correctness issues in this review — are: _derive_profile_urls /v1 stripping and the empty-endpoints ValueError (here); the model_validate-vs-model_copy validation asymmetry (cli.py:132); and the start/stop record plumbing into profiling.json. A monkeypatched urlopen (or a stub server) keeps these as fast unit tests; per AGENTS.md the new branches should be covered before merge.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests added

@viraatc

viraatc commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Review Council — Multi-AI Code Review

Reviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: standard | Focus: code & integration quality, minimal changes

Posted 3 inline comments. Each was verified against the source at HEAD (cd84c53) and de-duplicated against the 4 existing human/bot comments.

Posted inline

# File Line Severity Category Reviewer(s) Summary
1 commands/benchmark/execute.py 913 medium correctness Codex /stop_profile fires in the finally after all phases, so with accuracy datasets the trace + Trigger span cover performance plus accuracy traffic — not "around the performance phase". Bound the stop to perf-phase end (needs an on_phase_end hook).
2 commands/benchmark/cli.py 132 medium data-integrity Codex + Claude model_copy(update=…) skips validators, so --profile-urls on from-config bypasses _validate_url_scheme (scheme-less URL silently no-ops the trigger). offline/online correctly rejects it. Re-validate via ProfilingConfig.model_validate({**dump, **update}).
3 commands/benchmark/execute.py 556 medium testing Claude No tests exercise any new profiling code (URL derivation, validator asymmetry, start/stop plumbing). The untested branches are exactly the ones implicated by #1/#2.

Already covered / not re-posted (to avoid duplicate noise)

  • Blocking urllib.urlopen on the event loop — already filed by gemini-code-assist[bot] at execute.py:928 for the stop loop (recommends loop.run_in_executor + except BaseException so a 2nd Ctrl+C can't skip cleanup). Worth extending: the start loop (execute.py:887-897) has the identical synchronous-urlopen-on-the-loop pattern and isn't mentioned in that thread — the same run_in_executor fix should apply there too.
  • Should --profile/--profile-urls exist in from-config at all? — already raised by @viraatc at cli.py:102. Resolving that (dropping the overrides) would also eliminate finding [fix] Update review and branch name #2's code path.

Low / non-blocking (noted, not posted inline)

  • profiling.json + report section are skipped on the execution-failure path. If session.run raises, the finally still POSTs /stop_profile (good — the server profiler is stopped), but _run_benchmark_async re-raises as ExecutionError; run_benchmark (execute.py:1176) catches only KeyboardInterrupt, so finalize_benchmark never runs and the stop records (tagged stop_reason="abort") are discarded. Consequently the "(from abort handler)" suffix in _write_profiling_section is effectively unreachable — the only path that returns a BenchmarkResult is the clean/SIGINT path, where session_completed_normally=True"phase_end". Low impact; the comment "both reach this block, both fire stops" is accurate for the HTTP call but not for the persisted record.
  • DRY: the --profile / --profile-urls flag names + help strings are duplicated between ProfilingConfig (schema.py, via alias=) and the explicit from_config params (cli.py:102-117). Partly unavoidable (from_config takes a Path, not the model), but the help text is copy-pasted and will drift — extract to shared constants if the overrides are kept.

Commit hygiene: clean. The main..HEAD range shows 6 commits, but 3 are a merge + already-merged PRs (#355, #331) pulled in via the merge; the PR contributes 2 feat: + 1 fix:.


🤖 Two independent AI reviews (Codex + Claude), synthesized and verified against HEAD. Comments are advisory (event: COMMENT); no approval state changed.

@maxyanghu maxyanghu requested review from leopck and viraatc June 16, 2026 19:10
@maxyanghu maxyanghu force-pushed the feat/maxhu-add-profiler-trigger branch from c2d22ec to 1a951ac Compare June 16, 2026 19:11
@maxyanghu

Copy link
Copy Markdown
Collaborator Author

@viraatc do you know why Tests/audit is failing?

@viraatc

viraatc commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

its an audit failure, we need to update aiohttp on main first (ill make separate MR)

@viraatc

viraatc commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

merged now, can u pl. rebase?
#358

Adds an optional client-side trigger that fires POST /start_profile
at the performance phase start and /stop_profile at run end, so a
profiled run can be driven from a YAML/CLI flag without coupling
endpoints to any vendor harness.

Schema: ProfilerEngine enum (currently {vllm}) and ProfilingConfig
hung off Settings. URLs are auto-derived per entry in
endpoint_config.endpoints (strip /v1, append engine-specific path).
Default-off; warn-don't-fail throughout.

Report.txt gets a Profiling section and a sibling profiling.json is
written next to result_summary.json when the trigger is enabled.
The profiler-trigger commit (a4fe30b) left the Field( call for
metrics_tokenizer_workers unterminated, so config/schema.py raised
SyntaxError and the inference-endpoint CLI could not import. Add the
missing ) so the module compiles.
Add an optional profiling.endpoints (CLI --profile-endpoints) field so the
profiler start/stop triggers can target a different host than the inference
endpoint. When unset, URLs are still derived from endpoint_config.endpoints;
when set, derivation runs over the override list using the same engine-specific
protocol. Adds a scheme validator mirroring EndpointConfig and a matching
--profile-endpoints override on the from-config subcommand.
Keep the from-config CLI surface minimal per review feedback: profiling
is configured via the YAML settings.profiling block for from-config runs.
This also removes the model_copy(update=...) path that bypassed
ProfilingConfig URL-scheme validation. offline/online keep the
schema-generated --profile/--profile-urls flags, which validate normally.
Adds unit tests for the client-side profiling trigger (review finding #3):
- TestProfilingConfig (test_schema.py): defaults, engine enum coercion, and
  URL-scheme validation on both the direct-construction (offline/online) and
  model_validate (from-config YAML) paths.
- TestProfilingHelpers (test_benchmark.py): _derive_profile_urls /v1 stripping
  and empty-endpoints ValueError, _post_profile 200/404/connection-failure via
  mocked urlopen, _render_profile_status, and _write_profiling_section output
  plus profiling.json serializability.
@maxyanghu maxyanghu force-pushed the feat/maxhu-add-profiler-trigger branch from 1a951ac to 6cba0e0 Compare June 22, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: support /profile_start

4 participants