feat: add -vvv trace mode dashboard for client overhead #334

Open
viraatc wants to merge 1 commit into main from feat/trace-events


@viraatc viraatc commented Jun 4, 2026

Collaborator

Summary

Opt-in (-vvv) binary trace pipeline: per-request lifecycle latency + per-process asyncio event-loop lag, rendered as a live rich.Live dashboard alongside the running benchmark. Off by default; no measurable overhead when off (see Perf impact).

Example (demo data)

═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
  uptime                              0.0s   status                      BACKPRESSURE   req/s                           12,747.2
  issued                            80,000   dropped frames                       512   tok/s                        5,353,810.0
═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════

  REQUEST LIFECYCLE  (ms)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  stage                                                        N        avg        min        p50        p99        max     %E2E
  [client] issue -> ipc_2_worker -> conn acquired            400      33.87      25.00      34.01      43.02      43.00    20.5%
  [client] conn acquired -> payload written                  400       1.00       1.00       1.00       1.00       1.00     0.6%
  [server] payload written -> headers recvd                  400       2.00       2.00       2.00       2.00       2.00     1.2%
  [server] headers recvd -> 1st chunk                        400      31.55      28.00      31.60      35.23      35.20    19.1%
  [server] 1st chunk -> last chunk                           400      87.74      70.00      88.01     106.04     106.00    53.2%
  [client] last chunk -> ipc_2_main -> complete              400       8.89       8.00       8.90       9.81       9.80     5.4%
  E2E TOTAL  issue -> complete                               400     165.04     134.00     165.54     197.00     197.00   100.0%

  client work                        22.4%   server work                        73.5%   backpressure [workers busy]        20.5%
  e2e               │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▒███████████████████████████████████████████████████████████▒▒▒▒│                 encode ─┤
                    ▒ client   ▓ backpressure   █ server                                                          tcp-acquire ─┤
                                                                                                                   sse-decode ─┤
                                                                                                                 final-decode ─┤
                                                                                                                 complete-ipc ─┘

  LOADGEN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  issued                            80,000   completed                         76,483   errors                                12
  issued/s                        13,333.3   completed/s                     12,747.2   tok/s                        5,353,810.0

  latency (ms)                                                 N        avg        min        p50        p99        max
  ttft                                                    76,483      33.00      20.00      32.00      95.00     120.00
  tpot                                                    76,483      11.50       5.00      11.00      45.00      60.00
  e2e                                                     76,483     151.00      90.00     150.00     380.00     410.00

  EVENT LOOP LAG  (ms)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  fleet p99 1.17 ms   hot workers (p99 ≥ 5 ms)  2/6   tcp conns  3,988
  worker                                                       N        avg        min        p50        p99        max
  main                                                        20       0.49       0.40       0.48       0.57       0.57
  w4                                                          20      11.09      11.00      11.08      11.17      11.17
  w2                                                          20       6.59       6.50       6.58       6.67       6.67
  w0                                                          20       1.29       1.20       1.28       1.37       1.37
  w1                                                          20       0.89       0.80       0.88       0.97       0.97
  …

What it shows

  • REQUEST LIFECYCLE — per-stage N/avg/min/p50/p99/max + heat-graded %E2E across the request path; ipc_2_worker/ipc_2_main mark the cross-process hops.
  • Verdict + e2e bar + cause tree — client / server / backpressure split of E2E; the bar uses only the legend key colors, and the tree under the backpressure value names the worker-loop phases, colored by their stage's %E2E heat.
  • LOADGEN — authoritative counts/rates + ttft/tpot/e2e from the metrics aggregator (perf-window tracked_*).
  • EVENT LOOP LAG — fleet p99, hot-worker count, fleet-wide ESTABLISHED TCP conns (sampled from /proc by the dashboard process — zero producer cost), then main + top-16 workers by max lag.
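
The EVENT LOOP LAG panel reports how late each process's asyncio loop wakes up. The PR text does not show the sampler itself, so the following is only a minimal sketch of one common way to measure it: request a fixed sleep and record the overshoot. The function name `sample_loop_lag` and its parameters are hypothetical.

```python
import asyncio
import time


async def sample_loop_lag(samples: int, interval_s: float = 0.01) -> list[float]:
    """Return per-tick event-loop lag in milliseconds.

    Lag is how much later than requested the loop resumed this coroutine.
    Hypothetical helper; the PR's actual sampler is not shown in the text.
    """
    lags_ms: list[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval_s)
        overshoot_s = time.perf_counter() - start - interval_s
        # Clamp at zero: timer granularity can make the overshoot slightly negative.
        lags_ms.append(max(0.0, overshoot_s * 1000.0))
    return lags_ms


if __name__ == "__main__":
    lags = asyncio.run(sample_loop_lag(samples=20))
    print(f"max lag {max(lags):.2f} ms over {len(lags)} samples")
```

A busy worker loop delays the wake-up, so its overshoot grows; that is the signal a "hot worker" threshold such as p99 ≥ 5 ms would flag.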

How it's wired

  • utils/trace.py: lock-free SPSC emitter (~190 ns/event) writing into a per-process 512 MiB anonymous-mmap ring (pages fault in on demand). Transport is a per-pid FIFO, opened non-blocking with a bounded retry so the client raises instead of hanging if the dashboard died. Writes use O_NONBLOCK and drop on EAGAIN; adaptive sampling sheds load, and a cumulative, self-healing drop counter is re-emitted every tick.
  • utils/trace_dashboard.py (pure aggregation/render, unit-tested in isolation) and scripts/trace_dashboard.py (TUI subprocess): the subprocess reads the FIFO via vectorized iter_unpack and subscribes to the aggregator PUB on a dedicated ZMQ SUB socket (sidecar fallback). teardown() writes the terminal loadgen snapshot to the sidecar before closing the FIFO, so the reader exits on true EOF with an authoritative closing frame, with no idle or time caps.
  • Warmup is excluded everywhere via the PERF_START reset (lifecycle, rates, loop lag); PERF_END freezes the lifecycle/verdict/tree consistently.
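
The non-blocking FIFO transport described in the first bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: the class name `FifoSink`, the retry count, and the delay are hypothetical, and adaptive sampling is omitted. It relies only on standard POSIX behaviour: opening a FIFO write-only with `O_NONBLOCK` fails with `ENXIO` while no reader has it open, and a write that cannot fit fails with `EAGAIN`.

```python
import errno
import os
import time


class FifoSink:
    """Hypothetical write side of a per-pid trace FIFO."""

    def __init__(self, path: str, retries: int = 50, delay_s: float = 0.02) -> None:
        self.dropped = 0  # cumulative count of frames dropped on a full pipe
        self.fd = self._open(path, retries, delay_s)

    @staticmethod
    def _open(path: str, retries: int, delay_s: float) -> int:
        for _ in range(retries):
            try:
                # Fails with ENXIO until some reader has the FIFO open.
                return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                time.sleep(delay_s)
        # Bounded retry: raise rather than hang if the reader never shows up.
        raise TimeoutError(f"no reader on {path} after {retries} attempts")

    def write(self, frame: bytes) -> bool:
        """Write one frame; return False (and count a drop) if the pipe is full."""
        try:
            # Frames no larger than PIPE_BUF are written atomically or not at all.
            os.write(self.fd, frame)
            return True
        except BlockingIOError:  # EAGAIN: the reader is not keeping up
            self.dropped += 1
            return False

    def close(self) -> None:
        os.close(self.fd)
```

Keeping `dropped` cumulative means a reader that misses one report still sees the correct total on the next one, which is one plausible reading of the "self-healing" counter.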

Perf impact (B200 bare metal)

3-config e2e roofline (#328 recipe: max_throughput_server, concurrency 4000, 10 s, 3 reps, mean ± pstdev QPS) on an exclusive bare-metal B200 host (Xeon Platinum 8568Y+, 192 threads):

| mode | main | PR trace-off | PR -vvv |
|---|---|---|---|
| nonstream | 66,176 ± 971 | 66,921 ± 875 (+1.1%) | 63,211 ± 926 (−4.5% vs main) |
| stream | 51,436 ± 1,306 | 50,296 ± 730 (−2.2%) | 47,428 ± 289 (−7.8% vs main) |

Trace-off is within run-to-run noise of main, so there is no measurable overhead when -vvv isn't used. -vvv costs ~5–8% at the sub-millisecond stub roofline, the worst case by construction; against real LLM endpoints (server-dominated e2e) the relative overhead shrinks accordingly.
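
For reference, the percentages quoted for the roofline runs follow directly from the mean QPS figures. The short check below recomputes them; the dictionary and function names are illustrative only.

```python
# Mean QPS values copied from the roofline results above.
MAIN = {"nonstream": 66_176, "stream": 51_436}
TRACE_OFF = {"nonstream": 66_921, "stream": 50_296}
TRACE_ON = {"nonstream": 63_211, "stream": 47_428}


def rel_delta_pct(candidate: float, baseline: float) -> float:
    """Relative change of candidate vs baseline, in percent, to one decimal."""
    return round(100.0 * (candidate - baseline) / baseline, 1)


for mode in MAIN:
    off = rel_delta_pct(TRACE_OFF[mode], MAIN[mode])
    on = rel_delta_pct(TRACE_ON[mode], MAIN[mode])
    print(f"{mode}: trace-off {off:+.1f}%, -vvv {on:+.1f}%")
# nonstream: trace-off +1.1%, -vvv -4.5%
# stream: trace-off -2.2%, -vvv -7.8%
```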

Test plan

  • uv run pytest tests/unit/utils/ — 112 tests (counts, folds, drop self-heal, loop lag, warmup exclusion, verdict/bar/tree, freeze, tcp gauge)
  • uv run pre-commit run --all-files — green
  • e2e smoke vs max_throughput_server (streaming + offline), plus the 3-config roofline above
  • Reviewer: inference-endpoint -vvv benchmark ... renders cleanly

🤖 Generated with Claude Code

@viraatc viraatc requested a review from a team June 4, 2026 00:03

github-actions Bot commented Jun 4, 2026


MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested review from arekay-nv and nvzhihanj June 4, 2026 00:03

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a high-performance, binary event-tracing framework activated via the -vvv CLI flag, complete with a live terminal dashboard, ablation scripts, and comprehensive unit tests. Tracing is integrated across the benchmark executor, HTTP client, worker processes, and load generator. The reviewer's feedback focuses on critical thread-safety improvements to the Dashboard class using a reentrant lock (threading.RLock) to prevent race conditions between the reader and render threads. Additionally, the reviewer suggests key performance optimizations in the hot path, such as using a cached memoryview in _TraceEmitter for zero-copy writes and pre-computing the integer sid in InFlightRequest to avoid repetitive string parsing.
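
The reentrant-lock suggestion in the review summary can be illustrated with a small sketch. The class below is a hypothetical stand-in, not the PR's `Dashboard`: it only shows why `threading.RLock` (rather than a plain `Lock`) lets a locked render method call another locked method on the same thread without deadlocking.

```python
import threading


class DashboardState:
    """Hypothetical stand-in for state shared by reader and render threads."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._events = 0
        self._dropped = 0

    def ingest(self, n_events: int, n_dropped: int = 0) -> None:
        """Called from the reader thread for each decoded batch."""
        with self._lock:
            self._events += n_events
            self._dropped += n_dropped

    def snapshot(self) -> tuple[int, int]:
        with self._lock:
            return self._events, self._dropped

    def render(self) -> str:
        """Called from the render thread."""
        with self._lock:
            # Reentrant: snapshot() re-acquires the lock this thread already holds.
            events, dropped = self.snapshot()
            return f"events={events} dropped={dropped}"
```

With a non-reentrant `threading.Lock`, `render()` would block forever on the nested acquire inside `snapshot()`.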


@viraatc viraatc changed the title from "feat(trace): -vvv binary trace events + live dashboard" to "feat: add -vvv trace mode + live dashboard" Jun 4, 2026
@viraatc viraatc changed the title from "feat: add -vvv trace mode + live dashboard" to "WIP: feat: add -vvv trace mode dashboard for client overhead" Jun 4, 2026
@viraatc viraatc force-pushed the feat/trace-events branch from f8bd193 to 4dbbeef Compare June 4, 2026 02:30
@viraatc viraatc force-pushed the feat/trace-events branch from 4dbbeef to 02c069f Compare June 4, 2026 02:36
@viraatc viraatc force-pushed the feat/trace-events branch 2 times, most recently from d5da7d0 to 9d2dd89 Compare June 4, 2026 02:58
@viraatc viraatc force-pushed the feat/trace-events branch from 9d2dd89 to 5ac6ec0 Compare June 4, 2026 03:24
@viraatc viraatc force-pushed the feat/trace-events branch from 5ac6ec0 to a81ebda Compare June 4, 2026 03:40
@viraatc viraatc force-pushed the feat/trace-events branch 5 times, most recently from 4b81bca to 4eb16bc Compare June 4, 2026 04:34
@viraatc viraatc force-pushed the feat/trace-events branch from 4eb16bc to 58b4739 Compare June 4, 2026 04:44
@viraatc viraatc force-pushed the feat/trace-events branch from 58b4739 to 2a27b50 Compare June 4, 2026 05:04
@viraatc viraatc force-pushed the feat/trace-events branch from 8ca09d5 to 2eb43e9 Compare June 4, 2026 23:36
@viraatc viraatc force-pushed the feat/trace-events branch 5 times, most recently from 16353c9 to 9c1a920 Compare June 5, 2026 00:24
@viraatc viraatc force-pushed the feat/trace-events branch from 9c1a920 to 5b452af Compare June 5, 2026 23:15
@viraatc viraatc force-pushed the feat/trace-events branch 5 times, most recently from 0f2c8fd to d0169fd Compare June 5, 2026 23:36
@viraatc viraatc changed the title from "WIP: feat: add -vvv trace mode dashboard for client overhead" to "feat: add -vvv trace mode dashboard for client overhead" Jun 5, 2026
viraatc added a commit that referenced this pull request Jun 5, 2026
This reverts commit 28f5ac1.

The -vvv trace feature was merged to main before review. Reverting so main stays clean; it will be re-introduced via the reviewed feat/trace-events PR (#334).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
viraatc added a commit that referenced this pull request Jun 6, 2026
This reverts commit 28f5ac1.

The -vvv trace feature was merged to main before review; reverting so main stays clean. It will be re-introduced via the reviewed feat/trace-events PR (#334).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@viraatc viraatc force-pushed the feat/trace-events branch from d0169fd to e9a853a Compare June 6, 2026 00:31
@viraatc viraatc changed the base branch from main to revert/trace-from-main June 6, 2026 00:31
Base automatically changed from revert/trace-from-main to main June 6, 2026 00:46
@viraatc viraatc force-pushed the feat/trace-events branch from e9a853a to aff40be Compare June 8, 2026 20:26
Opt-in (`-vvv`) per-request lifecycle tracing for the benchmark client,
rendered as a live `rich` dashboard alongside the run. Off by default with
no measurable overhead when off (emit is a no-op binding); the worker hot
path stays lock-free and allocation-free.

Pipeline:
- utils/trace.py — lock-free SPSC ring emitter (~190 ns/event) into a
  per-process 512 MiB anonymous mmap (pages fault in on write). Per-pid
  POSIX FIFO transport: non-blocking open with bounded retry (raises rather
  than hanging if the dashboard died), O_NONBLOCK writes that drop on EAGAIN
  with an adaptive sampler + cumulative self-healing drop counter. bootstrap
  spawns the dashboard, tracks its Popen, and cleanup() reaps it (kills a
  wedged one as a backstop). 17-byte <BQQ frames.
- utils/trace_dashboard.py — pure aggregation + render (unit-tested in
  isolation): lifecycle fold into HDR-histogram stages, heat-graded %E2E
  table, client/server/backpressure verdict, e2e bar, backpressure cause
  tree (pickup-ipc heats independently; encode+tcp-acquire fused), per-proc
  loop-lag panel, and an ESTABLISHED tcp-conn gauge. Cross-process deltas
  floored at 0.
- scripts/trace_dashboard.py — TUI subprocess: FIFO reader thread, ZMQ SUB
  to the aggregator PUB (sidecar fallback), and an off-render-thread /proc
  tcp-conn sampler. Reads to true FIFO EOF with an authoritative final frame.
- Worker / session / agentic-inference emit sites; warmup excluded via a
  PERF_START reset; PERF_END freezes the lifecycle/verdict/tree.
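
The 17-byte `<BQQ` frame and the zero-floored cross-process deltas mentioned above can be sketched as follows. The frame layout (`<BQQ`) is from the commit message; the meaning assigned to each field (event kind, request id, nanosecond timestamp), the `ISSUE`/`COMPLETE` codes, and the function names are assumptions made for illustration.

```python
import struct

# "<BQQ": little-endian, no padding: 1 + 8 + 8 = 17 bytes per frame.
FRAME = struct.Struct("<BQQ")

# Hypothetical event kinds; the real codes are not given in the text.
ISSUE, COMPLETE = 1, 2


def encode(kind: int, sid: int, t_ns: int) -> bytes:
    """Pack one event as (kind, request id, timestamp in ns). Field roles assumed."""
    return FRAME.pack(kind, sid, t_ns)


def fold_e2e_ms(buf: bytes) -> dict[int, float]:
    """Fold a byte stream of frames into issue -> complete latency per request."""
    # iter_unpack requires a whole number of frames; ignore a trailing partial one.
    usable = len(buf) - len(buf) % FRAME.size
    issued: dict[int, int] = {}
    e2e_ms: dict[int, float] = {}
    for kind, sid, t_ns in FRAME.iter_unpack(buf[:usable]):
        if kind == ISSUE:
            issued[sid] = t_ns
        elif kind == COMPLETE and sid in issued:
            # Timestamps taken in different processes can appear out of order,
            # so the delta is floored at 0 rather than reported as negative.
            e2e_ms[sid] = max(0, t_ns - issued.pop(sid)) / 1e6
    return e2e_ms
```

Decoding a whole buffer with one `iter_unpack` call avoids a per-frame `unpack` call and slice, which is presumably what "vectorized" refers to in the PR description.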

Perf (B200 bare metal, #328 roofline, 3 reps): trace-off within run-to-run
noise of main; -vvv ~5-8% at the sub-ms stub ceiling (worst case).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>