chore: add qwen perf yaml #369
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces a new benchmark configuration file, qwen_agentic_benchmark.yaml, for Qwen agentic inference. The review feedback highlights missing configuration parameters required to comply with benchmark invariants, specifically recommending the addition of num_trajectories_to_issue and stop_issuing_on_first_user_complete under agentic_inference, as well as the settings.client configuration block.
agentic_inference:
  enable_salt: true # do not change.
  inject_tool_delay: true # do not change.
To ensure compliance with the benchmark invariants and to make the configuration complete, please explicitly specify num_trajectories_to_issue and stop_issuing_on_first_user_complete under agentic_inference.
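The suggestion's inline comment asks that num_trajectories_to_issue be an integer multiple of the dataset trajectory count. A minimal sketch of that arithmetic check follows; the helper names and the dataset size of 330 are hypothetical and chosen only for illustration, since the PR does not state the actual trajectory count.

```python
def is_valid_trajectory_total(num_trajectories_to_issue: int,
                              dataset_trajectory_count: int) -> bool:
    """Return True when the total is a positive integer multiple of the dataset size."""
    if dataset_trajectory_count <= 0:
        raise ValueError("dataset_trajectory_count must be positive")
    return (num_trajectories_to_issue > 0
            and num_trajectories_to_issue % dataset_trajectory_count == 0)


def round_up_to_multiple(requested: int, dataset_trajectory_count: int) -> int:
    """Return the smallest positive multiple of the dataset size that is >= requested."""
    if dataset_trajectory_count <= 0:
        raise ValueError("dataset_trajectory_count must be positive")
    # Ceiling division without floats; always issue at least one full pass.
    passes = max(1, -(-requested // dataset_trajectory_count))
    return passes * dataset_trajectory_count


# Hypothetical dataset size, for illustration only.
HYPOTHETICAL_DATASET_TRAJECTORIES = 330
ok = is_valid_trajectory_total(990, HYPOTHETICAL_DATASET_TRAJECTORIES)
```

Under that assumed dataset size, 990 corresponds to three full passes over the dataset, while a value such as 1000 would need to be adjusted before use.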
agentic_inference:
  enable_salt: true # do not change.
  inject_tool_delay: true # do not change.
  num_trajectories_to_issue: 990 # Should be integer multiple of dataset trajectory count.
  stop_issuing_on_first_user_complete: false # required benchmark default.

load_pattern:
  type: agentic_inference
  target_concurrency: 8 # Submission-specific concurrency.
The settings.client configuration is missing. For official agentic benchmark runs, the client settings warmup_connections: 0 and max_idle_time: 0.5 are required invariants to ensure consistent and comparable performance results.
load_pattern:
  type: agentic_inference
  target_concurrency: 8 # Submission-specific concurrency.
client:
  warmup_connections: 0
  max_idle_time: 0.5
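The fixed values named in the two review comments above could be checked before a run with something like the sketch below. It is not the benchmark's own validation, which this PR does not show; the function and constant names are hypothetical, and the two sections are passed in as plain mappings so that no assumption is made about where they sit in the full YAML file.

```python
from typing import Any, List, Mapping

# Values taken from the review comments; names of these constants are illustrative.
AGENTIC_INVARIANTS = {
    "enable_salt": True,
    "inject_tool_delay": True,
    "stop_issuing_on_first_user_complete": False,
}
CLIENT_INVARIANTS = {
    "warmup_connections": 0,
    "max_idle_time": 0.5,
}

_MISSING = object()


def _violations(section: str, values: Mapping[str, Any],
                required: Mapping[str, Any]) -> List[str]:
    """List every required key that is absent or set to a different value."""
    problems = []
    for key, expected in required.items():
        actual = values.get(key, _MISSING)
        if actual is _MISSING:
            problems.append(f"{section}.{key} is missing (expected {expected!r})")
        # Treat bools and numbers as distinct so that False does not pass for 0.
        elif isinstance(actual, bool) != isinstance(expected, bool) or actual != expected:
            problems.append(f"{section}.{key} is {actual!r} (expected {expected!r})")
    return problems


def check_invariants(agentic_inference: Mapping[str, Any],
                     client: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    return (_violations("agentic_inference", agentic_inference, AGENTIC_INVARIANTS)
            + _violations("settings.client", client, CLIENT_INVARIANTS))


# The config as originally submitted: two agentic keys set, no client block.
as_submitted = check_invariants(
    {"enable_salt": True, "inject_tool_delay": True}, {})

# The config with both review suggestions applied.
with_suggestions = check_invariants(
    {"enable_salt": True, "inject_tool_delay": True,
     "num_trajectories_to_issue": 990,
     "stop_issuing_on_first_user_complete": False},
    {"warmup_connections": 0, "max_idle_time": 0.5})
```

Returning a list of messages rather than raising on the first mismatch lets a submitter see every missing setting in one pass.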
Pull request overview
Adds a runnable benchmark configuration YAML under examples/10_Agentic_Inference/ for running an online performance benchmark of Qwen/Qwen3.6-35B-A3B using the agentic inference load pattern.
Changes:
- Add qwen_agentic_benchmark.yaml example config for agentic inference performance runs.
What does this PR do?
Adds a performance-run YAML file for agentic inference with Qwen3.6-35B-A3B.