chore: add qwen perf yaml #369

Open: tianmu-li wants to merge 2 commits into mlcommons:main from tianmu-li:chore/add_qwen_perf_yaml

@tianmu-li

Collaborator

What does this PR do?

Add a performance-run .yaml file for agentic inference with Qwen3.6-35B-A3B.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
tianmu-li requested review from a team and Copilot on June 22, 2026 18:18
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>

gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces a new benchmark configuration file, qwen_agentic_benchmark.yaml, for Qwen agentic inference. The review feedback highlights missing configuration parameters required to comply with benchmark invariants, specifically recommending the addition of num_trajectories_to_issue and stop_issuing_on_first_user_complete under agentic_inference, as well as the settings.client configuration block.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +22 to +24
    agentic_inference:
      enable_salt: true # do not change.
      inject_tool_delay: true # do not change.

Contributor

medium

To ensure compliance with the benchmark invariants and to make the configuration complete, please explicitly specify num_trajectories_to_issue and stop_issuing_on_first_user_complete under agentic_inference.

    agentic_inference:
      enable_salt: true # do not change.
      inject_tool_delay: true # do not change.
      num_trajectories_to_issue: 990 # Should be integer multiple of dataset trajectory count.
      stop_issuing_on_first_user_complete: false # required benchmark default.
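
The suggestion above could be checked mechanically before a run. The following is a minimal sketch, not part of this repository: `check_agentic_inference` is a hypothetical helper, the block is assumed to be already loaded into a plain mapping (for example with a YAML parser of your choice), and `dataset_trajectory_count` is an assumed input because the dataset size is not stated in this PR.

```python
from typing import Any, Mapping


def check_agentic_inference(
    block: Mapping[str, Any], dataset_trajectory_count: int
) -> list[str]:
    """Return human-readable problems found in an agentic_inference block.

    Hypothetical helper: the expected values mirror the review comment,
    not a schema published by the project.
    """
    problems: list[str] = []

    # Flags the config marks as "do not change".
    for key in ("enable_salt", "inject_tool_delay"):
        if block.get(key) is not True:
            problems.append(f"{key} should be true")

    # Must be present and an integer multiple of the dataset trajectory count.
    n = block.get("num_trajectories_to_issue")
    if not isinstance(n, int) or isinstance(n, bool) or n <= 0:
        problems.append("num_trajectories_to_issue should be a positive integer")
    elif n % dataset_trajectory_count != 0:
        problems.append(
            f"num_trajectories_to_issue ({n}) is not a multiple of "
            f"{dataset_trajectory_count}"
        )

    # Must be set explicitly to false.
    if block.get("stop_issuing_on_first_user_complete") is not False:
        problems.append("stop_issuing_on_first_user_complete should be false")

    return problems
```

An empty list means the block satisfies the checks listed in the comment; a block containing only the two "do not change" flags, as in the current diff, would report the two missing keys.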

Comment on lines +31 to +33
    load_pattern:
      type: agentic_inference
      target_concurrency: 8 # Submission-specific concurrency.

Contributor

medium

The settings.client configuration is missing. For official agentic benchmark runs, the client settings warmup_connections: 0 and max_idle_time: 0.5 are required invariants to ensure consistent and comparable performance results.

  load_pattern:
    type: agentic_inference
    target_concurrency: 8 # Submission-specific concurrency.

  client:
    warmup_connections: 0
    max_idle_time: 0.5
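
A similar pre-run check could cover the client invariants named in this comment. This is a sketch under stated assumptions: `check_client` and `REQUIRED_CLIENT` are hypothetical names, the config is assumed to be a nested mapping with `client` under a top-level `settings` key (as the comment's `settings.client` path suggests), and the required values are taken from the comment rather than from project documentation.

```python
from typing import Any, Mapping

# Values quoted from the review comment; treat them as assumptions.
REQUIRED_CLIENT = {"warmup_connections": 0, "max_idle_time": 0.5}


def check_client(config: Mapping[str, Any]) -> list[str]:
    """Return problems with settings.client; an empty list means none found."""
    settings = config.get("settings") or {}
    client = settings.get("client")
    if client is None:
        return ["settings.client is missing"]

    problems: list[str] = []
    for key, expected in REQUIRED_CLIENT.items():
        actual = client.get(key)
        # Reject booleans explicitly, since False == 0 in Python.
        if isinstance(actual, bool) or actual != expected:
            problems.append(
                f"settings.client.{key} should be {expected!r}, found {actual!r}"
            )
    return problems
```

Run against the config in this PR as it stands, the check would report only that `settings.client` is missing.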

Copilot AI left a comment

Pull request overview

Adds a runnable benchmark configuration YAML under examples/10_Agentic_Inference/ for running an online performance benchmark of Qwen/Qwen3.6-35B-A3B using the agentic inference load pattern.

Changes:

  • Add qwen_agentic_benchmark.yaml example config for agentic inference performance runs.

Comment on lines +31 to +34
    load_pattern:
      type: agentic_inference
      target_concurrency: 8 # Submission-specific concurrency.

Comment on lines +22 to +25
    agentic_inference:
      enable_salt: true # do not change.
      inject_tool_delay: true # do not change.
