Pull requests: vllm-project/vllm
Pull requests list
[Bugfix] Strip Gemma4 string delimiters from dict keys
bug
Something isn't working
tool-calling
#44756
opened Jun 7, 2026 by
he-yufeng
Contributor
4 tasks done
Revert "[Bugfix][MoE] Snapshot max_cudagraph_capture_size into FusedMoEConfig" (#44613)
bug
Something isn't working
nvidia
#44754
opened Jun 7, 2026 by
vllm-agent
Contributor
Draft
[Test] Remove Transformers v5 cap for InternLM2VEForCausalLM
#44753
opened Jun 7, 2026 by
Khadija-Bayoud
[Bug] Fix gemma4_tool_parser not stripping STRING_DELIM from dict keys
bug
Something isn't working
tool-calling
#44752
opened Jun 6, 2026 by
ShuaoZhang
[Bugfix] Propagate ImportError from load_audio_pyav when vllm[audio] …
bug
Something isn't working
multi-modality
Related to multi-modality (#4194)
#44750
opened Jun 6, 2026 by
littlecircle0730
3 of 4 tasks
[Misc] Remove orphaned env vars and stale env-var references
documentation
Improvements or additions to documentation
#44749
opened Jun 6, 2026 by
DaoyuanLi2816
Contributor
[Cohere] Fix Cohere2MoE weight loading when using Transformers ≥5.10
ready
ONLY add when PR is ready to merge/full CI is needed
#44747
opened Jun 6, 2026 by
Terrencezzj
Contributor
4 tasks
[Security] Fix remote DoS via invalid recovered token reinjection
v1
#44744
opened Jun 6, 2026 by
jperezdealgaba
Contributor
[Security] Fix remote DoS from grammar-rejected spec tokens padded with -1
v1
#44743
opened Jun 6, 2026 by
jperezdealgaba
Contributor
[Bugfix] Harden allowed_token_ids metadata for spec-decode
bug
Something isn't working
v1
#44742
opened Jun 6, 2026 by
jperezdealgaba
Contributor
[Bugfix] Gemma4 streaming parser for multi-boundary tool deltas
bug
Something isn't working
tool-calling
#44741
opened Jun 6, 2026 by
yasu-oh
4 tasks done
[Bugfix][Model] GraniteMoE: load FP8_DYNAMIC expert weight_scale tensors
bug
Something isn't working
ci/build
#44739
opened Jun 6, 2026 by
javierdejesusda
Contributor
[Opt] Optimize rotary embedding cache length
#44738
opened Jun 6, 2026 by
labAxiaoming
Contributor
4 tasks
[Bugfix] Canonicalize FP8 weight layout to (K, N) at the source
bug
Something isn't working
quantization
ready
ONLY add when PR is ready to merge/full CI is needed
rocm
Related to AMD ROCm
#44735
opened Jun 6, 2026 by
mgoin
Member
3 of 4 tasks
[KV offload] Parallel-agnostic fs-tier cache for single full-attention group
v1
#44733
opened Jun 6, 2026 by
Etelis
Contributor
fix(scheduler): eager KV cache prefetch for waiting queue requests
v1
#44731
opened Jun 6, 2026 by
liuyun7345
[Rust Frontend]: Add /tokenize API support with Completion format
rust
#44730
opened Jun 6, 2026 by
coder3101
4 tasks
[Bugfix][Rust Frontend] Set a structured-output backend so requests do not 500
bug
Something isn't working
rust
#44729
opened Jun 6, 2026 by
Sunt-ing
Contributor
[Kernel] FlashInfer FP8 scaled_mm: restore N-D output shape
nvidia
#44728
opened Jun 6, 2026 by
lishunyang12
[Bugfix][Core] Close underlying iterator in merge_async_iterators single-iterator fast path
bug
Something isn't working
#44726
opened Jun 6, 2026 by
Sunt-ing
Contributor
[Bugfix][Frontend] Fix Anthropic count_tokens decorator order driving server load negative
bug
Something isn't working
frontend
#44725
opened Jun 6, 2026 by
Sunt-ing
Contributor