xc-llm-ascend/vllm_ascend at 9c6d0b422c6d24e3d7e895d7a5cf2d5cc5c313d1

EngineX/xc-llm-ascend

Angazenn 9c6d0b422c [v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205)

### What this PR does / why we need it?
This is the dev version of #4199.
Currently, the default `cudagraph_capture_size` in vLLM is `[1, 2, 4, 8, 16, 24, ..., max_capture_size]`.
However, this is not always the best choice in every situation. This PR
changes the default when running Qwen3-MoE with full DP (`dp_size > 1` &&
`tp_size == 1`), a setting usually applied in large-scale EP.
Old:
`[1, 2, 4, 8, 16, 24, ..., max_capture_size]`
New:
`[1, 2, 5, 10, 15, 16, 24, ..., max_capture_size]`
This is mainly because the performance of the `_npu_paged_attention` op
degrades dramatically with the old settings. We hope to provide better
performance when users do not set `cudagraph_capture_size` explicitly.

### Does this PR introduce _any_ user-facing change?
The default `cudagraph_capture_size` is modified in the case above.
However, if `cudagraph_capture_size` has already been set by the user, this PR
has no effect on it.
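As a rough illustration of the selection rule described above, here is a minimal Python sketch. It is not the code in `platform.py` or `utils.py`: the function names (`default_capture_sizes`, `full_dp_qwen3_moe_capture_sizes`, `select_capture_sizes`) and the `is_qwen3_moe` flag are hypothetical, and it assumes the `16, 24, ...` tail continues in steps of 8 up to `max_capture_size`.

```python
from typing import Optional


def default_capture_sizes(max_capture_size: int) -> list[int]:
    # Generic default described in the PR: [1, 2, 4, 8, 16, 24, ..., max].
    # Assumption: sizes from 8 upward are spaced by 8.
    sizes = [1, 2, 4] + list(range(8, max_capture_size + 1, 8))
    return [s for s in sizes if s <= max_capture_size]


def full_dp_qwen3_moe_capture_sizes(max_capture_size: int) -> list[int]:
    # New default described in the PR: [1, 2, 5, 10, 15, 16, 24, ..., max].
    # Assumption: sizes from 16 upward are spaced by 8.
    sizes = [1, 2, 5, 10, 15] + list(range(16, max_capture_size + 1, 8))
    return [s for s in sizes if s <= max_capture_size]


def select_capture_sizes(
    user_sizes: Optional[list[int]],
    is_qwen3_moe: bool,  # hypothetical flag: is the served model Qwen3-MoE?
    dp_size: int,
    tp_size: int,
    max_capture_size: int,
) -> list[int]:
    # An explicit user setting always wins; the PR leaves it untouched.
    if user_sizes is not None:
        return list(user_sizes)
    # "Full DP" as defined in the PR: dp_size > 1 and tp_size == 1.
    full_dp = dp_size > 1 and tp_size == 1
    if is_qwen3_moe and full_dp:
        return full_dp_qwen3_moe_capture_sizes(max_capture_size)
    return default_capture_sizes(max_capture_size)
```

For example, `select_capture_sizes(None, True, dp_size=4, tp_size=1, max_capture_size=32)` yields `[1, 2, 5, 10, 15, 16, 24, 32]`, while passing an explicit list returns that list unchanged.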

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

---------

Signed-off-by: Angazenn <supperccell@163.com>

2025-11-21 11:19:11 +08:00

[Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202)

2025-11-17 10:56:23 +08:00

[Bugfix][Aclgraph] failed to update graph task (#4282)

2025-11-19 21:30:48 +08:00

[BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732)

2025-10-25 09:41:51 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242)

2025-11-18 16:33:00 +08:00

[CI]Add EPLB CI. (#3568)

2025-10-21 22:58:02 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

[0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041)

2025-11-08 13:56:05 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

cherry pick from pr 4270 (#4285)

2025-11-19 22:32:02 +08:00

[Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202)

2025-11-17 10:56:23 +08:00

[v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190)

2025-11-14 15:43:22 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

[0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092)

2025-11-11 09:58:03 +08:00

[0.11.0][BugFix] Improve the performance of prefixcache features (#4021)

2025-11-10 11:51:34 +08:00

[cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4099)

2025-11-12 20:32:50 +08:00

__init__.py

[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432)

2025-10-15 17:48:58 +08:00

ascend_config.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

ascend_forward_context.py

[cherry-pick][refactor]support gatingtopk operator generalization (#4050)

2025-11-19 10:39:28 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242)

2025-11-18 16:33:00 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205)

2025-11-21 11:19:11 +08:00

utils.py

[v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205)

2025-11-21 11:19:11 +08:00