xc-llm-ascend/worker at 0d773efd70e910144020756ea08d1c205c25f6f1 - xc-llm-ascend

EngineX/xc-llm-ascend

jiangmengyu18 3cbd6acc89 [v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893 )

### What this PR does / why we need it?
Enable Flash Comm V1 (sequence parallelism) for Qwen3-VL models (both
dense and MoE variants).

Root cause: Qwen3-VL's deepstack embeddings remain full-size [N, H],
while hidden states become [N/tp_size, H] after reduce-scatter, causing
a shape mismatch when the two are added.
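
The mismatch described above can be reproduced with a small pure-Python sketch. It is an illustration only: the helper names (`shard_rows`, `add_rows`, `add_deepstack_on_rank`) are hypothetical and not taken from the patch, and it assumes that the fix amounts to giving each rank the contiguous [N/tp_size, H] slice of the deepstack embeddings that matches its reduce-scattered hidden states. The commit message does not say how the patch actually resolves the mismatch.

```python
# Hypothetical illustration; none of these names come from the patch itself.

def shard_rows(rows, tp_size, rank):
    """Return the contiguous block of rows owned by `rank`.

    Assumes N is divisible by tp_size; a real implementation would need padding.
    """
    n = len(rows)
    if n % tp_size != 0:
        raise ValueError("this sketch assumes N is divisible by tp_size")
    chunk = n // tp_size
    return rows[rank * chunk:(rank + 1) * chunk]


def add_rows(a, b):
    """Element-wise add of two [rows, H] nested lists; raises on a row-count mismatch."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} rows vs {len(b)} rows")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


N, H, TP_SIZE = 8, 2, 4
deepstack = [[float(i)] * H for i in range(N)]     # full-size [N, H]
full_hidden = [[10.0 * i] * H for i in range(N)]   # hidden states before sharding


def add_deepstack_on_rank(rank):
    # shard_rows stands in for the [N/tp_size, H] output of reduce-scatter.
    local_hidden = shard_rows(full_hidden, TP_SIZE, rank)
    # Assumed fix: take the matching slice of the deepstack embeddings.
    local_deepstack = shard_rows(deepstack, TP_SIZE, rank)
    return add_rows(local_hidden, local_deepstack)
```

Adding the full `deepstack` to one rank's shard raises the row-count error, while `add_deepstack_on_rank` succeeds on every rank, and concatenating the per-rank results reproduces the unsharded sum.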
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- [x] Run Qwen3-VL dense model with FC1 enabled (TP > 1), verify correct
output
- [x] Run Qwen3-VL MoE model with FC1 enabled (TP > 1), verify correct
output

---------

Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2026-04-03 11:38:41 +08:00

..

adapt to main2main for model runner v2 (#7578 )

2026-03-25 09:08:44 +08:00

__init__.py

[v0.18.0][Feature] support qkv_rmsnorm_mrope for qwen3vl (#7852 )

2026-04-02 17:46:50 +08:00

patch_bert.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_cudagraph.py

[main][bugfix] Fixed the problem of speculative decoding in FULL mode (#7148 )

2026-03-12 14:51:12 +08:00

patch_deepencoder2.py

[Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737 )

2026-03-31 14:49:29 +08:00

patch_deepseek_mtp.py

[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139 )

2026-03-12 20:01:24 +08:00

patch_distributed.py

[Feat][SP] Support SP for VL MoE models (#7044 )

2026-03-24 17:16:00 +08:00

patch_draft_quarot.py

[main][feature] Support quarot for eagle3 without embedding (#7038 )

2026-03-09 10:43:06 +08:00

patch_gdn_attn.py

[Feature] Optimize Qwen3.5/Qwen3Next GDN prefill by prebuilding chunk metadata (#7487 )

2026-03-22 23:09:23 +08:00

patch_huanyuan_vl.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_kimi_k25.py

[Bugfix] Fix get_rope_shape for Kimi-K2.5 (#7521 )

2026-03-22 21:06:31 +08:00

patch_mamba_utils.py

[Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103 )

2026-03-15 09:44:09 +08:00

patch_minimax_m2_linear_attn.py

[Model] Support Minimax-m2.5 on NPU (#7105 )

2026-03-11 00:12:02 +08:00

patch_minimax_m2.py

[v0.18.0] Apply Eagle3 to MiniMax-M2.5 (#7619 ) (#7714 )

2026-03-27 18:33:29 +08:00

patch_module.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_multimodal_merge.py

[bugfix]Fix parameter ordering bug in _merge_multimodal_embeddings (#7068 )

2026-03-09 16:05:52 +08:00

patch_npugraph_ex_triton.py

[npugraph_ex]enable npugraph_ex by default (#6664 )

2026-02-12 08:44:06 +08:00

patch_qwen3_5.py

Qwen3.5 MoE supports flashcomm v1 (#7644 )

2026-03-25 23:09:33 +08:00

patch_qwen3_next_mtp.py

[Main2Main] Upgrade vLLM to 0226 (#6813 )

2026-02-27 16:05:21 +08:00

patch_qwen3_next.py

[Ops][Misc] Refactor and optimize CausalConv1d for Ascend (#7495 )

2026-03-24 00:07:12 +08:00

patch_qwen3vl.py

[v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893 )

2026-04-03 11:38:41 +08:00

patch_rejection_sampler.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_routed_experts_capturer.py

[Feat] Support routing replay (#6696 )

2026-02-26 10:22:47 +08:00

patch_triton.py

adapt to main2main for model runner v2 (#7578 )

2026-03-25 09:08:44 +08:00

patch_unquantized_gemm.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_weight_utils.py

[Bugfix]Fix deepseek 3.2 C8 precision by rotary tensor (#7537 )

2026-03-25 09:18:00 +08:00