xc-llm-ascend/vllm_ascend/patch/worker
jiangmengyu18 3cbd6acc89 [v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893)
### What this PR does / why we need it?
Enable Flash Comm V1 (sequence parallelism) for Qwen3-VL models (both
dense and MoE variants).

Root cause: Qwen3-VL's deepstack embeddings remain full-size ([N, H]),
while the hidden states shrink to [N/tp_size, H] after the reduce-scatter,
so adding the two fails with a shape mismatch.
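To make the root cause concrete, here is a minimal pure-Python sketch that models tensors as lists of token rows. It is not the vllm-ascend implementation. `pad_tokens`, `reduce_scatter`, `local_chunk`, and `add` are hypothetical helpers. Two things are assumptions about one way to keep the shapes aligned, not a description of this PR's patch: padding the tokens to a multiple of `tp_size`, and slicing the deepstack embeddings per rank.

```python
from typing import List

# Hypothetical stand-in for a tensor: a list of token rows, each of width H.
Matrix = List[List[float]]


def pad_tokens(x: Matrix, tp_size: int) -> Matrix:
    """Append zero rows so the token count divides evenly by tp_size.

    Assumption: the sequence-parallel path pads tokens to a multiple of tp_size.
    """
    width = len(x[0])
    pad = (-len(x)) % tp_size
    return [row[:] for row in x] + [[0.0] * width for _ in range(pad)]


def reduce_scatter(partials: List[Matrix], rank: int) -> Matrix:
    """Sum the per-rank partial outputs, then keep only `rank`'s token chunk."""
    tp_size = len(partials)
    chunk = len(partials[0]) // tp_size
    rows = range(rank * chunk, (rank + 1) * chunk)
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][i]))] for i in rows]


def local_chunk(x: Matrix, rank: int, tp_size: int) -> Matrix:
    """Slice the same [N/tp_size, H] token chunk that reduce_scatter leaves on `rank`."""
    chunk = len(x) // tp_size
    return x[rank * chunk:(rank + 1) * chunk]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise add that fails loudly on a row-count mismatch, like a tensor add would."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} rows vs {len(b)} rows")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


tp_size = 2
# N = 3 tokens, H = 2; each rank holds a partial (pre-reduction) hidden state.
partials = [
    pad_tokens([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], tp_size),
    pad_tokens([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]], tp_size),
]
# Deepstack embeddings are replicated at full size [N, H] on every rank.
deepstack = pad_tokens([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]], tp_size)

outputs = []
mismatches = []
for rank in range(tp_size):
    hidden_local = reduce_scatter(partials, rank)  # [N_padded / tp_size, H]
    try:
        add(hidden_local, deepstack)  # full-size add: reproduces the mismatch
        mismatches.append(False)
    except ValueError:
        mismatches.append(True)
    # Adding only this rank's chunk of the deepstack embeddings keeps shapes aligned.
    outputs.append(add(hidden_local, local_chunk(deepstack, rank, tp_size)))

# Gather the chunks back in rank order and drop the padding row.
gathered = [row for chunk in outputs for row in chunk][:3]
print(gathered)  # [[11.0, 11.0], [22.0, 22.0], [33.0, 33.0]]
```

The gathered result equals what a non-sequence-parallel run would produce (fully reduced hidden states plus full deepstack embeddings). This suggests that, under these assumptions, slicing the deepstack input per rank preserves correctness.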
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- [x] Run the Qwen3-VL dense model with FC1 enabled (TP > 1) and verify
that the output is correct
- [x] Run the Qwen3-VL MoE model with FC1 enabled (TP > 1) and verify
that the output is correct

---------

Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-03 11:38:41 +08:00