xc-llm-ascend

EngineX/xc-llm-ascend

Fork 0

Commit Graph

Author	SHA1	Message	Date
jiangmengyu18	3cbd6acc89	[v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893 ) ### What this PR does / why we need it? Enable Flash Comm V1 (sequence parallelism) for Qwen3-VL models (both dense and MoE variants). Root cause: Qwen3-VL's deepstack embeddings remain full-size [N, H] while hidden states become [N/tp_size, H] after reduce-scatter, causing shape mismatch on add. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - [x] Run Qwen3-VL dense model with FC1 enabled (TP > 1), verify correct output - [x] Run Qwen3-VL MoE model with FC1 enabled (TP > 1), verify correct output --------- Signed-off-by: betta18 <jiangmengyu1@huawei.com> Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com> Co-authored-by: betta18 <jiangmengyu1@huawei.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-03 11:38:41 +08:00
jiangmengyu18	85234d096d	[v0.18.0][Feature] support qkv_rmsnorm_mrope for qwen3vl (#7852 ) ### What this PR does / why we need it? Qwen3vl full attention supports enabling the split_qkv_rmsnorm_mrope fusion operator. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - [x] Run Qwen3-VL dense model with the fusion operator, verify correct output - [x] Run Qwen3-VL MoE model with the fusion operator, verify correct output --------- Signed-off-by: jiangmengyu18 <451528648@qq.com> Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com> Signed-off-by: betta18 <jiangmengyu1@huawei.com> Co-authored-by: betta18 <jiangmengyu1@huawei.com>	2026-04-02 17:46:50 +08:00

Author

SHA1

Message

Date

jiangmengyu18

3cbd6acc89

[v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893 )

### What this PR does / why we need it?
Enable Flash Comm V1 (sequence parallelism) for Qwen3-VL models (both
dense and MoE variants).

Root cause: Qwen3-VL's deepstack embeddings remain full-size [N, H]
while hidden states become [N/tp_size, H] after reduce-scatter, causing
shape mismatch on add.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- [x] Run Qwen3-VL dense model with FC1 enabled (TP > 1), verify correct
output
- [x] Run Qwen3-VL MoE model with FC1 enabled (TP > 1), verify correct
output

---------

Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2026-04-03 11:38:41 +08:00

jiangmengyu18

85234d096d

[v0.18.0][Feature] support qkv_rmsnorm_mrope for qwen3vl (#7852 )

### What this PR does / why we need it?
Qwen3vl full attention supports enabling the split_qkv_rmsnorm_mrope
fusion operator.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- [x] Run Qwen3-VL dense model with the fusion operator, verify correct
output
- [x] Run Qwen3-VL MoE model with the fusion operator, verify correct
output

---------

Signed-off-by: jiangmengyu18 <451528648@qq.com>
Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com>
Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>

2026-04-02 17:46:50 +08:00

2 Commits