xc-llm-ascend

Files

florenceCH 14497b748d Remove qwen3 moe MC2 cumsum & cast (#3126)

What this PR does / why we need it?
The Qwen3 MoE MC2 graph currently contains two redundant operators: a
cumsum and a cast that run after npu_moe_distribute_dispatch_v2. By
passing expert_token_nums_type=0 and no longer converting weight_scale
to float32, both operators can be eliminated, which improves inference
performance.
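Below is a minimal pure-Python sketch of why a separate cumsum can be redundant. It assumes (this is not stated in the PR) that with expert_token_nums_type=1 the dispatch op reports per-expert token counts, and with expert_token_nums_type=0 it reports their running prefix sum, which is the form a grouped matmul can use directly to locate each expert's rows. The helper names are hypothetical and nothing here calls torch_npu.

```python
from itertools import accumulate


def counts_to_group_list(expert_token_counts):
    """Hypothetical helper: turn per-expert token counts (count style)
    into cumulative end offsets (prefix-sum style). This is the extra
    cumsum step that becomes unnecessary if the op emits offsets itself."""
    return list(accumulate(expert_token_counts))


def group_list_to_counts(group_list):
    """Hypothetical inverse: recover per-expert counts from cumulative offsets."""
    counts = []
    prev = 0
    for end in group_list:
        counts.append(end - prev)
        prev = end
    return counts


def expert_row_ranges(group_list):
    """Hypothetical helper: (start, end) row range owned by each expert,
    read straight off the cumulative offsets with no further arithmetic."""
    starts = [0] + group_list[:-1]
    return list(zip(starts, group_list))


# Example: 4 local experts received 3, 0, 5 and 2 tokens after dispatch.
counts = [3, 0, 5, 2]
group_list = counts_to_group_list(counts)
print(group_list)
print(expert_row_ranges(group_list))
```

Running it prints `[3, 3, 8, 10]` and `[(0, 3), (3, 3), (3, 8), (8, 10)]`: once offsets are available directly, each expert's slice of the dispatched tokens is known without an extra pass.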

Does this PR introduce any user-facing change?
No

How was this patch tested?
No need

- vLLM version: v0.10.2
- vLLM main: f225ea7dd9

---------

Signed-off-by: florenceCH <gaoxiang120@huawei.com>
Co-authored-by: florenceCH <gaoxiang120@huawei.com>

2025-09-26 08:51:30 +08:00

e2e

[bugFix] Correct the vllm interface e2e test Base container image name (#3179)

2025-09-25 16:03:09 +08:00

Remove qwen3 moe MC2 cumsum & cast (#3126)

2025-09-26 08:51:30 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00