xc-llm-ascend

Author	SHA1	Message	Date
Angazenn	b84465c525	[Perf]Enable npu_moe_gating_top_k_softmax on quantized scenarios (#2633) ### What this PR does / why we need it? This PR enables `npu_moe_gating_top_k_softmax` when running quantized MoE (such as W8A8). The op makes no distinction between quantized and non-quantized scenarios. Introducing it reduces TPOT by 3~4 ms. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `ce30dca5c4` Signed-off-by: Angazenn <supperccell@163.com>	2025-09-03 09:14:17 +08:00
leo-pony	0df059f41a	[CI] Fix CI Break: upstream adds routed_scaling_factor in forward_oot interface (#2675) ### What this PR does / why we need it? Fixes a CI break: upstream added `routed_scaling_factor` to the `forward_oot` interface, so vllm-ascend needs to adapt. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? E2E and UT - vLLM version: v0.10.1.1 - vLLM main: `3e330fcb21` Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-09-01 19:02:50 +08:00
s30076806	6a4ec186e7	[Qwen-moe] Remove the minor operation arange (#2373) ### What this PR does / why we need it? Integrate the arange operator to reduce the time spent and improve performance ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `56dcf4e7e9` --------- Signed-off-by: s30076806 <songjiayang2@h-partners.com>	2025-08-27 09:13:31 +08:00
shiyuan680	e14f2ef669	refactor select_experts of moe module (#2150) ### What this PR does / why we need it? This PR refactors `select_experts` of the MoE module. It merges the quantized and non-quantized implementations into a new class that is used in the same style as vLLM, e.g. `ExpertsSelector.select_experts`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on qwen3-moe and with all UTs. - vLLM version: v0.10.0 - vLLM main: `e18859298d` Signed-off-by: yangcheng <yangcheng104@huawei.com> Co-authored-by: yangcheng (AJ) <y00806874@china.huawei.com>	2025-08-14 11:50:53 +08:00

4 Commits
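
For readers unfamiliar with the expert-selection step these commits touch, the following is a minimal reference sketch of softmax-then-top-k MoE gating, which is what the name `npu_moe_gating_top_k_softmax` suggests the fused op computes. It is plain Python and not the Ascend kernel or the project's `select_experts` code; the function name `softmax_top_k_gating` and the `renormalize` flag are hypothetical, chosen for illustration only. Because the routing operates on router logits rather than on expert weights, a computation of this shape would be the same whether or not the experts are quantized, which is consistent with the first commit's description.

```python
import math


def softmax_top_k_gating(router_logits, top_k, renormalize=True):
    """Illustrative softmax + top-k routing (hypothetical helper, not a vllm-ascend API).

    router_logits: list of per-token lists, one logit per expert.
    Returns (weights, expert_ids), each a list with one top_k-long list per token.
    """
    all_weights, all_ids = [], []
    for logits in router_logits:
        # Numerically stable softmax over the expert dimension.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Indices of the top_k most probable experts, highest first.
        ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        weights = [probs[i] for i in ids]

        # Optionally rescale the selected weights so they sum to 1.
        if renormalize:
            selected_sum = sum(weights)
            weights = [w / selected_sum for w in weights]

        all_weights.append(weights)
        all_ids.append(ids)
    return all_weights, all_ids


if __name__ == "__main__":
    weights, ids = softmax_top_k_gating([[1.0, 3.0, 2.0, 0.0]], top_k=2)
    print(ids, weights)
```

A fused kernel would do the softmax, selection, and index generation in one launch instead of several small operations, which is the kind of saving the performance-oriented commits above describe.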