EngineX/xc-llm-ascend
3386e09a40bafe03bf39c7731c68d0a2cb8def20
xc-llm-ascend/vllm_ascend/ops
Latest commit: whx b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performance. (#1891)

This PR designs the shared-expert multi-stream parallelism of the
w8a8-dynamic-quantized MoE stage in more detail to achieve better
performance.

- vLLM version: v0.10.0
- vLLM main: 2cc571199b

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-07-29 23:53:19 +08:00
__init__.py - [aclgraph] Implement NPUPiecewiseBackend to enable aclgraph (#836) - 2025-05-29 11:58:26 +08:00
activation.py - [1/N][CustomOp] Register activation CustomOp instead of overwriting forward_oot (#1841) - 2025-07-18 23:07:14 +08:00
attention.py - Disaggregate prefill for kv cache register style (#950) - 2025-07-26 17:15:47 +08:00
cache.py - Port deepseekv2 and mtp to main branch (#429) - 2025-04-19 17:38:18 +08:00
common_fused_moe.py - [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) - 2025-07-21 09:08:04 +08:00
expert_load_balancer.py - Add static EPLB (#1116) - 2025-06-09 19:28:11 +08:00
fused_moe.py - [Perf][MoE] Improve MoE multistream parallel performance. (#1891) - 2025-07-29 23:53:19 +08:00
layernorm.py - [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806) - 2025-07-22 19:03:13 +08:00
rotary_embedding.py - [CORE] Initial support for torchair with non-MLA backend (#1506) - 2025-07-03 22:21:42 +08:00
vocab_parallel_embedding.py - [1/N][CI] Move linting system to pre-commit hooks (#1256) - 2025-07-10 14:17:15 +08:00