EngineX/xc-llm-ascend
3386e09a40bafe03bf39c7731c68d0a2cb8def20
xc-llm-ascend/vllm_ascend/ops
Latest commit: whx b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performance. (#1891)

This PR designs the shared-expert multi-stream parallelism of the
w8a8-dynamic-quantized MoE stage in more detail to achieve better
performance.

- vLLM version: v0.10.0
- vLLM main: 2cc571199b

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-07-29 23:53:19 +08:00
__init__.py - [aclgraph] Implement NPUPiecewiseBackend to enable aclgraph (#836) - 2025-05-29 11:58:26 +08:00
activation.py - [1/N][CustomOp] Register activation CustomOp instead of overwriting forward_oot (#1841) - 2025-07-18 23:07:14 +08:00
attention.py - Disaggregate prefill for kv cache register style (#950) - 2025-07-26 17:15:47 +08:00
cache.py - Port deepseekv2 and mtp to main branch (#429) - 2025-04-19 17:38:18 +08:00
common_fused_moe.py - [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) - 2025-07-21 09:08:04 +08:00
expert_load_balancer.py - Add static EPLB (#1116) - 2025-06-09 19:28:11 +08:00
fused_moe.py - [Perf][MoE] Improve MoE multistream parallel performance. (#1891) - 2025-07-29 23:53:19 +08:00
layernorm.py - [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806) - 2025-07-22 19:03:13 +08:00
rotary_embedding.py - [CORE] Initial support for torchair with non-MLA backend (#1506) - 2025-07-03 22:21:42 +08:00
vocab_parallel_embedding.py - [1/N][CI] Move linting system to pre-commit hooks (#1256) - 2025-07-10 14:17:15 +08:00