EngineX/xc-llm-ascend
Path: xc-llm-ascend/vllm_ascend/ops
Commit: 9b67c87b1475fe7dc79442efc8f2fcbce7b728cf
whx b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performance. (#1891)
This PR refines the shared-expert multi-stream parallelism of the
w8a8-dynamic-quantized MoE stage to achieve better performance.

- vLLM version: v0.10.0
- vLLM main: 2cc571199b

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-07-29 23:53:19 +08:00
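
The commit message above describes overlapping the shared-expert computation with the routed experts on a second device stream. As illustration only, here is a minimal sketch of that multi-stream overlap pattern in PyTorch; torch.cuda streams stand in for the Ascend NPU stream API (torch_npu exposes an analogous torch.npu.Stream), and the function and argument names are hypothetical, not taken from fused_moe.py.

```python
# Hypothetical sketch of shared-expert / routed-expert stream overlap.
# torch.cuda streams stand in here for the Ascend NPU equivalent.
import torch

def moe_forward_multistream(hidden_states: torch.Tensor,
                            routed_experts,          # callable: routed-expert path
                            shared_expert,           # callable: shared-expert path
                            second_stream: torch.cuda.Stream) -> torch.Tensor:
    main_stream = torch.cuda.current_stream()
    # The secondary stream must see hidden_states fully materialized.
    second_stream.wait_stream(main_stream)
    with torch.cuda.stream(second_stream):
        # Tell the caching allocator the tensor is also used on this stream.
        hidden_states.record_stream(second_stream)
        shared_out = shared_expert(hidden_states)
    # Routed experts run concurrently on the main stream.
    routed_out = routed_experts(hidden_states)
    # Join the streams before combining the partial results.
    main_stream.wait_stream(second_stream)
    return routed_out + shared_out
```
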
__init__.py
[aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836)
2025-05-29 11:58:26 +08:00
activation.py
[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841)
2025-07-18 23:07:14 +08:00
attention.py
Disaggregate prefill for kv cache register style (#950)
2025-07-26 17:15:47 +08:00
cache.py
port deepseekv2 and mtp to main branch (#429)
2025-04-19 17:38:18 +08:00
common_fused_moe.py
[Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681)
2025-07-21 09:08:04 +08:00
expert_load_balancer.py
Add static EPLB (#1116)
2025-06-09 19:28:11 +08:00
fused_moe.py
[Perf][MoE] Improve MoE multistream parallel performance. (#1891)
2025-07-29 23:53:19 +08:00
layernorm.py
[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806)
2025-07-22 19:03:13 +08:00
rotary_embedding.py
[CORE]initial support for torchair with non-mla backend (#1506)
2025-07-03 22:21:42 +08:00
vocab_parallel_embedding.py
[1/N][CI] Move linting system to pre-commit hooks (#1256)
2025-07-10 14:17:15 +08:00