xc-llm-ascend/ut at 50e7934415ae87233cb1bf3c6e81490c7f35e921

EngineX/xc-llm-ascend

pichangping 50e7934415 MLA prefill preformance optimization (#5456)

### What this PR does / why we need it?
Because the performance of the _npu_ring_mla operator degrades in
long-sequence scenarios, a long sequence is split into shorter sequences
before being passed to the operator, which improves prefill performance.
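
The splitting idea described above can be sketched as follows. This is a minimal illustration only, not the code from this PR: the helper name `split_into_chunks` and the `max_chunk_len` parameter are hypothetical, and the sketch only computes the index ranges of the shorter sequences; it does not call `_npu_ring_mla` or model how partial attention results are combined.

```python
from typing import List, Tuple


def split_into_chunks(seq_len: int, max_chunk_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges that cover [0, seq_len).

    Hypothetical helper: `max_chunk_len` stands in for whatever length
    threshold the real implementation uses; that value is not given here.
    """
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if max_chunk_len <= 0:
        raise ValueError("max_chunk_len must be positive")
    # Step through the sequence in strides of max_chunk_len; the final
    # chunk is clipped so it never runs past the end of the sequence.
    return [
        (start, min(start + max_chunk_len, seq_len))
        for start in range(0, seq_len, max_chunk_len)
    ]


if __name__ == "__main__":
    # A 10-token sequence with chunks of at most 4 tokens.
    print(split_into_chunks(10, 4))
```

Each returned range would then be processed as its own shorter input. The chunk length that gives the best operator performance is hardware and workload dependent and would need to be measured.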

- vLLM version: v0.13.0
- vLLM main: 5326c89803

---------

Signed-off-by: pichangping <1337510399@qq.com>

2026-01-05 11:41:59 +08:00

MLA prefill preformance optimization (#5456)

2026-01-05 11:41:59 +08:00

[Graph][Fusion] Add AddRMSNorm(with bias) (#5491)

2025-12-31 17:10:26 +08:00

Remove ascend schuduler ut (#4684)

2025-12-04 14:10:28 +08:00

device_allocator

add ut for device allocator/camem and mutistream/layers (#2037)

2025-07-31 19:17:27 +08:00

[Feat] Support MLP_TP feature, exclude MOE layer (#4999)

2025-12-18 20:06:53 +08:00

[EPLB][refactor] Modification of the initialization logic for expert_map and log2phy (depend on pr5285) (#5311)

2025-12-29 09:26:14 +08:00

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

[refactor](UT,PCP,DCP) refactor pcp&dcp patches in UTs (#5505)

2026-01-05 09:05:45 +08:00

model_loader/netloader

Revert "moe_gating_top_k" (#5512)

2025-12-30 15:05:47 +08:00

[Refactor] Formatting output types related to FuseMoE (#5481)

2025-12-31 14:24:37 +08:00

patch/worker/patch_common

[Refactor] refactor patch module (#3555)

2025-10-21 20:19:46 +08:00

[quantization] Add w8a16 quantization support (#4541)

2025-12-24 19:49:32 +08:00

[Refactor][Triton] Move reject sample triton kernels into ops/triton (#5324)

2025-12-29 16:15:41 +08:00

[Feature] Refactor PCP &DCP related code (#5214)

2025-12-31 09:29:57 +08:00

MLA prefill preformance optimization (#5456)

2026-01-05 11:41:59 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892)

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

conftest.py

[CI] Add Triton Ascend in CI (#4921)

2025-12-23 12:47:35 +08:00

test_ascend_config.py

[Feature] Support kv nz feature for DeepSeek decode node in disagg-prefill scenario (#3072)

2025-12-31 14:24:04 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193)

2025-08-14 09:33:39 +08:00

test_platform.py

Drop 0.12.0 support (#5146)

2025-12-20 09:38:53 +08:00

test_utils.py

[refactor] refactor weight trans nz and transpose (#4878)

2025-12-19 14:27:24 +08:00