xc-llm-ascend/ut at c87a77e8b4f1b435d8ec32af3b0c729e1cdb511d

EngineX/xc-llm-ascend

1092626063 c87a77e8b4 [cherry-pick][refactor]support gatingtopk operator generalization (#4050 )

### What this PR does / why we need it?
Cherry-picked from: https://github.com/vllm-project/vllm-ascend/pull/2958
Past:
`npu_moe_gating_top_k` supported only the `group_count=256` pattern.

Now:
1. `npu_moe_gating_top_k` supports all `group_count` sizes.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is
included in `torch_npu.npu_moe_gating_top_k`.
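
For readers unfamiliar with the operator family, the following is a minimal pure-Python sketch of what a "softmax then top-k" gating step computes conceptually. It is an illustration only: the function name `gating_top_k_softmax_reference` and its signature are hypothetical, it does not model `group_count`, and it is not the `torch_npu` kernel or its API.

```python
import math


def softmax(row):
    """Numerically stable softmax over one token's expert logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gating_top_k_softmax_reference(logits, k):
    """Hypothetical reference: softmax per token, then keep the k largest.

    logits: list of per-token lists, one logit per expert.
    Returns (weights, indices), each a list of k-element lists.
    """
    weights, indices = [], []
    for row in logits:
        probs = softmax(row)
        # Expert ids ordered by descending probability, truncated to k.
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        indices.append(top)
        weights.append([probs[i] for i in top])
    return weights, indices


# One token, four experts, route to the two highest-scoring experts.
weights, indices = gating_top_k_softmax_reference([[0.1, 2.0, -1.0, 0.5]], k=2)
```

In this sketch the selected weights are not renormalized after selection; whether the fused operator renormalizes is not stated in this commit message.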

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improves by 6%.
2. Qwen3: same as before.


Signed-off-by: 1092626063 <1092626063@qq.com>

2025-11-19 10:39:28 +08:00

[0.11.0] [Cherry-pick #4058 ] Fixes Qwen3-Next enable nz accuracy problem (#4056 )

2025-11-10 20:56:39 +08:00

[Test]Add unit test for compilation/acl_graph.py (#3039 )

2025-09-19 21:31:17 +08:00

[UT] fix skip ut test and enable ut test run normally (#3410 )

2025-10-20 16:30:57 +08:00

device_allocator

add ut for device allocator/camem and mutistream/layers (#2037 )

2025-07-31 19:17:27 +08:00

[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366 )

2025-10-11 11:22:23 +08:00

[CI]Add EPLB CI. (#3568 )

2025-10-21 22:58:02 +08:00

[CI] Add unit test framework (#1201 )

2025-06-16 18:32:28 +08:00

[P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069 )

2025-11-09 09:55:10 +08:00

[0.11.0] [Cherry-pick #4058 ] Fixes Qwen3-Next enable nz accuracy problem (#4056 )

2025-11-10 20:56:39 +08:00

add ut for device allocator/camem and mutistream/layers (#2037 )

2025-07-31 19:17:27 +08:00

[cherry-pick]Upgrade CANN to 8.3.rc1 (#3945 ) (#3962 )

2025-11-06 09:05:08 +08:00

patch/worker/patch_common

[Refactor] refactor patch module (#3555 )

2025-10-21 20:19:46 +08:00

[cherry-pick][refactor]support gatingtopk operator generalization (#4050 )

2025-11-19 10:39:28 +08:00

[main] add pd transfer for ascend scheduler (#2753 )

2025-09-10 08:46:39 +08:00

[0.11.0][BugFix] Improve the performance of prefixcache features (#4021 )

2025-11-10 11:51:34 +08:00

[0.11.0] [Cherry-pick #4058 ] Fixes Qwen3-Next enable nz accuracy problem (#4056 )

2025-11-10 20:56:39 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892 )

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926 )

2025-07-28 15:13:37 +08:00

conftest.py

[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841 )

2025-07-18 23:07:14 +08:00

test_ascend_config.py

[Feature] Support moe multi-stream for aclgraph. (#2946 )

2025-09-19 11:06:45 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193 )

2025-08-14 09:33:39 +08:00

test_platform.py

[V0.11.0][Core] Restore scheduling logic under default configuration (#4094 )

2025-11-10 20:02:23 +08:00

test_utils.py

[0.11.0] [Cherry-pick #4058 ] Fixes Qwen3-Next enable nz accuracy problem (#4056 )

2025-11-10 20:56:39 +08:00