xc-llm-ascend/singlecard at 6391f0625f08f41a5c8fa4243e971eab2a91535f

EngineX/xc-llm-ascend

1092626063 c4a11a745a [refactor]support gatingtopk operator generalization (#4356)

### What this PR does / why we need it?
This PR is cherry-picked from
https://github.com/vllm-project/vllm-ascend/pull/2958 and
https://github.com/vllm-project/vllm-ascend/pull/4340.

Before:
`npu_moe_gating_top_k` supported only the `group_count=256` pattern.

Now:
1. `npu_moe_gating_top_k` supports any `group_count`.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is
included in `torch_npu.npu_moe_gating_top_k`.
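For readers unfamiliar with grouped gating, the following is a minimal pure-Python sketch of what softmax top-k routing with expert groups usually looks like for a single token. It is not the `torch_npu` operator and does not reproduce its signature; the function name `grouped_topk_gating` and the parameter `k_group` are hypothetical, and the group-scoring rule (best expert per group) is an assumption chosen for illustration.

```python
import math


def grouped_topk_gating(logits, k, group_count, k_group):
    """Illustrative grouped top-k gating for one token (hypothetical helper).

    logits: router scores, one per expert.
    k: number of experts to select.
    group_count: number of equal-sized expert groups.
    k_group: number of groups kept before expert selection (assumed name).
    """
    n = len(logits)
    if n % group_count != 0:
        raise ValueError("number of experts must be divisible by group_count")

    # Numerically stable softmax over all experts.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Score each group by its best expert (an assumption), keep the best k_group groups.
    size = n // group_count
    group_scores = [max(probs[g * size:(g + 1) * size]) for g in range(group_count)]
    kept = set(sorted(range(group_count), key=lambda g: group_scores[g], reverse=True)[:k_group])

    # Select the top-k experts among the kept groups and renormalize their weights.
    candidates = [i for i in range(n) if i // size in kept]
    top = sorted(candidates, key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return top, [probs[i] / norm for i in top]
```

With `group_count=1` and `k_group=1` the sketch reduces to plain softmax top-k over all experts, which is one way to see how a grouped operator can subsume an ungrouped softmax variant.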

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improves by 6%.
2. Qwen3: unchanged.

---------

Signed-off-by: 1092626063 <1092626063@qq.com>

2025-12-04 20:10:13 +08:00

[refactor]support gatingtopk operator generalization (#4356)

2025-12-04 20:10:13 +08:00

[0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092)

2025-11-11 09:58:03 +08:00

__init__.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

test_aclgraph_mem.py

add new e2e tests case for aclgraph memory to v0.11.0 (#3880)

2025-10-31 09:17:09 +08:00

test_aclgraph.py

[Test] Temporarily skip flaky ACL graph test (#3577)

2025-10-21 17:16:15 +08:00

test_ascend_scheduler.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_bge_model.py

[Feat] Supports Aclgraph for bge-m3 (#3171)

2025-10-14 23:07:45 +08:00

test_camem.py

[v0.11.0][Bugfix] fix sleepmode level2 e2e test (#4023)

2025-11-08 14:11:15 +08:00

test_chunked.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_embedding_aclgraph.py

[Feat] Supports Aclgraph for bge-m3 (#3171)

2025-10-14 23:07:45 +08:00

test_embedding.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_guided_decoding.py

[Misc] Clean up useless patch (#3320)

2025-10-09 14:07:26 +08:00

test_ilama_lora.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_multistream_overlap_shared_expert.py

[Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753)

2025-10-25 15:51:43 +08:00

test_profile_execute_duration.py

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

test_quantization.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_sampler.py

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

test_vlm.py

[BugFix] cherry-pick PR 3736 to v0.11.0-dev (#3737)

2025-10-25 10:35:14 +08:00