xc-llm-ascend

Files

Pr0Wh1teGivee 2fda60464c [Perf] Use fused ops npu_top_k_top_p (#1308)

### What this PR does / why we need it?
Use the fused op `torch_npu.npu_top_k_top_p(logits, p, k)` when both `p` and
`k` are not None; otherwise fall back to the original implementation. The
replacement takes effect automatically when `VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1`.

This patch uses `npu_top_k_top_p`, which requires
torch_npu>=2.5.1.post1.dev20250619.
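
The dispatch rule described above can be sketched roughly as follows. This is a minimal, pure-Python illustration and not the code from this PR: the helper names `env_flag_enabled`, `top_k_top_p_reference`, and `apply_top_k_top_p` are hypothetical, the fused op is passed in as a plain callable instead of importing `torch_npu`, and the fallback works on a single list of floats rather than tensors. The fallback's top-p rule (keep the smallest prefix whose cumulative probability reaches `p`) is one common formulation and is assumed here; it may not match the original implementation exactly.

```python
import math
import os

FLAG = "VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE"


def env_flag_enabled(name=FLAG, environ=None):
    """Return True when the flag is set to "1" (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "0") == "1"


def top_k_top_p_reference(logits, k=None, p=None):
    """Illustrative fallback for one row of logits (assumes k >= 1 if given).

    Filtered-out entries are replaced with -inf.
    """
    out = list(logits)
    # Indices sorted by logit, largest first.
    order = sorted(range(len(out)), key=lambda i: out[i], reverse=True)
    if k is not None:
        # Top-k: mask everything outside the k largest logits.
        for i in order[k:]:
            out[i] = -math.inf
    if p is not None:
        # Top-p: softmax over the surviving logits, then keep the smallest
        # prefix (in descending order) whose cumulative probability >= p.
        peak = max(out)
        exps = [math.exp(x - peak) for x in out]  # exp(-inf) == 0.0
        total = sum(exps)
        keep, cumulative = set(), 0.0
        for i in order:
            if out[i] == -math.inf:
                break
            keep.add(i)
            cumulative += exps[i] / total
            if cumulative >= p:
                break
        out = [x if i in keep else -math.inf for i, x in enumerate(out)]
    return out


def apply_top_k_top_p(logits, k, p, fused_op=None, environ=None):
    """Use the fused op only when the flag is on and both k and p are set."""
    use_fused = (
        fused_op is not None
        and k is not None
        and p is not None
        and env_flag_enabled(environ=environ)
    )
    if use_fused:
        # Argument order (logits, p, k) follows the call quoted in this PR.
        return fused_op(logits, p, k)
    return top_k_top_p_reference(logits, k, p)
```

On an Ascend machine with a new enough torch_npu, `fused_op` would presumably be `torch_npu.npu_top_k_top_p` operating on tensors; passing `fused_op=None` or leaving the flag unset always takes the fallback path.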

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested with DeepSeek R1; unit tests (UT) passed.

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>

2025-06-25 20:59:06 +08:00

fake_weight

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

ops

[UT] refactor test_expert_load_balancer and fix broken CI (#1293)

2025-06-20 01:02:52 +08:00

worker

[Perf] Use fused ops npu_top_k_top_p (#1308)

2025-06-25 20:59:06 +08:00

test_ascend_config.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00