5 Commits

Author SHA1 Message Date
Pr0Wh1teGivee
d13fb0766e [Perf] add patch to optimize apply_topk_topp (#1732)
### What this PR does / why we need it?
Performance optimization for apply_top_k_top_p
### Does this PR introduce _any_ user-facing change?
Use VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION to enable this feature
### How was this patch tested?
E2E tests and unit tests (UT).

- vLLM version: v0.9.2
- vLLM main: 6a9e6b2abf

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-07-11 15:32:02 +08:00
wangxiyuan
a45dfde283 [CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602)
Make CI happy

1. c1909e7e8c changed how moeConfig is initialized.
2. 48fb076cbc changed the input batch logic.

This PR adapts vllm-ascend to these changes.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1600

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-03 18:36:17 +08:00
Pr0Wh1teGivee
2fda60464c [Perf] Use fused ops npu_top_k_top_p (#1308)
### What this PR does / why we need it?
Use the fused op torch_npu.npu_top_k_top_p(logits, p, k) when both p and k
are not None; otherwise fall back to the original implementation. The
replacement takes place automatically when `VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1`.

This patch uses `npu_top_k_top_p`, which requires
torch_npu>=2.5.1.post1.dev20250619
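
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `fused_path_enabled` and `apply_top_k_top_p`, the injected `fused_op` and `fallback_op` callables, the fallback's argument order, and the exact parsing of the environment variable (`== "1"`) are all assumptions. In a real setup `fused_op` would presumably be `torch_npu.npu_top_k_top_p`; it is passed in here so the sketch runs without NPU hardware.

```python
import os

# Flag name taken from the commit message; how the real code parses it is an assumption.
ENV_FLAG = "VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE"


def fused_path_enabled(environ=None):
    """Return True when the flag is set to "1" (assumed parsing rule)."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_FLAG, "0") == "1"


def apply_top_k_top_p(logits, k, p, fused_op, fallback_op, environ=None):
    """Hypothetical dispatcher between a fused kernel and the original path.

    fused_op is called as fused_op(logits, p, k), matching the argument order
    quoted in the commit message. fallback_op(logits, k, p) is a stand-in for
    the original implementation; its signature is assumed.
    """
    if fused_path_enabled(environ) and k is not None and p is not None:
        return fused_op(logits, p, k)
    # Either the flag is off or one of k / p is missing: use the original path.
    return fallback_op(logits, k, p)
```

With stub callables, the fused branch is taken only when the flag is `1` and both `k` and `p` are provided; every other combination goes through the fallback.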

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested with DeepSeek R1; unit tests (UT) passed.

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-06-25 20:59:06 +08:00
linfeng-yuan
b80a484864 Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE (#1112)
### What this PR does / why we need it?
Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-06-07 19:45:33 +08:00
sherie
908a851a77 optimize the function of computing topk and topp in sampler. (#970)
### What this PR does / why we need it?
Optimize the performance of calculation logic in sampler and deepseekv2.

### Does this PR introduce _any_ user-facing change?
Added VLLM_ENABLE_TOPK_OPTIMZE config in sampler

### How was this patch tested?
pytest test_sampler.py

Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: ZhengWG <zwg0606@gmail.com>
2025-06-05 16:42:18 +08:00