xc-llm-ascend

Author	SHA1	Message	Date

luomin2005

f41eeeb11e

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

### What this PR does / why we need it?
Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp.
For more details, see
https://github.com/vllm-project/vllm-ascend/issues/6486

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Installed the new package to test the change; here is the
result:


- vLLM version: v0.15.0
- vLLM main: `9562912cea`

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>

2026-02-24 09:12:43 +08:00

linfeng-yuan

e25ee65729

[Misc][Test] add e2e test for apply_top_k_top_p_custom kernel (#6348)

### What this PR does / why we need it?
Add an e2e test case for the apply_top_k_top_p_custom kernel and remove
Chinese comments.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest passed.

- vLLM version: v0.14.1
- vLLM main: `dc917cceb8`

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2026-01-28 17:25:57 +08:00

linfeng-yuan

96309e2b79

[ops] support advanced apply_top_k_top_p without top_k constraint (#6098)

### What this PR does / why we need it?
Implement `apply_top_k_top_p` in Ascend C to remove the constraint that
k must lie in [1, 1024]. This enables high-performance TopKTopP computation
and avoids the D2H synchronization introduced by k validation.
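
For readers unfamiliar with the operation, the following is a minimal pure-Python sketch of what a combined top-k / top-p filter computes on a single row of logits. It is not the Ascend C kernel from this PR and does not reflect its signature; the function name `apply_top_k_top_p_reference`, its parameters, and the tie-handling rule are assumptions made for illustration only.

```python
import math


def apply_top_k_top_p_reference(logits, k=None, p=None):
    """Hypothetical reference: mask logits outside top-k / top-p with -inf.

    logits: list of floats for one request (one row of the vocab dimension).
    k: keep the k largest logits (ties at the threshold are all kept).
    p: keep the smallest set of tokens whose cumulative probability >= p.
    """
    out = list(logits)

    # Top-k: no upper bound on k is assumed; k >= vocab size is a no-op.
    if k is not None and 0 < k < len(out):
        threshold = sorted(out, reverse=True)[k - 1]
        out = [x if x >= threshold else -math.inf for x in out]

    # Top-p (nucleus): softmax over the surviving logits, then keep the
    # highest-probability tokens until their cumulative mass reaches p.
    if p is not None and p < 1.0:
        m = max(out)
        exps = [math.exp(x - m) for x in out]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        probs = [e / total for e in exps]
        order = sorted(range(len(out)), key=lambda i: probs[i], reverse=True)
        keep = set()
        cumulative = 0.0
        for i in order:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= p:
                break
        out = [x if i in keep else -math.inf for i, x in enumerate(out)]

    return out


if __name__ == "__main__":
    row = [2.0, 1.0, 0.0, -1.0]
    # k larger than the row length, as in the k=4096 serving test.
    print(apply_top_k_top_p_reference(row, k=4096, p=0.95))
```

Because this sketch runs on host-side Python lists, it says nothing about device-to-host synchronization; it only illustrates the filtering semantics that the kernel is described as accelerating.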

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E serving with `k=4096` and `p=0.95`
- vLLM version: v0.13.0
- vLLM main: `d68209402d`

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Co-authored-by: SlightwindSec <slightwindsec@gmail.com>

2026-01-26 09:08:42 +08:00

3 Commits