xc-llm-ascend

Files

wangx700 22d0e1d3d7 [model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221)

### What this PR does / why we need it?
Optimize the performance of the Triton operator _topk_log_softmax_kernel
in model_runner_v2, bringing it to 1.04x H100, which is 7% of its original
value (issue https://github.com/vllm-project/vllm-ascend/issues/5208).
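For orientation, here is a minimal plain-Python sketch of what a top-k log-softmax computes, assuming the kernel returns log_softmax(logits) only at a given list of token ids for one row. The function name `topk_log_softmax_reference` and the `block_size` parameter are hypothetical and do not come from logprob.py. The blockwise "online" log-sum-exp shown here is a common way such kernels traverse the vocabulary; it is not necessarily the optimization this PR applies.

```python
import math
from typing import List, Sequence


def topk_log_softmax_reference(
    logits: Sequence[float],
    topk_ids: Sequence[int],
    block_size: int = 4,
) -> List[float]:
    """Hypothetical reference: return log_softmax(logits)[i] for i in topk_ids.

    The row is scanned in blocks of `block_size`, keeping a running maximum
    and a running sum of exp(x - max) (an online log-sum-exp), so the full
    softmax is never materialized.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(logits) == 0:
        raise ValueError("logits must be non-empty")

    running_max = -math.inf
    running_sum = 0.0
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the previous partial sum to the new maximum, then add this block.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in block
        )
        running_max = new_max

    # log(sum(exp(x))) = max + log(sum(exp(x - max)))
    log_normalizer = running_max + math.log(running_sum)
    # Only the requested top-k positions are gathered and written out.
    return [logits[i] - log_normalizer for i in topk_ids]
```

Because each block only updates two scalars, the result does not depend on `block_size`; a larger block trades more parallel work per step for fewer sequential steps.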

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: wangx700 <wangxin700@huawei.com>

2026-03-16 16:49:10 +08:00

__init__.py

[Feature] support eager mode in model runner v2 (#5210)

2025-12-29 15:28:34 +08:00

gumbel.py

[Feature] adapt to uva buffer and main2main (#6657)

2026-02-12 10:36:31 +08:00

logprob.py

[model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221)

2026-03-16 16:49:10 +08:00

penalties.py

[MODELRUNNERV2]fix penalty ops (#7013)

2026-03-11 17:13:34 +08:00

sampler.py

[Feature] support aclgraph for model runner v2 (#7110)

2026-03-13 09:11:46 +08:00