xc-llm-ascend/e2e at 4e62a2ae15ccc3b1344a31e8716f4a27da097858

EngineX/xc-llm-ascend

wangx700 22d0e1d3d7 [model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221)

### What this PR does / why we need it?
Optimize the performance of the Triton operator _topk_log_softmax_kernel
in model_runner_v2 to 1.04x H100, which is 7% of its original value (issue:
https://github.com/vllm-project/vllm-ascend/issues/5208).

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: wangx700 <wangxin700@huawei.com>

2026-03-16 16:49:10 +08:00

[Lint]Style: Convert test/ to ruff format(Batch #1) (#6738)

2026-03-10 09:52:50 +08:00

[Doc] Recover installation doc to use pip install (#4109)

2025-11-11 09:25:44 +08:00

[Lint]Style: Convert test/ to ruff format(Batch #1) (#6738)

2026-03-10 09:52:50 +08:00

[bugfix] restore pr-7029 and fix patch error (#7294)

2026-03-16 15:39:42 +08:00

[model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221)

2026-03-16 16:49:10 +08:00

[E2E] add E2E for Prefix Caching cp & Chunked Prefill cp (#5149)

2026-02-03 15:04:14 +08:00

[Refactor] Replace npu_ring_mla with FIA in MLA prefill (#5704)

2026-03-16 10:33:09 +08:00

[CI] Upgrade CANN to 8.5.1 (#6897)

2026-03-03 09:02:42 +08:00

weekly/single_node/models

[TEST]add a qwen3-30b acc case with mooncake mempool (#6244)

2026-02-10 16:26:55 +08:00

__init__.py

[Test] Clean up duplicate test for ascend scheduler (#1819)

2025-07-16 17:57:48 +08:00

common.sh

Increase doctest timeout to 300s and time print (#3041)

2025-09-19 20:26:00 +08:00

conftest.py

[CI][Misc] Use offline mode for model downloads (#7179)

2026-03-13 08:52:24 +08:00

model_utils.py

[Lint]Style: Convert test/ to ruff format(Batch #1) (#6738)

2026-03-10 09:52:50 +08:00

run_doctests.sh

[CI] Fix doc test fail when load model with error information: 'Stale file handle' (#6832)

2026-02-27 09:14:42 +08:00

utils.py

[Test] Remove VLLM_USE_V1 in example and tests (#1733)

2025-07-15 12:49:57 +08:00