xc-llm-ascend

Files

Delphine-Nic a3e9673137 [long seq feat]GQA support long-prefill-token-threshold and fixbug (#4209)

### What this PR does / why we need it?
GQA chunked prefill with pcp and dcp now supports long-prefill-token-threshold.

The results, in markdown format, are shown below:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| gsm8kdataset | - | accuracy | gen | 96.13 |

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

---------

Signed-off-by: Delphine-Nic <tanwenqin@huawei.com>
Signed-off-by: Delphine-Nic <t00608739@china.huawei.com>
Co-authored-by: Delphine-Nic <tanwenqin@huawei.com>
Co-authored-by: Delphine-Nic <t00608739@china.huawei.com>

2025-11-19 18:10:27 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[bugfix] pcp + mtp acl graph bugfix (#4221)

2025-11-19 11:21:46 +08:00

model_runner_v1.py

[long seq feat]GQA support long-prefill-token-threshold and fixbug (#4209)

2025-11-19 18:10:27 +08:00

npu_input_batch.py

[long_seq_Feat] support chunk prefill (#4158)

2025-11-14 08:43:37 +08:00

worker_v1.py

Upgrade to 0.11.1 newest vllm commit (#3982)

2025-11-12 23:01:19 +08:00