xc-llm-ascend

Files

yiz-liu 75d05ee200 [Core] Fix block table shape to make Prefix cache work with Ascend scheduler (#1446)

### What this PR does / why we need it?

This fixes the shape of block_table, which was introduced by the hybrid KV
groups change several weeks ago.

An error is raised when prefix caching (eager mode or not) and the Ascend
Scheduler are enabled at the same time. To reproduce it, send two identical
requests.
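
The reproduction step above can be sketched as a small client script. This is only an illustration, not part of the patch: it assumes an OpenAI-compatible completions endpoint is already being served with prefix caching and the Ascend scheduler enabled, and the URL, model name, and prompt below are placeholders.

```python
import json
import urllib.request

# Assumptions (not from the PR): endpoint URL, model name, and prompt are placeholders.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "my-model"
# A repeated sentence gives a longer prompt, so a cached prefix is more likely to be reused.
PROMPT = "Explain what a KV cache block table is. " * 8


def build_payload(prompt: str = PROMPT, model: str = MODEL) -> bytes:
    """Serialize one completion request; identical inputs yield identical bytes."""
    body = {"model": model, "prompt": prompt, "max_tokens": 16, "temperature": 0}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def send(payload: bytes, url: str = BASE_URL) -> dict:
    """POST one request and return the decoded JSON response."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def reproduce(url: str = BASE_URL) -> list:
    """Send the same request twice; the second one should reuse the cached prefix."""
    payload = build_payload()
    return [send(payload, url) for _ in range(2)]
```

Calling `reproduce()` against a server without the fix would be expected to surface the error on the second request; with the fix, both requests should return normally.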

v0.9.1: https://github.com/vllm-project/vllm-ascend/pull/1297

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested manually.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2025-06-30 11:25:19 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_v1.py

[Core] Fix block table shape to make Prefix cache work with Ascend scheduler (#1446)

2025-06-30 11:25:19 +08:00

attention.py

[Platform] Add initial experimental support for Atlas 300I series (#1333)

2025-06-21 09:00:16 +08:00

mla_v1.py

Handle with_prefill_across_dp for multistream mla (#1322)

2025-06-26 09:32:07 +08:00