xc-llm-ascend


XiaoxinWang 15dc01f050 [Fix] Fix FIA query and query_start_loc shape mismatch error (#4518)

### What this PR does / why we need it?
The FIA operator requires **query.shape[0]** to match
**actual_seq_len[-1]**. In graph mode and multi-DP scenarios, the query
is padded to the size of **num_input_token**, so the two values no
longer match and the operator fails validation during tiling. Because
the padding is appended to the end of the query, it does not affect the
actual execution result of the operator, and precision is unaffected.
<img width="2434" height="49" alt="image"
src="https://github.com/user-attachments/assets/63520816-fbc3-4382-82b9-89dbb1492f6c"
/>
This change pads both **actual_seq_len** and **actual_seq_len_kv** to
resolve the validation issue in the operator.
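
A minimal sketch of the idea, not the code from this PR: it assumes **actual_seq_len** is a cumulative list of token counts and that the padded tail of the query can be covered by one extra dummy entry. The helper name `pad_cumulative_seq_lens` and its parameters are hypothetical; how **actual_seq_len_kv** is padded in the real change is not shown here.

```python
from typing import List


def pad_cumulative_seq_lens(cu_seq_lens: List[int],
                            num_input_tokens: int) -> List[int]:
    """Hypothetical helper: extend a cumulative sequence-length list so
    that its last entry equals the padded query length.

    cu_seq_lens      -- cumulative token counts, e.g. [3, 7, 12]
    num_input_tokens -- number of rows in the padded query (query.shape[0])
    """
    if not cu_seq_lens:
        raise ValueError("cu_seq_lens must not be empty")

    last = cu_seq_lens[-1]
    if last > num_input_tokens:
        # The query can only be padded, never truncated.
        raise ValueError("query is shorter than the real token count")
    if last == num_input_tokens:
        # No padding was applied, so nothing needs to change.
        return list(cu_seq_lens)

    # Cover the padded tail with one dummy entry so that the last value
    # matches the padded query length.
    return list(cu_seq_lens) + [num_input_tokens]


# Three real sequences with 12 tokens in total, query padded to 16 rows.
padded = pad_cumulative_seq_lens([3, 7, 12], 16)
assert padded[-1] == 16
```

Under this assumption the real sequences keep their original boundaries, and only the trailing padded rows are assigned to the dummy entry.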
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>

2025-12-03 17:33:31 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

model_runner_v1.py

[Fix] Fix FIA query and query_start_loc shape mismatch error (#4518)

2025-12-03 17:33:31 +08:00

npu_input_batch.py

upgrade to vllm 0.11.2 (#4400)

2025-11-26 11:48:58 +08:00

worker_v1.py

[kernel] add AscendC op: lightning_indexer and sparse_flash_attention (#4625)

2025-12-03 09:53:10 +08:00