xc-llm-ascend

QilaiZhang d30bb95b90 [Bugfix] Fix zero attention output in qwen3-next (#3572)

### What this PR does / why we need it?
Attention and LinearAttention share the same `slot_mapping`. Because the
`slot_mapping` for LinearAttention is all zeros, it overwrites the
`slot_mapping` for Attention, so the computed attention output is all
zeros.

This PR removes the centrally managed `self.slot_mapping` and passes the
`slot_mapping` from `input_batch.block_table` directly to
`attn_metadata`, updating the relevant references accordingly.
Due to hardware requirements, the data type of `block_table.slot_mapping`
must be int32.
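
The overwrite described above can be illustrated with a minimal sketch. It uses plain Python lists rather than NPU tensors, and every name in it (`build_metadata_shared`, `build_metadata_per_group`, the group names, the slot values) is hypothetical and chosen for illustration; it is not the vllm-ascend implementation.

```python
# Hypothetical, simplified model of the bug. Plain lists stand in for
# device tensors; none of these names come from vllm-ascend.

def build_metadata_shared(shared_buf, group_slot_mappings):
    """Buggy pattern: every group writes into one shared buffer
    and its metadata keeps a reference to that same buffer."""
    metadata = {}
    for name, mapping in group_slot_mappings.items():
        shared_buf[:len(mapping)] = mapping  # overwrites the previous group's values
        metadata[name] = shared_buf          # alias, not a copy
    return metadata


def build_metadata_per_group(group_slot_mappings):
    """Fixed pattern: each group's metadata holds its own slot mapping."""
    return {name: list(mapping) for name, mapping in group_slot_mappings.items()}


groups = {
    "attention": [7, 8, 9, 10],         # illustrative slot indices
    "linear_attention": [0, 0, 0, 0],   # all zeros, as described in this PR
}

shared = build_metadata_shared([0] * 4, groups)
separate = build_metadata_per_group(groups)

print(shared["attention"])    # the zeros written for linear_attention
print(separate["attention"])  # the attention mapping is preserved
```

In the shared variant, both metadata entries point at one list, so the last group to write wins. Giving each group its own mapping removes the aliasing, which is the effect this PR describes when it passes the block table's `slot_mapping` straight into `attn_metadata`.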

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed with existing tests.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: QilaiZhang <245706640@qq.com>

2025-10-25 09:47:03 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878) | 2025-07-19 09:42:32 +08:00 |
| `block_table.py` | [Bugfix] Fix zero attention output in qwen3-next (#3572) | 2025-10-25 09:47:03 +08:00 |
| `model_runner_v1.py` | [Bugfix] Fix zero attention output in qwen3-next (#3572) | 2025-10-25 09:47:03 +08:00 |
| `npu_input_batch.py` | [1/N][Refactor] Refactor code to adapt with vllm main (#3612) | 2025-10-24 16:55:08 +08:00 |
| `worker_v1.py` | [1/N][Refactor] Refactor code to adapt with vllm main (#3612) | 2025-10-24 16:55:08 +08:00 |