Commit Graph

3 Commits

Author SHA1 Message Date
debuger
c1618a0427 [Bugfix]Fix the compatibility issue of may_reinitialize_input_batch (#6290)
### What this PR does / why we need it?
Added a check to the `may_reinitialize_input_batch` method that verifies
whether the backend implements the `get_supported_block_size` method.
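
The kind of guard described above might look roughly like the sketch below. It is an illustration, not the code from this patch: `supported_block_sizes`, `LegacyBackend`, `NewBackend`, the list return type, and the fallback value are hypothetical. Only the method name `get_supported_block_size` comes from the description.

```python
def supported_block_sizes(backend, default):
    """Return the backend's supported block sizes, or `default` if unavailable.

    Hypothetical helper: the real check lives inside
    may_reinitialize_input_batch and may differ in detail.
    """
    getter = getattr(backend, "get_supported_block_size", None)
    if not callable(getter):
        # The backend does not implement the method, so fall back
        # instead of raising AttributeError.
        return default
    return getter()


class LegacyBackend:
    """Hypothetical backend that lacks get_supported_block_size."""


class NewBackend:
    """Hypothetical backend that implements the method."""

    @staticmethod
    def get_supported_block_size():
        # Assumed return shape: a list of block sizes.
        return [128]
```

Using `getattr` with a default keeps the lookup and the existence check in one step, so a backend without the method takes the fallback path rather than failing.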

### Does this PR introduce _any_ user-facing change?
No user-facing change.

### How was this patch tested?
Only a few lines of code inside the method were modified, and the
format check passed.
- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: Debuuuuger <huangzr@cmbchina.com>
Signed-off-by: debuger <102402761+huangazazaz@users.noreply.github.com>
Signed-off-by: Debuuuuger <12110718@mail.sustech.edu.cn>
Co-authored-by: Debuuuuger <huangzr@cmbchina.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 19:16:26 +08:00
pu-zhe
57fd6e4bd9 [Refact.]: refactoring 310p-kv cache allocator, align with main branch (#6270)
### What this PR does / why we need it?
Refactor the 310P KV cache allocator to align it with the main branch.

vLLM version: v0.14.0
vLLM main: https://github.com/vllm-project/vllm-ascend/pull/6270
Tested with the Qwen2.5-7B E2E test.

---------

Signed-off-by: pu-zhe <puzhe1@h-partners.com>
Signed-off-by: pu-zhe <zpuaa@outlook.com>
Co-authored-by: pu-zhe <puzhe1@h-partners.com>
2026-01-27 16:26:48 +08:00
Shaoxu Cheng
fbae41697e [310P]: refactoring for 310p kvcache and some ops class (#6117)
### What this PR does / why we need it?
* Refactor the LayerNorm and activation operator classes to decouple the
310P device implementation from the main branch.
* Refactor `mm_encoder_attention` on 310P to use the
`torch_npu._npu_flash_attention_unpad` operator.
* Refactor the QKV inputs in the prefill stage of `attention_v1` on 310P
so they are no longer padded to 16× alignment.
* Refactor `model_runner` on 310P to align the KV-cache initialization
logic with the mainline implementation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested with the E2E tests.

- vLLM version: v0.13.0
- vLLM main: d68209402d

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>
2026-01-24 20:34:29 +08:00