xc-llm-ascend

Files

Mengqing Cao 223cc34085 [KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438)

### What this PR does / why we need it?
Refactor the KV cache specs, because the existing patch to `page_size_bytes` has no effect.

1. `AttentionSpec` is currently patched, but at runtime vLLM still computes
`page_size_bytes` with its own implementation, so the patch has no actual
effect. This PR therefore removes the patch on `AttentionSpec`; the final
fix will be made in vLLM.
2. Use `MLAAttentionSpec` instead of `FullAttentionSpec` to reduce the
spec's `page_size_bytes`, so that `num_blocks` in the spec can double.
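A minimal sketch of why a smaller `page_size_bytes` doubles `num_blocks`, assuming (as in earlier vLLM releases) that a full-attention page stores separate K and V tensors (factor 2) while an MLA page stores a single latent per token (factor 1). `SpecSketch` and `num_blocks` are hypothetical stand-ins, not vLLM's classes, and the shape and memory numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for a KV cache spec (not vLLM's AttentionSpec)."""
    block_size: int    # tokens per block
    num_kv_heads: int
    head_size: int
    dtype_size: int    # bytes per element, e.g. 2 for bf16
    use_mla: bool

    @property
    def page_size_bytes(self) -> int:
        # Assumption: full attention keeps K and V separately (factor 2),
        # while MLA keeps one compressed latent per token (factor 1).
        coef = 1 if self.use_mla else 2
        return coef * self.block_size * self.num_kv_heads * self.head_size * self.dtype_size


def num_blocks(available_bytes: int, num_layers: int, spec: SpecSketch) -> int:
    # Each block needs one page in every attention layer.
    return available_bytes // (spec.page_size_bytes * num_layers)


# Illustrative MLA-style shape: 1 latent "head" of width 576, bf16, 128-token blocks.
full = SpecSketch(block_size=128, num_kv_heads=1, head_size=576, dtype_size=2, use_mla=False)
mla = SpecSketch(block_size=128, num_kv_heads=1, head_size=576, dtype_size=2, use_mla=True)

budget = 10 * 2**30  # 10 GiB of KV cache memory (illustrative)
layers = 61          # illustrative layer count

print(full.page_size_bytes, num_blocks(budget, layers, full))
print(mla.page_size_bytes, num_blocks(budget, layers, mla))
```

With the same memory budget, halving the per-page size lets roughly twice as many blocks fit (integer division can add one extra block on the MLA side).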

### How was this patch tested?
Tests pass with Qwen3-Next and DeepSeek-V3.2-Exp.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-10-14 21:28:41 +08:00

__init__.py

[KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438)

2025-10-14 21:28:41 +08:00

patch_config.py

[Misc] Clean up useless patch (#3320)

2025-10-09 14:07:26 +08:00

patch_distributed.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193)

2025-08-14 09:33:39 +08:00

patch_mamba_config.py

[KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438)

2025-10-14 21:28:41 +08:00