xc-llm-ascend

LICO67373 687df88151 [Refactor] Move AttentionSpec initialization to Attention module (#5834)

### What this PR does / why we need it?

This PR refactors the `get_kv_cache_spec` method so that it delegates
AttentionSpec creation to each attention module's own `get_kv_cache_spec()`
method, aligning with the vLLM source code structure.

**Changes:**
- Simplify `get_kv_cache_spec` in `model_runner_v1.py` and
`cpu_offload_connector.py`
- Remove manual `AttentionType` checks for `Attention` modules
- Delegate spec creation to each attention module's `get_kv_cache_spec`
method directly
- Let `MambaBase` layers use their own `get_kv_cache_spec` method
- Keep the `use_sparse` hack for `MLAAttention` (DeepSeek DSA mode) as
Ascend-specific handling
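
The delegation pattern described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ToyKVCacheSpec`, `ToyAttention`, `ToyMamba`, and `collect_kv_cache_specs` are hypothetical names, and the `get_kv_cache_spec(block_size)` signature is a simplification assumed for the example; it is not the actual vLLM or vllm-ascend signature.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class ToyKVCacheSpec:
    """Hypothetical stand-in for a KV cache spec (not vLLM's real class)."""
    kind: str
    block_size: int


class HasKVCacheSpec(Protocol):
    """Anything that can describe its own KV cache requirements."""

    def get_kv_cache_spec(self, block_size: int) -> Optional[ToyKVCacheSpec]: ...


class ToyAttention:
    """Hypothetical attention layer that builds its own spec."""

    def __init__(self, attn_type: str = "decoder") -> None:
        self.attn_type = attn_type

    def get_kv_cache_spec(self, block_size: int) -> Optional[ToyKVCacheSpec]:
        # The layer, not the model runner, decides whether it needs a cache.
        if self.attn_type == "encoder_only":
            return None
        return ToyKVCacheSpec(kind="full_attention", block_size=block_size)


class ToyMamba:
    """Hypothetical state-space layer with its own spec logic."""

    def get_kv_cache_spec(self, block_size: int) -> Optional[ToyKVCacheSpec]:
        return ToyKVCacheSpec(kind="mamba", block_size=block_size)


def collect_kv_cache_specs(
    layers: Dict[str, HasKVCacheSpec], block_size: int
) -> Dict[str, ToyKVCacheSpec]:
    """Runner-side helper: no per-type branching, only delegation."""
    specs: Dict[str, ToyKVCacheSpec] = {}
    for name, layer in layers.items():
        spec = layer.get_kv_cache_spec(block_size)
        if spec is not None:  # layers without a KV cache are skipped
            specs[name] = spec
    return specs


layers: Dict[str, HasKVCacheSpec] = {
    "layers.0.attn": ToyAttention(),
    "layers.1.mamba": ToyMamba(),
    "encoder.attn": ToyAttention(attn_type="encoder_only"),
}
specs = collect_kv_cache_specs(layers, block_size=128)
```

In this sketch the runner-side function no longer needs to inspect the attention type itself, which is the kind of simplification the list above describes. Any platform-specific special case (such as the `use_sparse` handling kept by this PR) would remain an explicit branch in the runner and is not modeled here.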

This change follows RFC #5463 item 12: move AttentionSpec to Attention
module.

- Fixes #5463 (item 12)

### Does this PR introduce _any_ user-facing change?

No. This is an internal refactoring that simplifies code structure
without changing any external behavior.

### How was this patch tested?

- Syntax validation passed via `python -m py_compile`
- CI tests will verify the changes work correctly with existing test
cases
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: lico67373 <918688502@qq.com>

2026-01-19 14:22:18 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [Refactor]Refactor of vllm_ascend/distributed module (#5719) | 2026-01-15 08:57:40 +08:00 |
| `cpu_kv_cache_manager.py` | [Bugfix] fix cpu offload hang with tp=1 (#5963) | 2026-01-17 11:50:13 +08:00 |
| `cpu_offload_connector.py` | [Refactor] Move AttentionSpec initialization to Attention module (#5834) | 2026-01-19 14:22:18 +08:00 |
| `metadata.py` | [Bugfix] fix cpu offload hang with tp=1 (#5963) | 2026-01-17 11:50:13 +08:00 |