xc-llm-ascend

Mengqing Cao 2d885869c5 [KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113)

### What this PR does / why we need it?
Fixes #3096 
1. Fix the kv cache initialization error for attention layers. Some
models use a layer name like `attn.attn` instead of `self_attn`, but the
initialization of the kv cache tensors only checks for `self_attn` and
`attn.attn`, which leads to the error `AssertionError: Some layers are
not correctly initialized`.
2. Set a default value for the `sampling_metadata` input argument of
`compute_logits` in the vllm-ascend modeling files. This fixes the
error `Qwen3NextForCausalLM.compute_logits() missing 1 required
positional argument: 'sampling_metadata'`.
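
A minimal sketch of the two fixes above, not the actual vllm-ascend code. It assumes the old check matched only the `self_attn` substring and that the new default is `None`. The helper names, the `ToyModel` class, and the layer names are hypothetical and exist only for illustration.

```python
from typing import Optional


def strict_is_attention_layer(layer_name: str) -> bool:
    """Hypothetical pre-fix style check that knows only one naming scheme."""
    return "self_attn" in layer_name


def is_attention_layer(layer_name: str) -> bool:
    """Hypothetical post-fix style check that matches the generic substring."""
    return "attn" in layer_name


class ToyModel:
    """Stand-in for a modeling class; not the real Qwen3NextForCausalLM."""

    def compute_logits(self, hidden_states, sampling_metadata: Optional[object] = None):
        # The default (assumed to be None here) lets callers omit the argument.
        return [2 * h for h in hidden_states]


# Illustrative layer names only; real models may use different prefixes.
layer_names = ["model.layers.0.self_attn.attn", "model.layers.0.attn.attn"]

# Layers the strict check would skip, leaving their kv cache uninitialized.
missed = [n for n in layer_names if not strict_is_attention_layer(n)]
# Layers the relaxed check picks up.
covered = [n for n in layer_names if is_attention_layer(n)]

# Calling without sampling_metadata no longer raises a TypeError.
logits = ToyModel().compute_logits([1, 2])
```

A substring match on `attn` is broad, so a real implementation would likely need to handle other attention-like layer types separately before falling back to it.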

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally with internlm.


- vLLM version: v0.10.2
- vLLM main: 5aeb925452

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-09-24 11:32:34 +08:00

layers

[Feature]cpu offload connector (#1659)

2025-09-23 14:25:05 +08:00

__init__.py

[CI] Update vllm version to 20250922(5aeb925) (#3091)

2025-09-22 22:18:13 +08:00

deepseek_mtp.py

[KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113)

2025-09-24 11:32:34 +08:00

deepseek_v2.py

[CI] Update vllm version to 20250922(5aeb925) (#3091)