Commit Graph

4 Commits

Author SHA1 Message Date
Yizhou
3158742a97 [Refactor] Refactor Ascend attention implementation forward (#3714)
### What this PR does / why we need it?
This PR refactors the Ascend attention implementation to align with
vLLM's core interfaces, simplifying the code and improving
maintainability.

### Key Changes:

* **Align with vLLM's Attention Interface**: The `forward` method
signature in `AscendAttentionBackendImpl` now matches the base
`AttentionImpl` in vLLM, removing the custom `trace_flag`.

* **Enable Opaque Attention Operator**: By adding `opaque_attention_op`
to `AscendPlatform`, we allow vLLM to wrap our attention kernel in its
standard `vllm.unified_attention_with_output` operator. This avoids the
need for a custom call path.

*   **Remove Obsolete Code**:
* The custom op `vllm.unified_ascend_attention_with_output` has been
deleted as it is now redundant.
* The `trace_flag` and its associated logic were removed, reducing code
complexity.
* An outdated quantization branch within the attention implementation
was cleaned up.

* **Improve Readability**: Renamed output variables (`output` vs.
`intermediate_output`) and added comments to clarify the in-place nature
of the attention output.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
No extra tests needed.

- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-25 08:58:35 +08:00
Mengqing Cao
2d885869c5 [KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113)
### What this PR does / why we need it?
Fixes #3096 
1. Fix kv cache initialization error of attention layer. There are some
models with layer name like `attn.attn`, instead of `self_attn`, but the
initialization of kv cache tensors only check for `self_attn` and
`attn.attn`, which leding to the error `AssertionError: Some layers are
not correctly initialized`
2. Set the default value of input arg `sampling_metadata` in
`compute_logits` for the modeling files in vllm-ascend. Thus fixing the
error `Qwen3NextForCausalLM.compute_logits() missing 1 required
positional argument: 'sampling_metadata'`

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
test locally with internlm


- vLLM version: v0.10.2
- vLLM main:
5aeb925452

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-09-24 11:32:34 +08:00
Li Wang
02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091)
### What this PR does / why we need it?
This pr bump vllm commit hash to
5aeb925452
fix issues:  
1. https://github.com/vllm-project/vllm/pull/25345 has remove v0
metadata
2. https://github.com/vllm-project/vllm/pull/25332
3. https://github.com/vllm-project/vllm/pull/25334
4. https://github.com/vllm-project/vllm/pull/23558, note that this vllm
commit update the model register logic, which will check all the model
registered have the `vllm.model_executor.models` path , which breaks our
custom registration of the deepseek_v3 model (it doesn't exist in the
vllm model path). so I move deepseek_v3 model registy to deepseek_v2 to
solve temporary

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
9607d5eb44

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-22 22:18:13 +08:00
Angazenn
e7409e95ee [1/N][Draft][Refactor]torchair pangu_moe modeling refactor (#2437)
### What this PR does / why we need it?

1. Similar to #2384 , this PR add a torchair-specific modeling for
pangu.
2. Fixes a bug introduced by routed_scaling_factor in #2675 .
3. remove eager test case for pangu since there has already been a
torchair test case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?


- vLLM version: v0.10.1.1
- vLLM main:
6997a25ac6

---------

Signed-off-by: zengyanjia <z00883269@china.huawei.com>
Signed-off-by: Angazenn <supperccell@163.com>
Co-authored-by: zengyanjia <z00883269@china.huawei.com>
2025-09-04 10:39:21 +08:00