remove redundant params in mla_preprocess kernel (#3530)

### What this PR does / why we need it?

This pull request removes the redundant parameters `gamma0` and `beta0`
(also named `gamma1`/`beta1` in some places) from the `mla_preprocess`
kernel and its calling hierarchy. The change is applied consistently
across the C++ kernel code, the bindings, and the Python call sites. The
parameters were unused in the lower-level functions, so removing them is
a straightforward cleanup.
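As a hedged illustration of why the parameters were redundant: the diff below shows `gamma0` initialized with `torch.ones` and `beta0` with `torch.zeros`, i.e. the identity values for a LayerNorm-style affine step. The sketch below (plain Python, assuming the params played an affine role) shows that such values carry no information:

```python
# Sketch only: gamma0/beta0 here are modeled after the tensors removed in
# this PR (torch.ones / torch.zeros over the projection width). With
# gamma = 1 and beta = 0 the affine step y = gamma * x + beta is the
# identity, so passing these tensors into the kernel changed nothing.
def affine(x, gamma, beta):
    """Elementwise affine transform, as applied after a normalization."""
    return [g * v + b for g, v, b in zip(gamma, x, beta)]

x = [0.5, -1.25, 2.0]
gamma0 = [1.0] * len(x)  # mirrors torch.ones(...)
beta0 = [0.0] * len(x)   # mirrors torch.zeros(...)
assert affine(x, gamma0, beta0) == x  # identity: the params are redundant
```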

### Does this PR introduce _any_ user-facing change?

Yes. The Python interface of the kernel changes: callers no longer pass
the `gamma0` and `beta0` parameters.
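A minimal sketch of the call-site change (the real kernel is the custom op `torch.ops._C_ascend.mla_preprocess`; the stub below is hypothetical and only mirrors the argument lists visible in the diff, truncated after `gamma1`):

```python
import inspect

# Hypothetical stand-in for the custom op; only the parameter list matters.
def mla_preprocess(hidden_states, wd_qkv, deq_scale_qkv, gamma1):
    """New signature: gamma0/beta0 are gone from the front of the arg list."""
    return hidden_states

# Callers that previously wrote
#   mla_preprocess(hidden_states, gamma0, beta0, wd_qkv, deq_scale_qkv, gamma1, ...)
# now simply drop the two arguments:
out = mla_preprocess("hs", "wd_qkv", "scale", "gamma1")

params = list(inspect.signature(mla_preprocess).parameters)
assert "gamma0" not in params and "beta0" not in params
```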

### How was this patch tested?

The unit test of the kernel is adapted accordingly.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: mojave2 <chenchen145@huawei.com>
Author: Chen Chen
Date: 2025-10-21 19:20:13 +08:00
Committed by: GitHub
Commit: 6b290acfe1 (parent 80b8df881f)
9 changed files with 34 additions and 50 deletions


@@ -716,17 +716,7 @@ class AscendMLAImpl(MLAAttentionImpl):
         self.qb_qt_bias = qb_qt_bias.reshape(
             self.num_heads * (self.qk_nope_head_dim + self.qk_rope_head_dim))
         device = self.q_proj.weight.device
-        self.gamma0 = torch.ones(
-            [self.fused_qkv_a_proj.weight.shape[-1]],
-            dtype=act_dtype,
-            device=device,
-        )
-        self.beta0 = torch.zeros(
-            [self.fused_qkv_a_proj.weight.shape[-1]],
-            dtype=act_dtype,
-            device=device,
-        )
         device = self.q_a_proj.weight.device
         self.gamma1 = self.q_a_layernorm.weight.data
         self.beta1 = self.q_a_layernorm.bias.data
         self.gamma2 = self.kv_a_layernorm.weight.data
@@ -1085,8 +1075,6 @@ class AscendMLAImpl(MLAAttentionImpl):
         torch.ops._C_ascend.mla_preprocess(
             hidden_states,
-            self.gamma0,
-            self.beta0,
             self.wd_qkv,
             self.deq_scale_qkv,
             self.gamma1,