### What this PR does / why we need it?
This pull request removes the redundant parameters `gamma1` and `beta1`
(also named `gamma0`/`beta0` in some places) from the `mla_preprocess`
kernel and its calling hierarchy. The changes are consistent across C++
kernel code, bindings, and Python call sites. The parameters were unused
in the lower-level functions, so their removal is a good cleanup.
### Does this PR introduce _any_ user-facing change?
The python interface of the kernel is affected, and the params of
`gamma0` and `beta0` are not needed.
### How was this patch tested?
The unit-test of the kernel is adapted accordingly.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: mojave2 <chenchen145@huawei.com>
### What this PR does / why we need it?
- Adds the `mla_preprocess` custom kernel to provide an optimized
pre-processing operator for Multi-head Latent Attention (MLA) on Ascend
NPUs.
- Wires the new kernel into the C++ extension pipeline so vLLM can
invoke it directly, cutting Python-side tensor shuffling and memory
copies that previously bottlenecked MLA compilation paths.
### Does this PR introduce any user-facing change?
- No. The change only introduces a low-level kernel; public APIs and
inference behavior remain unchanged.
### How was this patch tested?
- Dedicated Ascend kernels are not covered by our CI yet, so no extra
automated tests were added. Future MLA-focused regression runs will
cover this path.
- vLLM version: v0.11.0
Signed-off-by: Chen Chen <0109chenchen@gmail.com>