### What this PR does / why we need it?

MLA and GQA use different computation logic: MLA slices batches and computes only on the actually valid tokens. That means outer padding must be handled carefully — the accuracy issue this PR fixes was caused by stale data in `slot_mapping` being reused by subsequent inference steps. So this PR zeroes out the portion of the slot-mapping tensor that is not covered by the currently scheduled tokens.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

Working on it.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
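The fix can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the PR's actual code: the helper name `clear_stale_slot_mapping` is hypothetical, and the actual padding value used for unused slots may differ from `0`.

```python
import torch


def clear_stale_slot_mapping(slot_mapping: torch.Tensor,
                             num_scheduled_tokens: int) -> None:
    """Zero the unused tail of a persistent slot-mapping buffer.

    `slot_mapping` is typically a preallocated buffer reused across
    inference steps. If a later step schedules fewer tokens than an
    earlier one, the tail still holds slot indices from the earlier
    step; zeroing it prevents that stale data from being consumed by
    subsequent attention computations.
    """
    slot_mapping[num_scheduled_tokens:].fill_(0)


# Example: a buffer filled by a previous, larger batch.
buf = torch.arange(8)
clear_stale_slot_mapping(buf, num_scheduled_tokens=5)
# Only the first 5 entries remain valid; the tail is cleared.
```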