### What this PR does / why we need it?

Previously, the mask construction process created multiple tensors of size (max_model_len, max_model_len). When max_model_len reached 128k, host memory usage on a single GPU node exceeded hundreds of GB, causing process OOM crashes. This update optimizes mask generation to significantly reduce memory consumption.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI pass.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
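The PR text does not include the implementation, but the general idea can be sketched as follows. This is a minimal illustration (not the actual vLLM-Ascend code), assuming a causal attention mask: the hypothetical `full_causal_mask` shows the memory-hungry pattern of materializing the whole (max_model_len, max_model_len) tensor, while the hypothetical `sliced_causal_mask` builds only the slice a batch actually needs, generated from index comparisons.

```python
import numpy as np

def full_causal_mask(max_model_len: int) -> np.ndarray:
    # Naive pattern: materialize the entire (max_model_len, max_model_len)
    # lower-triangular mask. At 128k tokens a single float16 copy is ~32 GiB,
    # so a handful of such tensors exhausts host memory.
    return np.tril(np.ones((max_model_len, max_model_len), dtype=bool))

def sliced_causal_mask(query_start: int, num_queries: int,
                       seq_len: int) -> np.ndarray:
    # Memory-friendly pattern: compute only the (num_queries, seq_len) slice
    # needed for the current batch, on the fly from row/column indices,
    # instead of slicing a precomputed full-size mask.
    rows = np.arange(query_start, query_start + num_queries)[:, None]
    cols = np.arange(seq_len)[None, :]
    return cols <= rows
```

The on-the-fly slice is bit-identical to the corresponding region of the full mask, so attention results are unchanged; only peak memory drops from O(max_model_len²) to O(batch_queries × seq_len).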