### What this PR does / why we need it?

Optimize NPU memory usage.

Related issue: https://github.com/vllm-project/vllm-ascend/issues/723

With vLLM v0.8.4.rc2, DeepSeek R1 can only run with a model length of up to 16K. Attempting to run with a model length of 32K results in an out-of-memory (OOM) error.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed.

Signed-off-by: sunbaosong <13793883820@163.com>