Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)

### What this PR does / why we need it?
Remove the chunked-prefill-for-MLA branch in MLA, and change the dtype of
`prefill_mask` to avoid an accuracy problem.
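The PR does not show the `prefill_mask` change itself, but the class of bug it describes is common: if an attention mask is carried in the wrong dtype or applied with the wrong semantics, masked positions can leak nonzero attention weight. The sketch below is a generic, hypothetical illustration (not the actual vLLM Ascend code): a multiplicative float mask zeroes masked logits, which is wrong when valid logits are negative, whereas a boolean mask used to set masked logits to `-inf` gives exactly zero weight.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical attention logits; note they are all negative.
scores = np.array([-2.0, -3.0, -1.5])
# Float mask: 1.0 = keep, 0.0 = mask out the last position.
mask = np.array([1.0, 1.0, 0.0])

# Buggy: multiplying by a float mask turns the masked logit into 0.0,
# which is *larger* than the valid negative logits, so the masked
# position wrongly receives the most attention weight.
buggy = softmax(scores * mask)

# Correct: treat the mask as boolean and push masked logits to -inf,
# so softmax assigns them exactly zero weight.
fixed = scores.copy()
fixed[mask == 0.0] = -np.inf
correct = softmax(fixed)
```

Here `np.argmax(buggy)` points at the masked position, while `correct` gives it zero weight — the kind of silent accuracy degradation a mask dtype/semantics fix addresses.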
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
ef7eefe17a

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
This commit is contained in:
LeeWenquan
2025-09-18 19:43:26 +08:00
committed by GitHub
parent 79a910ef47
commit f4e3d22432
5 changed files with 83 additions and 183 deletions


@@ -148,10 +148,6 @@ msgid ""
" to be passed in."
msgstr "在为MOE模型使用专家负载均衡时需要传入专家映射路径。"
#: ../../user_guide/configuration/additional_config.md
msgid "`chunked_prefill_for_mla`"
msgstr "`chunked_prefill_for_mla`"
#: ../../user_guide/configuration/additional_config.md
msgid "`False`"
msgstr "`False`"