Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)

### What this PR does / why we need it?
Remove the chunked-prefill-for-MLA branch in MLA, and change the dtype of
`prefill_mask` to avoid an accuracy problem.
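The PR does not show the `prefill_mask` change itself, but the class of bug it describes is common: if an attention mask is carried in the wrong dtype or applied with the wrong semantics, masked positions can leak nonzero attention weight. The sketch below is a generic, hypothetical illustration (not the actual vLLM Ascend code): a multiplicative float mask zeroes masked logits, which is wrong when valid logits are negative, whereas a boolean mask used to set masked logits to `-inf` gives exactly zero weight.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical attention logits; note they are all negative.
scores = np.array([-2.0, -3.0, -1.5])
# Float mask: 1.0 = keep, 0.0 = mask out the last position.
mask = np.array([1.0, 1.0, 0.0])

# Buggy: multiplying by a float mask turns the masked logit into 0.0,
# which is *larger* than the valid negative logits, so the masked
# position wrongly receives the most attention weight.
buggy = softmax(scores * mask)

# Correct: treat the mask as boolean and push masked logits to -inf,
# so softmax assigns them exactly zero weight.
fixed = scores.copy()
fixed[mask == 0.0] = -np.inf
correct = softmax(fixed)
```

Here `np.argmax(buggy)` points at the masked position, while `correct` gives it zero weight — the kind of silent accuracy degradation a mask dtype/semantics fix addresses.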
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
ef7eefe17a

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
This commit is contained in:
LeeWenquan
2025-09-18 19:43:26 +08:00
committed by GitHub
parent 79a910ef47
commit f4e3d22432
5 changed files with 83 additions and 183 deletions


@@ -148,10 +148,6 @@ msgid ""
" to be passed in."
msgstr "在为MOE模型使用专家负载均衡时需要传入专家映射路径。"
#: ../../user_guide/configuration/additional_config.md
msgid "`chunked_prefill_for_mla`"
msgstr "`chunked_prefill_for_mla`"
#: ../../user_guide/configuration/additional_config.md
msgid "`False`"
msgstr "`False`"