Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)
### What this PR does / why we need it?
Remove the chunked-prefill branch for MLA, and change the dtype of
`prefill_mask` to avoid an accuracy problem.
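As a minimal, hypothetical numpy sketch (not vLLM's actual code) of why an attention mask's dtype matters for accuracy: an additive float mask with `-inf` zeroes out masked positions after softmax, while a boolean/integer mask that gets silently cast and added only nudges the scores, leaking probability mass to positions that should be masked.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.5]], dtype=np.float32)

# Correct: additive mask in the scores' float dtype, -inf on masked slots.
float_mask = np.array([[0.0, 0.0, -np.inf]], dtype=np.float32)
p_good = softmax(scores + float_mask)

# Buggy: a 0/1 mask added as-is barely perturbs the scores
# instead of masking the last position out.
bool_mask = np.array([[0, 0, 1]])
p_bad = softmax(scores + bool_mask)

print(p_good)  # masked position gets exactly 0 probability
print(p_bad)   # masked position still receives probability mass
```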
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
ef7eefe17a
---------
Signed-off-by: SunnyLee219 <3294305115@qq.com>
@@ -70,9 +70,7 @@ vllm serve /models/deepseek_r1_w8a8 \
 "kv_port": "20001",
 "engine_id": "0",
 "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
-}' \
---additional-config \
-'{"chunked_prefill_for_mla":true}'
+}'
 ```
 
 Run prefill server P2 on second node:
@@ -114,9 +112,7 @@ vllm serve /models/deepseek_r1_w8a8 \
 "kv_port": "20001",
 "engine_id": "0",
 "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
-}' \
---additional-config \
-'{"chunked_prefill_for_mla":true}'
+}'
 
 Run decode server d1 on third node:
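After this change, the serve command in the doc ends at the connector JSON with no trailing `--additional-config` flag. An abbreviated sketch of the resulting invocation tail (assuming, as the documented command suggests, that the JSON is passed via vLLM's `--kv-transfer-config` flag; fields other than those shown in the diff are omitted here):

```shell
vllm serve /models/deepseek_r1_w8a8 \
  --kv-transfer-config \
  '{
    "kv_port": "20001",
    "engine_id": "0",
    "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
  }'
```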