Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)

### What this PR does / why we need it?
Remove the chunked-prefill branch from the MLA attention path, and change the dtype of
prefill_mask to avoid an accuracy problem.
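The diff below only touches documentation, so the mask-dtype fix itself is not shown here. As a rough illustration of the class of bug a wrong mask dtype can cause (all names and values here are hypothetical, not taken from the patch): an additive attention mask built with an integer or boolean dtype merely nudges masked logits by 1 instead of suppressing them, so masked positions still receive attention weight.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = np.array([[2.0, 1.0, 0.5]], dtype=np.float32)
bool_mask = np.array([[False, False, True]])  # last position should be masked out

# Wrong dtype: adding the boolean mask as an integer only shifts the
# masked logit by +1, so it still gets a large attention weight.
bad = softmax(scores + bool_mask.astype(np.int8))

# Correct: a floating-point mask with a large negative fill value
# drives the masked logit toward -inf before the softmax.
float_mask = np.where(bool_mask, np.float32(-1e9), np.float32(0.0))
good = softmax(scores + float_mask)
```

With the integer mask the "masked" position keeps roughly a third of the probability mass; with the float mask its weight is effectively zero.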
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: ef7eefe17a

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
Author: LeeWenquan
Date: 2025-09-18 19:43:26 +08:00
Committed by: GitHub
Parent: 79a910ef47
Commit: f4e3d22432
5 changed files with 83 additions and 183 deletions


@@ -70,9 +70,7 @@ vllm serve /models/deepseek_r1_w8a8 \
     "kv_port": "20001",
     "engine_id": "0",
     "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
-  }' \
-  --additional-config \
-  '{"chunked_prefill_for_mla":true}'
+  }'
```
Run prefill server P2 on second node:
@@ -114,9 +112,7 @@ vllm serve /models/deepseek_r1_w8a8 \
     "kv_port": "20001",
     "engine_id": "0",
     "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
-  }' \
-  --additional-config \
-  '{"chunked_prefill_for_mla":true}'
+  }'
```
Run decode server d1 on third node: