NeverRaR
f2dd5f8d08
fix: support chunked_prefill with deepseek_mtp (#2711)
### What this PR does / why we need it?
Fix `chunked_prefill` so it works together with the `deepseek_mtp` speculative-decoding method.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```
vllm serve $MODEL_PATH \
--quantization ascend \
--served-model-name auto \
--trust-remote-code \
--distributed-executor-backend=mp \
--port 8006 \
-tp=8 \
-dp=2 \
--no-enforce-eager \
--max-num-seqs 24 \
--max-model-len 32768 \
--max-num-batched-tokens 16384 \
--block-size 128 \
--no-enable-prefix-caching \
--disable-log-requests \
--speculative-config '{"num_speculative_tokens":1, "method": "deepseek_mtp"}' \
--additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24],"enable_multistream_mla": true},"ascend_scheduler_config":{"enabled":false},"expert_tensor_parallel_size":16, "chunked_prefill_for_mla":true}' \
--gpu-memory-utilization 0.95
```
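To exercise the chunked-prefill path against the server started above, a request whose prompt exceeds `--max-num-batched-tokens` (16384) forces the scheduler to split the prefill across multiple steps. A minimal sketch of such a request payload, assuming the port (8006) and served model name (`auto`) from the command above:

```python
import json

# Endpoint derived from the serve command above:
# --port 8006, --served-model-name auto (assumption: server runs locally).
BASE_URL = "http://localhost:8006/v1/chat/completions"

# A prompt longer than --max-num-batched-tokens (16384) makes the
# scheduler chunk the prefill, which is the path this PR fixes when
# the deepseek_mtp drafter is enabled.
long_prompt = "Summarize the following text.\n" + "token " * 20000

payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": long_prompt}],
    "max_tokens": 64,
}
body = json.dumps(payload)

# Once the server is up, send with e.g.
#   curl $BASE_URL -H 'Content-Type: application/json' -d "$body"
# A 200 response confirms the chunked prefill completes with MTP enabled.
```

The prompt length is the only thing that matters here; any request whose prefill cannot fit in one `--max-num-batched-tokens` budget takes the chunked path.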
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: boying <897013703@qq.com>
2025-10-22 11:52:27 +08:00