From 3199fe835056e33304c817283d194626abc1ea51 Mon Sep 17 00:00:00 2001
From: herizhen <59841270+herizhen@users.noreply.github.com>
Date: Fri, 28 Nov 2025 17:09:26 +0800
Subject: [PATCH] [Doc]Delete equals sign (#4537)

### What this PR does / why we need it?
Delete equals sign in doc

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: herizhen
Co-authored-by: herizhen
---
 .../developer_guide/feature_guide/Multi_Token_Prediction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/developer_guide/feature_guide/Multi_Token_Prediction.md b/docs/source/developer_guide/feature_guide/Multi_Token_Prediction.md
index 27986aab..04bde6fe 100644
--- a/docs/source/developer_guide/feature_guide/Multi_Token_Prediction.md
+++ b/docs/source/developer_guide/feature_guide/Multi_Token_Prediction.md
@@ -6,7 +6,7 @@ MTP boosts inference performance by parallelizing the prediction of multiple tok
 
 ## How to Use MTP
 
 To enable MTP for DeepSeek-V3 models, add the following parameter when starting the service:
-`--speculative_config={"method": "deepseek_mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": False}`
+--speculative_config ' {"method": "deepseek_mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": False} '
 - `num_speculative_tokens`: The number of speculative tokens which enable model to predict multiple tokens at once, if provided. It will default to the number in the draft model config if present, otherwise, it is required.
 - `disable_padded_drafter_batch`: Disable input padding for speculative decoding. If set to True, speculative input batches can contain sequences of different lengths, which may only be supported by certain attention backends. This currently only affects the MTP method of speculation, default is False.
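The `--speculative_config` flag in the patched doc line takes a JSON object as its shell-quoted value. A minimal sketch of assembling that argument programmatically, using only the keys shown in the diff above (`json.dumps` emits standard JSON, so the Python `False` in the doc text becomes lowercase `false` here; the helper itself is illustrative, not part of vLLM):

```python
import json
import shlex

# Speculative decoding settings taken from the doc line in the diff above.
spec_config = {
    "method": "deepseek_mtp",
    "num_speculative_tokens": 1,        # how many tokens the draft model predicts per step
    "disable_padded_drafter_batch": False,  # keep input padding for drafter batches
}

# Serialize to JSON and shell-quote it so the whole object survives as one CLI argument.
arg = "--speculative_config " + shlex.quote(json.dumps(spec_config))
print(arg)
```

Quoting via `shlex.quote` keeps the braces, spaces, and double quotes intact when the argument is pasted into a shell command line, which is the same reason the doc line wraps the JSON in single quotes.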