[main][Docs] Fix typos across documentation (#6728)
## Summary
Fix typos and improve grammar consistency across 50 documentation files.
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" → "determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded", "re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of `--num-prompt`; see the illustrative invocation below)
- Capitalization consistency (e.g., "python" → "Python", "ascend" → "Ascend")
- vLLM version: v0.15.0
- vLLM main: 9562912cea
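For context, the corrected flag belongs to vLLM's serving benchmark. A minimal invocation sketch follows, assuming the `vllm bench serve` entry point; the model name, dataset, and prompt count are illustrative, not taken from the patched docs:

```bash
# Illustrative benchmark run using the corrected --num-prompts flag.
# Model, dataset path, and the value 200 are assumptions for this sketch.
vllm bench serve \
  --model Qwen/Qwen3-32B \
  --dataset-name sharegpt \
  --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 200
```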
---------
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
@@ -15,7 +15,7 @@ This document provides step-by-step guidance on how to deploy and benchmark the
 | Comprehensive Examination | agieval |
 | Multi-turn Dialogue | sharegpt |

-The benchmarking tool used in this tutorial is AISbench, which supports performance testing for all the datasets listed above. The final section of this tutorial presents a performance comparison between enabling and disabling Suffix Decoding under the condition of satisfying an SLO TPOT < 50ms across different datasets and concurrency levels. Validations demonstrate that the Qwen3-32B model achieves a throughput improvement of approximately 20% to 80% on various real-world datasets when Suffix Decoding is enabled.
+The benchmarking tool used in this tutorial is AISBench, which supports performance testing for all the datasets listed above. The final section of this tutorial presents a performance comparison between enabling and disabling Suffix Decoding under the condition of satisfying an SLO TPOT < 50ms across different datasets and concurrency levels. Validations demonstrate that the Qwen3-32B model achieves a throughput improvement of approximately 20% to 80% on various real-world datasets when Suffix Decoding is enabled.

 ## **Download vllm-ascend Image**
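To make the SLO in the hunk above concrete: TPOT (time per output token) is commonly computed as (end-to-end latency - time to first token) / (output tokens - 1). A minimal shell sketch of that check; every input value below is invented for illustration:

```bash
# Hypothetical TPOT check against the tutorial's 50 ms SLO. The latency,
# TTFT, and token count are made-up example values, not measurements.
LATENCY_MS=4200   # end-to-end request latency
TTFT_MS=350       # time to first token
OUTPUT_TOKENS=96  # total output tokens in the response
awk -v l="$LATENCY_MS" -v t="$TTFT_MS" -v n="$OUTPUT_TOKENS" \
  'BEGIN { printf "TPOT = %.1f ms (SLO: < 50 ms)\n", (l - t) / (n - 1) }'
```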
@@ -71,14 +71,14 @@ pip install arctic-inference

 ## **vLLM Instance Deployment**

-Use the following command to start the container service instance. Speculative inference is enabled via the `--speculative-config` parameter, where `method` is set to`suffix`. For this test, `num_speculative_tokens` is uniformly set to`3`.
+Use the following command to start the container service instance. Speculative inference is enabled via the `--speculative-config` parameter, where `method` is set to `suffix`. For this test, `num_speculative_tokens` is uniformly set to `3`.

 ```bash
-# set the NPU device number
+# set the NPU device number:
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
 # Set the operator dispatch pipeline level to 1 and disable manual memory control in ACLGraph
 export TASK_QUEUE_ENABLE=1
-# Enable the AIVector core to directly schedule ROCE communication
+# Enable the AIVector core to directly schedule ROCE communication.
 export HCCL_OP_EXPANSION_MODE="AIV"
 # Enable MLP prefetch for better performance.
 export VLLM_ASCEND_ENABLE_PREFETCH_MLP=1
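For readers following the patched tutorial, the exports above feed into a server launch along the lines of the sketch below. This is a minimal illustration, not the tutorial's full command: the model path, tensor-parallel size, and port are assumptions, while `--speculative-config` with `method: suffix` and `num_speculative_tokens: 3` comes from the diff itself.

```bash
# Minimal launch sketch combining the environment variables above with the
# speculative-decoding settings described in the diff. Model name, TP size,
# and port are illustrative assumptions.
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_ASCEND_ENABLE_PREFETCH_MLP=1

vllm serve Qwen/Qwen3-32B \
  --tensor-parallel-size 4 \
  --port 8000 \
  --speculative-config '{"method": "suffix", "num_speculative_tokens": 3}'
```

Once the server reports ready, the OpenAI-compatible endpoint on the chosen port can be queried as usual; suffix decoding is transparent to clients.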