[main][Docs] Fix spelling errors across documentation (#6649)

Fix various spelling mistakes in the project documentation to improve
clarity and correctness.
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Cao Yi, 2026-02-10 11:14:57 +08:00, committed by GitHub
parent 5b8e47cb68
commit 1c7d1163f5
30 changed files with 67 additions and 67 deletions

View File

@@ -3,7 +3,7 @@
## Introduction
DeepSeek-R1 is a high-performance Mixture-of-Experts (MoE) large language model developed by DeepSeek Company. It excels in complex logical reasoning, mathematical problem-solving, and code generation. By dynamically activating its expert networks, it delivers exceptional performance while maintaining computational efficiency. Building upon R1, DeepSeek-R1-W8A8 is a fully quantized version of the model. It employs 8-bit integer (INT8) quantization for both weights and activations, which significantly reduces the model's memory footprint and computational requirements, enabling more efficient deployment and application in resource-constrained environments.
-This article takes the deepseek- R1-W8A8 version as an example to introduce the deployment of the R1 series models.
+This article takes the `DeepSeek-R1-W8A8` version as an example to introduce the deployment of the R1 series models.
## Supported Features
@@ -25,7 +25,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
### Installation
-You can using our official docker image to run `DeepSeek-R1-W8A8` directly.
+You can use our official docker image to run `DeepSeek-R1-W8A8` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -293,7 +293,7 @@ Run performance evaluation of `DeepSeek-R1-W8A8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.
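For reference, a minimal invocation of one of these subcommands against an already-running endpoint might look like the sketch below. The flag names follow vLLM's serving-benchmark interface and may differ by version; the model path, host, and port are placeholders rather than values from this deployment.

```bash
# Illustrative sketch only: benchmark an OpenAI-compatible server that is already serving.
# Adjust the model path, host, and port to match your own deployment.
vllm bench serve \
  --model /path/to/DeepSeek-R1-W8A8 \
  --host 127.0.0.1 \
  --port 8000 \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 200
```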

View File

@@ -38,7 +38,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
### Installation
-You can using our official docker image to run `DeepSeek-V3.1` directly.
+You can use our official docker image to run `DeepSeek-V3.1` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -96,7 +96,7 @@ nic_name="xxxx"
local_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
# AIV
@@ -158,7 +158,7 @@ local_ip="xxxx"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -210,7 +210,7 @@ local_ip="xxx"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -275,7 +275,7 @@ local_ip="141.xx.xx.1"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -352,7 +352,7 @@ local_ip="141.xx.xx.2"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -429,7 +429,7 @@ local_ip="141.xx.xx.3"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -505,7 +505,7 @@ local_ip="141.xx.xx.4"
node0_ip="xxxx"
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
export HCCL_IF_IP=$local_ip
@@ -707,7 +707,7 @@ Run performance evaluation of `DeepSeek-V3.1-w8a8-mtp-QuaRot` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.
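The jemalloc hint repeated in the hunks above can be made self-guarding so the preload is only applied when the library actually exists. A minimal sketch, assuming the Ubuntu aarch64 path used in these scripts (openEuler installs the library under a different prefix):

```bash
# Enable jemalloc only when the shared library is present at the expected path.
# The path matches the Ubuntu aarch64 location shown above; change it for other distributions.
JEMALLOC_SO=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2
if [ -f "$JEMALLOC_SO" ]; then
    export LD_PRELOAD="$JEMALLOC_SO:$LD_PRELOAD"
fi
```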

View File

@@ -29,7 +29,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
### Installation
-You can using our official docker image to run `DeepSeek-V3.2` directly..
+You can use our official docker image to run `DeepSeek-V3.2` directly.
:::::{tab-set}
:sync-group: install
@@ -456,11 +456,11 @@ Before you start, please
dp_rpc_port = args.dp_rpc_port
vllm_start_port = args.vllm_start_port
-def run_command(visiable_devices, dp_rank, vllm_engine_port):
+def run_command(visible_devices, dp_rank, vllm_engine_port):
command = [
"bash",
"./run_dp_template.sh",
-visiable_devices,
+visible_devices,
str(vllm_engine_port),
str(dp_size),
str(dp_rank),
@@ -481,9 +481,9 @@ Before you start, please
for i in range(dp_size_local):
dp_rank = dp_rank_start + i
vllm_engine_port = vllm_start_port + i
-visiable_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
+visible_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
process = multiprocessing.Process(target=run_command,
-args=(visiable_devices, dp_rank,
+args=(visible_devices, dp_rank,
vllm_engine_port))
processes.append(process)
process.start()
@@ -895,7 +895,7 @@ Run performance evaluation of `DeepSeek-V3.2-W8A8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.
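The `visible_devices` string built in the launcher above assigns each local data-parallel rank a contiguous block of `tp_size` devices. The same mapping can be checked from the shell; the sketch below uses hypothetical values for `dp_size_local` and `tp_size` purely for illustration.

```bash
# Reproduce the rank-to-device mapping from the launcher above.
# dp_size_local and tp_size are example values, not taken from a real deployment.
dp_size_local=4
tp_size=4
for i in $(seq 0 $((dp_size_local - 1))); do
    visible_devices=$(seq -s, $((i * tp_size)) $(( (i + 1) * tp_size - 1 )))
    echo "local rank $i -> devices $visible_devices"   # rank 0 -> 0,1,2,3, rank 1 -> 4,5,6,7, ...
done
```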

View File

@@ -29,7 +29,7 @@ It is recommended to download the model weight to the shared directory of multip
### Installation
-You can using our official docker image to run `GLM-4.x` directly.
+You can use our official docker image to run `GLM-4.x` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -146,7 +146,7 @@ Run performance evaluation of `GLM-4.x` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -22,7 +22,7 @@ It is recommended to download the model weights to a local directory (e.g., `./P
### Installation
-You can using our official docker image to run `PaddleOCR-VL` directly.
+You can use our official docker image to run `PaddleOCR-VL` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).

View File

@@ -548,7 +548,7 @@ lm_eval \
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -24,7 +24,7 @@ It is recommended to download the model weights to a local directory (e.g., `./Q
### Installation
-You can using our official docker image and install extra operator for supporting `Qwen2.5-7B-Instruct`.
+You can use our official docker image and install extra operator for supporting `Qwen2.5-7B-Instruct`.
:::::{tab-set}
:sync-group: install
@@ -158,7 +158,7 @@ Run performance evaluation of `Qwen2.5-7B-Instruct` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -19,11 +19,11 @@ Refer to [feature guide](../user_guide/feature_guide/index.md) to get the featur
- `Qwen2.5-Omni-3B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-3B)
- `Qwen2.5-Omni-7B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)
-Following examples use the 7B version deafultly.
+Following examples use the 7B version by default.
### Installation
-You can using our official docker image to run `Qwen2.5-Omni` directly.
+You can use our official docker image to run `Qwen2.5-Omni` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -97,7 +97,7 @@ VLLM_TARGET_DEVICE=empty pip install -v ".[audio]"
`--allowed-local-media-path` is optional, only set it if you need infer model with local media file
-`--gpu-memory-utilization` should not be set manually only if yous know what this parameter aims to.
+`--gpu-memory-utilization` should not be set manually only if you know what this parameter aims to.
#### Multiple NPU (Qwen2.5-Omni-7B)
@@ -195,7 +195,7 @@ Run performance evaluation of `Qwen2.5-Omni-7B` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.
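A single-card serving command using the two flags discussed in the hunk above might look like the following sketch; the media directory is a placeholder, and `--gpu-memory-utilization` is deliberately left at its default, as the note recommends.

```bash
# Illustrative sketch: serve Qwen2.5-Omni-7B on a single card.
# --allowed-local-media-path is only needed when prompts reference local media files.
# --gpu-memory-utilization is intentionally not set; leave it at the default unless you
# understand what the parameter controls.
vllm serve Qwen/Qwen2.5-Omni-7B \
  --allowed-local-media-path /path/to/local/media
```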

View File

@@ -300,7 +300,7 @@ Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -24,7 +24,7 @@ It is recommended to download the model weight to the shared directory of multip
`Qwen3-Coder` is first supported in `vllm-ascend:v0.10.0rc1`, please run this model using a later version.
-You can using our official docker image to run `Qwen3-Coder-30B-A3B-Instruct` directly.
+You can use our official docker image to run `Qwen3-Coder-30B-A3B-Instruct` directly.
```{code-block} bash
:substitutions:

View File

@@ -42,7 +42,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
### Installation
-You can using our official docker image for supporting Qwen3 Dense models.
+You can use our official docker image for supporting Qwen3 Dense models.
Currently, we provide the all-in-one images.[Download images](https://quay.io/repository/ascend/vllm-ascend?tab=tags)
#### Docker Pull (by tag)
@@ -161,7 +161,7 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export TASK_QUEUE_ENABLE=1
# [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
# if os is Ubuntu
# export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
# if os is openEuler
@@ -291,7 +291,7 @@ Run performance evaluation of `Qwen3-32B-W8A8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -154,7 +154,7 @@ Run performance evaluation of `Qwen3-Next` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -24,7 +24,7 @@ It is recommended to download the model weight to the shared directory of multip
:::::{tab-set}
::::{tab-item} Use docker image
-You can using our official docker image to run Qwen3-Omni-30B-A3B-Thinking directly
+You can use our official docker image to run Qwen3-Omni-30B-A3B-Thinking directly
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -269,7 +269,7 @@ Run performance evaluation of `Qwen3-Omni-30B-A3B-Thinking` as an example.
-Refer to vllm benchmark for more details.
+Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -260,7 +260,7 @@ Run performance evaluation of `Qwen3-VL-235B-A22B-Instruct` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -24,7 +24,7 @@ Refer to [verify multi-node communication environment](../installation.md#verify
### Installation
-You can using our official docker image to run `DeepSeek-V3.1` directly.
+You can use our official docker image to run `DeepSeek-V3.1` directly.
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -351,7 +351,7 @@ Run performance evaluation of `DeepSeek-V3.1-w8a8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.

View File

@@ -159,7 +159,7 @@ Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example.
Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:
- `latency`: Benchmark the latency of a single batch of requests.
- `serve`: Benchmark the online serving throughput.