[main][Docs] Fix spelling errors across documentation (#6649)
Fix various spelling mistakes in the project documentation to improve
clarity and correctness.
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
---------
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
@@ -3,7 +3,7 @@
 ## Introduction

 DeepSeek-R1 is a high-performance Mixture-of-Experts (MoE) large language model developed by DeepSeek Company. It excels in complex logical reasoning, mathematical problem-solving, and code generation. By dynamically activating its expert networks, it delivers exceptional performance while maintaining computational efficiency. Building upon R1, DeepSeek-R1-W8A8 is a fully quantized version of the model. It employs 8-bit integer (INT8) quantization for both weights and activations, which significantly reduces the model's memory footprint and computational requirements, enabling more efficient deployment and application in resource-constrained environments.

-This article takes the deepseek- R1-W8A8 version as an example to introduce the deployment of the R1 series models.
+This article takes the `DeepSeek-R1-W8A8` version as an example to introduce the deployment of the R1 series models.

 ## Supported Features
@@ -25,7 +25,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
 ### Installation

-You can using our official docker image to run `DeepSeek-R1-W8A8` directly.
+You can use our official docker image to run `DeepSeek-R1-W8A8` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -293,7 +293,7 @@ Run performance evaluation of `DeepSeek-R1-W8A8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
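For context on this recurring fix: the hunks only show the first two bullets; the third subcommand the docs list is `throughput`. A minimal sketch of how these subcommands are typically invoked (the model name and flag values here are illustrative, not taken from the patched docs):

```bash
# Offline latency of a single batch (illustrative values)
vllm bench latency --model Qwen/Qwen2.5-7B-Instruct \
    --input-len 128 --output-len 128 --batch-size 8

# Online serving throughput, measured against an already-running `vllm serve` endpoint
vllm bench serve --model Qwen/Qwen2.5-7B-Instruct \
    --dataset-name random --num-prompts 200
```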
@@ -38,7 +38,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
 ### Installation

-You can using our official docker image to run `DeepSeek-V3.1` directly.
+You can use our official docker image to run `DeepSeek-V3.1` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -96,7 +96,7 @@ nic_name="xxxx"
 local_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 # AIV
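A small sketch of how one might verify that `libjemalloc.so` is actually present before uncommenting the preload line above (the path is the Ubuntu aarch64 location shown in the docs; adjust for your distro):

```bash
# Check whether jemalloc is registered with the dynamic linker
if ldconfig -p | grep -q libjemalloc; then
    # Preload it so allocations go through jemalloc (path as in the docs above)
    export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
fi
```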
@@ -158,7 +158,7 @@ local_ip="xxxx"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
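The `nic_name`/`local_ip` placeholders in these scripts must match a real interface on each node. One way to discover them on a typical Linux node, as a hedged sketch only; verify the result is the NIC your HCCL traffic should actually use:

```bash
# Interface that carries the default route, and its primary IPv4 address
nic_name=$(ip -o -4 route show to default | awk '{print $5}')
local_ip=$(ip -o -4 addr show "$nic_name" | awk '{print $4}' | cut -d/ -f1)
export HCCL_IF_IP=$local_ip
```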
@@ -210,7 +210,7 @@ local_ip="xxx"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
@@ -275,7 +275,7 @@ local_ip="141.xx.xx.1"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
@@ -352,7 +352,7 @@ local_ip="141.xx.xx.2"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
@@ -429,7 +429,7 @@ local_ip="141.xx.xx.3"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
@@ -505,7 +505,7 @@ local_ip="141.xx.xx.4"
 node0_ip="xxxx"

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD

 export HCCL_IF_IP=$local_ip
@@ -707,7 +707,7 @@ Run performance evaluation of `DeepSeek-V3.1-w8a8-mtp-QuaRot` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -29,7 +29,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
 ### Installation

-You can using our official docker image to run `DeepSeek-V3.2` directly..
+You can use our official docker image to run `DeepSeek-V3.2` directly.

 :::::{tab-set}
 :sync-group: install
@@ -456,11 +456,11 @@ Before you start, please
 dp_rpc_port = args.dp_rpc_port
 vllm_start_port = args.vllm_start_port

-def run_command(visiable_devices, dp_rank, vllm_engine_port):
+def run_command(visible_devices, dp_rank, vllm_engine_port):
     command = [
         "bash",
         "./run_dp_template.sh",
-        visiable_devices,
+        visible_devices,
         str(vllm_engine_port),
         str(dp_size),
         str(dp_rank),
@@ -481,9 +481,9 @@ Before you start, please
 for i in range(dp_size_local):
     dp_rank = dp_rank_start + i
     vllm_engine_port = vllm_start_port + i
-    visiable_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
+    visible_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
     process = multiprocessing.Process(target=run_command,
-                                      args=(visiable_devices, dp_rank,
+                                      args=(visible_devices, dp_rank,
                                             vllm_engine_port))
     processes.append(process)
     process.start()
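For readers of these two hunks: the surrounding script (only partially visible in the diff) fans out one `run_dp_template.sh` invocation per local data-parallel rank, pinning each rank to a contiguous slice of devices. A self-contained shell sketch of the same pattern, with hypothetical values where the diff does not show them:

```bash
# Hypothetical values; the real script derives these from CLI arguments
dp_size=4; dp_size_local=2; dp_rank_start=0
tp_size=4; vllm_start_port=9000

for ((i = 0; i < dp_size_local; i++)); do
    dp_rank=$((dp_rank_start + i))
    vllm_engine_port=$((vllm_start_port + i))
    # Contiguous device slice per rank, e.g. "0,1,2,3" then "4,5,6,7"
    visible_devices=$(seq -s, $((i * tp_size)) $(( (i + 1) * tp_size - 1 )))
    # Argument order mirrors the Python `command` list in the hunk above
    bash ./run_dp_template.sh "$visible_devices" "$vllm_engine_port" "$dp_size" "$dp_rank" &
done
wait
```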
@@ -895,7 +895,7 @@ Run performance evaluation of `DeepSeek-V3.2-W8A8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -29,7 +29,7 @@ It is recommended to download the model weight to the shared directory of multip
 ### Installation

-You can using our official docker image to run `GLM-4.x` directly.
+You can use our official docker image to run `GLM-4.x` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -146,7 +146,7 @@ Run performance evaluation of `GLM-4.x` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -22,7 +22,7 @@ It is recommended to download the model weights to a local directory (e.g., `./P
 ### Installation

-You can using our official docker image to run `PaddleOCR-VL` directly.
+You can use our official docker image to run `PaddleOCR-VL` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -548,7 +548,7 @@ lm_eval \
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -24,7 +24,7 @@ It is recommended to download the model weights to a local directory (e.g., `./Q
 ### Installation

-You can using our official docker image and install extra operator for supporting `Qwen2.5-7B-Instruct`.
+You can use our official docker image and install extra operator for supporting `Qwen2.5-7B-Instruct`.

 :::::{tab-set}
 :sync-group: install
@@ -158,7 +158,7 @@ Run performance evaluation of `Qwen2.5-7B-Instruct` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -19,11 +19,11 @@ Refer to [feature guide](../user_guide/feature_guide/index.md) to get the featur
 - `Qwen2.5-Omni-3B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-3B)
 - `Qwen2.5-Omni-7B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

-Following examples use the 7B version deafultly.
+Following examples use the 7B version by default.

 ### Installation

-You can using our official docker image to run `Qwen2.5-Omni` directly.
+You can use our official docker image to run `Qwen2.5-Omni` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -97,7 +97,7 @@ VLLM_TARGET_DEVICE=empty pip install -v ".[audio]"
 `--allowed-local-media-path` is optional, only set it if you need infer model with local media file

-`--gpu-memory-utilization` should not be set manually only if yous know what this parameter aims to.
+`--gpu-memory-utilization` should not be set manually only if you know what this parameter aims to.

 #### Multiple NPU (Qwen2.5-Omni-7B)
@@ -195,7 +195,7 @@ Run performance evaluation of `Qwen2.5-Omni-7B` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -300,7 +300,7 @@ Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -24,7 +24,7 @@ It is recommended to download the model weight to the shared directory of multip
 `Qwen3-Coder` is first supported in `vllm-ascend:v0.10.0rc1`, please run this model using a later version.

-You can using our official docker image to run `Qwen3-Coder-30B-A3B-Instruct` directly.
+You can use our official docker image to run `Qwen3-Coder-30B-A3B-Instruct` directly.

 ```{code-block} bash
 :substitutions:
@@ -42,7 +42,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
 ### Installation

-You can using our official docker image for supporting Qwen3 Dense models.
+You can use our official docker image for supporting Qwen3 Dense models.
 Currently, we provide the all-in-one images.[Download images](https://quay.io/repository/ascend/vllm-ascend?tab=tags)

 #### Docker Pull (by tag)
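The hunk cuts off before the pull command under that heading; an illustrative form of the step it introduces (the tag placeholder must be replaced with a real tag from the linked quay.io list):

```bash
# Pull a specific all-in-one image by tag (tag is a placeholder, not a real release)
docker pull quay.io/ascend/vllm-ascend:<tag>
```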
@@ -161,7 +161,7 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
 export TASK_QUEUE_ENABLE=1

 # [Optional] jemalloc
-# jemalloc is for better performance, if `libjemalloc.so` is install on your machine, you can turn it on.
+# jemalloc is for better performance, if `libjemalloc.so` is installed on your machine, you can turn it on.
 # if os is Ubuntu
 # export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
 # if os is openEuler
@@ -291,7 +291,7 @@ Run performance evaluation of `Qwen3-32B-W8A8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -154,7 +154,7 @@ Run performance evaluation of `Qwen3-Next` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -24,7 +24,7 @@ It is recommended to download the model weight to the shared directory of multip
 :::::{tab-set}
 ::::{tab-item} Use docker image

-You can using our official docker image to run Qwen3-Omni-30B-A3B-Thinking directly
+You can use our official docker image to run Qwen3-Omni-30B-A3B-Thinking directly

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -269,7 +269,7 @@ Run performance evaluation of `Qwen3-Omni-30B-A3B-Thinking` as an example.
-Refer to vllm benchmark for more details.
+Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -260,7 +260,7 @@ Run performance evaluation of `Qwen3-VL-235B-A22B-Instruct` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -24,7 +24,7 @@ Refer to [verify multi-node communication environment](../installation.md#verify
 ### Installation

-You can using our official docker image to run `DeepSeek-V3.1` directly.
+You can use our official docker image to run `DeepSeek-V3.1` directly.

 Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
@@ -351,7 +351,7 @@ Run performance evaluation of `DeepSeek-V3.1-w8a8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.
@@ -159,7 +159,7 @@ Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example.
 Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.

-There are three `vllm bench` subcommand:
+There are three `vllm bench` subcommands:

 - `latency`: Benchmark the latency of a single batch of requests.
 - `serve`: Benchmark the online serving throughput.