[Doc] fix the nit in docs (#6826)

Refresh the docs and fix nits in the docs

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan
2026-02-27 11:50:27 +08:00
committed by GitHub
parent 981d803cb7
commit a95c0b8b82
30 changed files with 145 additions and 118 deletions


@@ -43,7 +43,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
 #### Initial Setup (Record Expert Map)
-We need to add environment variable `export EXPERT_MAP_RECORD="true"` to record expert map.Generate the initial expert distribution map using expert_map_record_path. This creates a baseline configuration for future deployments.
+We need to add the environment variable `export EXPERT_MAP_RECORD="true"` to record the expert map. Generate the initial expert distribution map using `expert_map_record_path`. This creates a baseline configuration for future deployments.
 ```shell
 vllm serve Qwen/Qwen3-235B-A22 \


@@ -22,7 +22,7 @@ This tutorial will introduce the usage of them.
 pip install fastapi httpx uvicorn
 ```
-## Starting Exeternal DP Servers
+## Starting External DP Servers
 First, you need to have at least two vLLM servers running in data parallel. These can be mock servers or actual vLLM servers. Note that the proxy also works with a single vLLM server, but it then falls back to direct request forwarding, which defeats its purpose.
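As a sketch of the prerequisite above, two data-parallel vLLM servers can be launched on separate ports. The model name and port numbers here are illustrative assumptions, not taken from this tutorial:

```shell
# Start two vLLM servers on different ports to act as the DP endpoints
# behind the proxy (model and ports are illustrative).
vllm serve Qwen/Qwen3-235B-A22 --port 8100 &
vllm serve Qwen/Qwen3-235B-A22 --port 8200 &
```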


@@ -267,10 +267,10 @@ Currently, the key-value pool in PD Disaggregate only stores the kv cache genera
 "kv_connector": "AscendStoreConnector",
 "kv_role": "kv_consumer",
 "kv_connector_extra_config": {
-"lookup_rpc_port":"0",
-"backend": "mooncake"
+"lookup_rpc_port": "0",
+"backend": "mooncake",
 "consumer_is_to_put": true,
-"prefill_pp_size": 2
+"prefill_pp_size": 2,
+"prefill_pp_layer_partition": "30,31"
 }
 }
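For reference, the corrected consumer-side config above would be passed as a single `--kv-transfer-config` JSON string at launch. This is a sketch; the serve target is an assumption, and only the JSON keys and values come from this page:

```shell
# Consumer-side launch with the corrected kv-transfer config
# (the model name is illustrative).
vllm serve Qwen/Qwen3-235B-A22 \
  --kv-transfer-config '{
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_consumer",
    "kv_connector_extra_config": {
      "lookup_rpc_port": "0",
      "backend": "mooncake",
      "consumer_is_to_put": true,
      "prefill_pp_size": 2,
      "prefill_pp_layer_partition": "30,31"
    }
  }'
```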


@@ -164,7 +164,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
 "kv_parallel_size": "1",
 "kv_port": "20001",
 "engine_id": "0"
-}'
+}' \
+--additional-config '{"enable_weight_nz_layout":true,"enable_prefill_optimizations":true}'
 ```
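Once a server launched with the command above is up, it can be exercised through vLLM's standard OpenAI-compatible API. The port is an assumption (vLLM's default of 8000); the endpoint itself is standard vLLM:

```shell
# Send a test completion request to the running server
# (port 8000 is the assumed default).
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-ascend/DeepSeek-R1-W8A8", "prompt": "Hello", "max_tokens": 16}'
```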


@@ -8,10 +8,10 @@ You can refer to [Supported Models](https://docs.vllm.ai/en/latest/models/suppor
 You can run LoRA with ACLGraph mode now. Please refer to [Graph Mode Guide](./graph_mode.md) for better LoRA performance.
-Address for downloading models:\
-base model: <https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files> \
-lora model:
-<https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files>
+Address for downloading models:
+- base model: <https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files>
+- lora model: <https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files>
## Example