[Doc][v0.18.0] Fix documentation formatting and improve code examples (#8701)

### What this PR does / why we need it?
This PR fixes documentation issues throughout the project, such as
variable-name typos (e.g. `SLO_LITMIT` → `SLO_LIMIT`) and stray inline-code
backticks around `<placeholder>` values in shell examples.

Signed-off-by: MrZ20 <2609716663@qq.com>
Author: SILONG ZENG
Date: 2026-04-28 09:01:25 +08:00
Committed by: GitHub
Parent: 9a0b786f2b
Commit: 2e2aaa2fae

38 changed files with 205 additions and 188 deletions


@@ -38,9 +38,9 @@ So far, dynamic batch performs better on several dense models including Qwen and
 Dynamic batch is used in the online inference. A fully executable example is as follows:
 ```shell
-SLO_LITMIT=50
+SLO_LIMIT=50
 vllm serve Qwen/Qwen2.5-14B-Instruct\
---additional_config '{"SLO_limits_for_dynamic_batch":'${SLO_LITMIT}'}' \
+--additional_config '{"SLO_limits_for_dynamic_batch":'${SLO_LIMIT}'}' \
 --max-num-seqs 256 \
 --block-size 128 \
 --tensor_parallel_size 8 \
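For reference, the corrected snippet reads end-to-end as sketched below. This is a minimal sketch: only the flags visible in this hunk are shown, the command is terminated after `--tensor_parallel_size 8` (the original continues with further options), and a space is added before the first line continuation so the model name and the next flag do not join into a single token.

```shell
# Minimal sketch of the corrected command; flags beyond this hunk are omitted.
SLO_LIMIT=50
vllm serve Qwen/Qwen2.5-14B-Instruct \
    --additional_config '{"SLO_limits_for_dynamic_batch":'${SLO_LIMIT}'}' \
    --max-num-seqs 256 \
    --block-size 128 \
    --tensor_parallel_size 8
```

Note the quoting in `--additional_config`: the single-quoted fragments and the unquoted `${SLO_LIMIT}` must concatenate into valid JSON, here `{"SLO_limits_for_dynamic_batch":50}`.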


@@ -54,25 +54,25 @@ To enable Netloader, pass `--load-format=netloader` and provide configuration vi
 ### Server
 ```shell
-VLLM_SLEEP_WHEN_IDLE=1 vllm serve `<model_file>` \
+VLLM_SLEEP_WHEN_IDLE=1 vllm serve <model_file> \
 --tensor-parallel-size 1 \
---served-model-name `<model_name>` \
+--served-model-name <model_name> \
 --enforce-eager \
---port `<port>` \
+--port <port> \
 --load-format netloader
 ```
 ### Client
 ```shell
-export NETLOADER_CONFIG='{"SOURCE":[{"device_id":0, "sources": ["`<server_IP>`:`<server_Port>`"]}]}'
+export NETLOADER_CONFIG='{"SOURCE":[{"device_id":0, "sources": ["<server_IP>:<server_Port>"]}]}'
-VLLM_SLEEP_WHEN_IDLE=1 ASCEND_RT_VISIBLE_DEVICES=`<device_id_diff_from_server>` \
-vllm serve `<model_file>` \
+VLLM_SLEEP_WHEN_IDLE=1 ASCEND_RT_VISIBLE_DEVICES=<device_id_diff_from_server> \
+vllm serve <model_file> \
 --tensor-parallel-size 1 \
---served-model-name `<model_name>` \
+--served-model-name <model_name> \
 --enforce-eager \
---port `<client_port>` \
+--port <client_port> \
 --load-format netloader \
 --model-loader-extra-config="${NETLOADER_CONFIG}"
 ```
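To make the placeholder substitution concrete, a hypothetical single-host pairing might look like the sketch below. Every concrete value (model path, served model name, ports, device IDs, loopback IP) is an illustrative assumption, including the assumption that `<server_Port>` refers to the port the server instance is serving on.

```shell
# All concrete values in this sketch are illustrative assumptions.
# Server instance on device 0, serving on port 8100:
VLLM_SLEEP_WHEN_IDLE=1 vllm serve /models/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 1 \
    --served-model-name qwen2.5-7b \
    --enforce-eager \
    --port 8100 \
    --load-format netloader

# Client instance on device 1, pulling weights from the server on device 0:
export NETLOADER_CONFIG='{"SOURCE":[{"device_id":0, "sources": ["127.0.0.1:8100"]}]}'
VLLM_SLEEP_WHEN_IDLE=1 ASCEND_RT_VISIBLE_DEVICES=1 \
vllm serve /models/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 1 \
    --served-model-name qwen2.5-7b \
    --enforce-eager \
    --port 8101 \
    --load-format netloader \
    --model-loader-extra-config="${NETLOADER_CONFIG}"
```

The only structural requirements visible from the docs themselves are that both instances load the same model and that the client's `ASCEND_RT_VISIBLE_DEVICES` differs from the server's device.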


@@ -80,7 +80,7 @@ A simple planner implementation is provided at [`rfork_planner.py`](../../../../
 ```shell
 python rfork_planner.py \
 --host 0.0.0.0 \
---port `<planner_port>`
+--port <planner_port>
 ```
 ### 3. Start vLLM Instances
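As a concrete illustration, the planner might be started as below. The port is an arbitrary assumption; whatever value is chosen here is what `<planner_port>` must be set to in the `RFORK_CONFIG` of step 3.

```shell
# Illustrative invocation; port 9000 is an arbitrary choice that must match
# <planner_port> in the RFORK_CONFIG used by the vLLM instances.
python rfork_planner.py \
    --host 0.0.0.0 \
    --port 9000
```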
@@ -93,15 +93,15 @@ For later instances, if the planner can allocate a compatible seed, RFork will t
 ```shell
 export RFORK_CONFIG='{
-"model_url": "`<model_url>`",
-"model_deploy_strategy_name": "`<deploy_strategy>`",
-"rfork_scheduler_url": "http://`<planner_ip>`:`<planner_port>`"
+"model_url": "<model_url>",
+"model_deploy_strategy_name": "<deploy_strategy>",
+"rfork_scheduler_url": "http://<planner_ip>:<planner_port>"
 }'
-vllm serve `<model_path>` \
+vllm serve <model_path> \
 --tensor-parallel-size 1 \
---served-model-name `<served_model_name>` \
---port `<port>` \
+--served-model-name <served_model_name> \
+--port <port> \
 --load-format rfork \
 --model-loader-extra-config "${RFORK_CONFIG}"
 ```
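Filling in the placeholders, a hypothetical instance launch against the planner from the previous step could look like the following. The model path, strategy name, served name, IP, and ports are all illustrative assumptions; in particular, `"default"` is a made-up strategy name, and `model_url` is assumed here to be the same local path passed to `vllm serve`.

```shell
# Hypothetical values throughout; adjust to your deployment.
export RFORK_CONFIG='{
  "model_url": "/models/Qwen2.5-7B-Instruct",
  "model_deploy_strategy_name": "default",
  "rfork_scheduler_url": "http://127.0.0.1:9000"
}'
vllm serve /models/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 1 \
    --served-model-name qwen2.5-7b \
    --port 8000 \
    --load-format rfork \
    --model-loader-extra-config "${RFORK_CONFIG}"
```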