[Doc] fix the nit in docs (#6826)
Refresh the docs and fix a nit in them
- vLLM version: v0.15.0
- vLLM main:
83b47f67b1
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -43,7 +43,7 @@ vllm serve Qwen/Qwen3-235B-A22 \

 #### Initial Setup (Record Expert Map)

-We need to add environment variable `export EXPERT_MAP_RECORD="true"` to record expert map.Generate the initial expert distribution map using expert_map_record_path. This creates a baseline configuration for future deployments.
+We need to add environment variable `export EXPERT_MAP_RECORD="true"` to record expert map. Generate the initial expert distribution map using expert_map_record_path. This creates a baseline configuration for future deployments.

 ```shell
 vllm serve Qwen/Qwen3-235B-A22 \
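Once the server has written the expert map to `expert_map_record_path`, it can be worth sanity-checking the file before treating it as the baseline for later deployments. A minimal stdlib sketch; the helper name and the assumption that the recorded map is a non-empty JSON object are illustrative, not vLLM API:

```python
import json
import tempfile

# Hypothetical sanity check for a recorded expert map file before reusing it
# as a baseline. The JSON layout used here is an assumption for illustration;
# consult the file vLLM actually writes.
def is_valid_expert_map(path: str) -> bool:
    """Return True if the file at `path` parses as a non-empty JSON object."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and len(data) > 0

# Usage sketch with a stand-in file in place of the real recorded map:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"layer_0": [0, 1, 2, 3]}, f)
    recorded = f.name

print(is_valid_expert_map(recorded))  # True for the stand-in map above
```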
@@ -22,7 +22,7 @@ This tutorial will introduce the usage of them.

 pip install fastapi httpx uvicorn
 ```

-## Starting Exeternal DP Servers
+## Starting External DP Servers

 First, you need to have at least two vLLM servers running in data parallel. These can be mock servers or actual vLLM servers. Note that this proxy also works with only one vLLM server running, but will fall back to direct request forwarding which is meaningless.
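The dispatch idea behind such a proxy can be sketched with the stdlib alone; the backend URLs and the round-robin policy below are illustrative assumptions, while the real proxy in the tutorial is built on the fastapi/httpx/uvicorn packages installed above:

```python
from itertools import cycle

# Minimal round-robin dispatcher over external DP servers. The class name,
# URLs, and policy are illustrative, not part of the tutorial's proxy code.
class RoundRobinDispatcher:
    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = cycle(list(backends))

    def next_backend(self) -> str:
        """Pick the backend that should receive the next request."""
        return next(self._cycle)

dispatcher = RoundRobinDispatcher(
    ["http://127.0.0.1:8000", "http://127.0.0.1:8001"]
)
print([dispatcher.next_backend() for _ in range(3)])
# → ['http://127.0.0.1:8000', 'http://127.0.0.1:8001', 'http://127.0.0.1:8000']
```

With a single backend the cycle degenerates to always picking the same server, which matches the note above that a one-server setup falls back to plain request forwarding.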
@@ -267,10 +267,10 @@ Currently, the key-value pool in PD Disaggregate only stores the kv cache genera

     "kv_connector": "AscendStoreConnector",
     "kv_role": "kv_consumer",
     "kv_connector_extra_config": {
-        "lookup_rpc_port":"0",
-        "backend": "mooncake"
+        "lookup_rpc_port": "0",
+        "backend": "mooncake",
+        "consumer_is_to_put": true,
-        "prefill_pp_size": 2
+        "prefill_pp_size": 2,
+        "prefill_pp_layer_partition": "30,31"
     }
 }
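Nested JSON like this consumer-side config is easy to get wrong when hand-edited inside shell quotes (a stray trailing comma, or Python's `True` leaking in where JSON needs `true`). A sketch of building it as a Python dict and serializing it for vLLM's `--kv-transfer-config` flag; the key names and values are taken from the snippet above, the assembly itself is illustrative:

```python
import json
import shlex

# Consumer-side kv-transfer config from the docs snippet, as a Python dict.
# json.dumps renders Python's True as JSON's lowercase true automatically.
kv_transfer_config = {
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_consumer",
    "kv_connector_extra_config": {
        "lookup_rpc_port": "0",
        "backend": "mooncake",
        "consumer_is_to_put": True,
        "prefill_pp_size": 2,
        "prefill_pp_layer_partition": "30,31",
    },
}

# shlex.quote wraps the JSON so it survives as one shell argument.
flag = "--kv-transfer-config " + shlex.quote(json.dumps(kv_transfer_config))
print(flag)
```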
@@ -164,7 +164,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \

     "kv_parallel_size": "1",
     "kv_port": "20001",
     "engine_id": "0"
-}'
+}' \
+--additional-config '{"enable_weight_nz_layout":true,"enable_prefill_optimizations":true}'
 ```
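The serve command passes its JSON config as a single-quoted shell argument. A quick way to convince yourself the quoting is sound is to split the command the way a POSIX shell would and parse the resulting token; the shortened command below is a sketch of the one in the snippet:

```python
import json
import shlex

# Split a shortened form of the serve command as a POSIX shell would, then
# check the single-quoted JSON blob arrives as one argv token that parses.
cmd = (
    "vllm serve vllm-ascend/DeepSeek-R1-W8A8 "
    "--additional-config "
    "'{\"enable_weight_nz_layout\":true,\"enable_prefill_optimizations\":true}'"
)
argv = shlex.split(cmd)
config = json.loads(argv[argv.index("--additional-config") + 1])
print(config)
# → {'enable_weight_nz_layout': True, 'enable_prefill_optimizations': True}
```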
@@ -8,10 +8,10 @@ You can refer to [Supported Models](https://docs.vllm.ai/en/latest/models/suppor

 You can run LoRA with ACLGraph mode now. Please refer to [Graph Mode Guide](./graph_mode.md) for a better LoRA performance.

-Address for downloading models:\
-base model: <https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files> \
-lora model:
-<https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files>
+Address for downloading models:
+
+- base model: <https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files>
+- lora model: <https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files>

 ## Example