[Doc][Misc] Correcting the document and uploading the model deployment template (#8287)
### What this PR does / why we need it?

Correcting the document and uploading the model deployment template

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
@@ -327,8 +327,6 @@ The parameters are explained as follows:
## Accuracy Evaluation

Here are two accuracy evaluation methods.

### Using AISBench

1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.
@@ -135,8 +135,6 @@ The parameters are explained as follows:
## Accuracy Evaluation

Here are two accuracy evaluation methods.

### Using AISBench

1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.
@@ -240,12 +240,12 @@ If you occasionally see `zmq.error.ZMQError: Address already in use` during star
### launch_online_dp.py

Use `launch_online_dp.py` to launch external dp vllm servers.
-[launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
+[launch_online_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)

### run_dp_template.sh

Modify `run_dp_template.sh` on each node.
-[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+[run_dp_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)

#### Layerwise
@@ -1,10 +1,10 @@
# Prefill-Decode Disaggregation (Qwen2.5-VL)

-## Getting Start
+## Getting Started

vLLM-Ascend now supports prefill-decode (PD) disaggregation. This guide takes one-by-one steps to verify these features with constrained resources.

-Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.
+Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.

## Verify Communication Environment
@@ -133,7 +133,7 @@ models = [
```bash
# Example command to test gsm8k dataset performance using the first 100 prompts. Commands for other datasets are similar.
-ais_bench --models vllm_api_stream_chat \
+ais_bench --models vllm-api-stream-chat \
--datasets gsm8k_gen_0_shot_cot_str_perf \
--debug --summarizer default_perf --mode perf --num-prompts 100
```