[Doc][Misc] Correcting the document and uploading the model deployment template (#8287)

### What this PR does / why we need it?
Corrects the documentation and uploads the model deployment template.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
herizhen
2026-04-15 16:03:11 +08:00
committed by GitHub
parent 147b589f62
commit 95726d20eb
31 changed files with 536 additions and 308 deletions

@@ -327,8 +327,6 @@ The parameters are explained as follows:
## Accuracy Evaluation
Here are two accuracy evaluation methods.
### Using AISBench
1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

@@ -135,8 +135,6 @@ The parameters are explained as follows:
## Accuracy Evaluation
Here are two accuracy evaluation methods.
### Using AISBench
1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

@@ -240,12 +240,12 @@ If you occasionally see `zmq.error.ZMQError: Address already in use` during star
### launch_online_dp.py
Use `launch_online_dp.py` to launch external dp vllm servers.
-[launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
+[launch_online_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
### run_dp_template.sh
Modify `run_dp_template.sh` on each node.
-[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+[run_dp_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
#### Layerwise

@@ -1,10 +1,10 @@
# Prefill-Decode Disaggregation (Qwen2.5-VL)
-## Getting Start
+## Getting Started
vLLM-Ascend now supports prefill-decode (PD) disaggregation. This guide takes one-by-one steps to verify these features with constrained resources.
-Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.
+Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.
## Verify Communication Environment

@@ -133,7 +133,7 @@ models = [
```bash
# Example command to test gsm8k dataset performance using the first 100 prompts. Commands for other datasets are similar.
-ais_bench --models vllm_api_stream_chat \
+ais_bench --models vllm-api-stream-chat \
--datasets gsm8k_gen_0_shot_cot_str_perf \
--debug --summarizer default_perf --mode perf --num-prompts 100
```