[Doc][Misc] Correcting the document and uploading the model deployment template (#8287)

### What this PR does / why we need it?

Correcting the document and uploading the model deployment template

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-15 16:03:11 +08:00
parent 147b589f62
commit 95726d20eb
31 changed files with 536 additions and 308 deletions
--- a/docs/source/tutorials/features/long_sequence_context_parallel_multi_node.md
+++ b/docs/source/tutorials/features/long_sequence_context_parallel_multi_node.md
@@ -327,8 +327,6 @@ The parameters are explained as follows:

 ## Accuracy Evaluation

-Here are two accuracy evaluation methods.
-
 ### Using AISBench

 1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.
--- a/docs/source/tutorials/features/long_sequence_context_parallel_single_node.md
+++ b/docs/source/tutorials/features/long_sequence_context_parallel_single_node.md
@@ -135,8 +135,6 @@ The parameters are explained as follows:

 ## Accuracy Evaluation

-Here are two accuracy evaluation methods.
-
 ### Using AISBench

 1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.
--- a/docs/source/tutorials/features/pd_disaggregation_mooncake_multi_node.md
+++ b/docs/source/tutorials/features/pd_disaggregation_mooncake_multi_node.md
@@ -240,12 +240,12 @@ If you occasionally see `zmq.error.ZMQError: Address already in use` during star
 ### launch_online_dp.py

 Use `launch_online_dp.py` to launch external dp vllm servers.
-[launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
+[launch_online_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)

 ### run_dp_template.sh

 Modify `run_dp_template.sh` on each node.
-[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+[run_dp_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)

 #### Layerwise

--- a/docs/source/tutorials/features/pd_disaggregation_mooncake_single_node.md
+++ b/docs/source/tutorials/features/pd_disaggregation_mooncake_single_node.md
@@ -1,10 +1,10 @@
 # Prefill-Decode Disaggregation (Qwen2.5-VL)

-## Getting Start
+## Getting Started

 vLLM-Ascend now supports prefill-decode (PD) disaggregation. This guide takes one-by-one steps to verify these features with constrained resources.

-Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.
+Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "1P1D" architecture. Assume the IP address is 192.0.0.1.

 ## Verify Communication Environment

--- a/docs/source/tutorials/features/suffix_speculative_decoding.md
+++ b/docs/source/tutorials/features/suffix_speculative_decoding.md
@@ -133,7 +133,7 @@ models = [

 ```bash
 # Example command to test gsm8k dataset performance using the first 100 prompts. Commands for other datasets are similar.
-ais_bench --models vllm_api_stream_chat \
+ais_bench --models vllm-api-stream-chat \
  --datasets gsm8k_gen_0_shot_cot_str_perf \
  --debug --summarizer default_perf --mode perf --num-prompts 100
 ```