diff --git a/docs/source/developer_guide/contribution/testing.md b/docs/source/developer_guide/contribution/testing.md index 8dbf3b30..c87fc717 100644 --- a/docs/source/developer_guide/contribution/testing.md +++ b/docs/source/developer_guide/contribution/testing.md @@ -23,7 +23,7 @@ cd ~/vllm-project/ # vllm vllm-ascend # Use mirror to speed up download -# docker pull quay.nju.edu.cn/ascend/cann:|cann_image_tag| +# docker pull m.daocloud.io/quay.io/ascend/cann:|cann_image_tag| export IMAGE=quay.io/ascend/cann:|cann_image_tag| docker run --rm --name vllm-ascend-ut \ -v $(pwd):/vllm-project \ diff --git a/docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md b/docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md index 6aad04db..a5786e67 100644 --- a/docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md +++ b/docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md @@ -55,7 +55,7 @@ pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip install modelscope pandas datasets gevent sacrebleu rouge_score pybind11 pytest # Configure this var to speed up model download -VLLM_USE_MODELSCOPE=true +export VLLM_USE_MODELSCOPE=True ``` Please follow the [Installation Guide](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) to make sure vLLM and vllm-ascend are installed correctly. @@ -85,7 +85,7 @@ wget https://repo.oepkgs.net/ascend/pytorch/vllm/python/py311_bisheng.tar.gz # Configure python and pip cp ./*.so* /usr/local/lib -tar -zxvf ./py311_bisheng.* -C /usr/local/ +tar -zxvf ./py311_bisheng.tar.gz -C /usr/local/ mv /usr/local/py311_bisheng/ /usr/local/python sed -i "1c#\!/usr/local/python/bin/python3.11" /usr/local/python/bin/pip3 sed -i "1c#\!/usr/local/python/bin/python3.11" /usr/local/python/bin/pip3.11 diff --git a/docs/source/developer_guide/performance_and_debug/performance_benchmark.md b/docs/source/developer_guide/performance_and_debug/performance_benchmark.md index 373dc2ec..21d0271f 100644 --- a/docs/source/developer_guide/performance_and_debug/performance_benchmark.md +++ b/docs/source/developer_guide/performance_and_debug/performance_benchmark.md @@ -159,7 +159,7 @@ vllm bench throughput \ If successful, you will see the following output ```shell -Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 toks/s +Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 toks/s] Throughput: 2.73 requests/s, 699.93 total tokens/s, 349.97 output tokens/s Total num prompt tokens: 1280 Total num output tokens: 1280 diff --git a/docs/source/user_guide/feature_guide/dynamic_batch.md b/docs/source/user_guide/feature_guide/dynamic_batch.md index 8b7855b1..ca22a0b8 100644 --- a/docs/source/user_guide/feature_guide/dynamic_batch.md +++ b/docs/source/user_guide/feature_guide/dynamic_batch.md @@ -24,9 +24,9 @@ We are working on further improvements and this feature will support more XPUs i `--SLO_limits_for_dynamic_batch` is the tuning parameter (integer type) for the dynamic batch feature, larger values relax latency limitation, leading to higher effective throughput. The parameter can be selected according to the specific models or service requirements. ```python ---SLO_limits_for_dynamic_batch =-1 # default value, dynamic batch disabled. 
---SLO_limits_for_dynamic_batch = 0 # baseline value for dynamic batch, dynamic batch disabled, FCFS and decode-first chunked prefilling strategy is used.
---SLO_limits_for_dynamic_batch > 0 # user-defined value for dynamic batch, dynamic batch enabled with FCFS and decode-first chunked prefilling strategy.
+--SLO_limits_for_dynamic_batch = -1 # Default value; dynamic batching is disabled.
+--SLO_limits_for_dynamic_batch = 0 # Baseline value; dynamic batching is disabled, but the FCFS and decode-first chunked prefill strategy is used.
+--SLO_limits_for_dynamic_batch > 0 # User-defined positive value; dynamic batching is enabled with the FCFS and decode-first chunked prefill strategy.
```

### Supported Models
diff --git a/docs/source/user_guide/support_matrix/supported_features.md b/docs/source/user_guide/support_matrix/supported_features.md
index 835b1995..753070aa 100644
--- a/docs/source/user_guide/support_matrix/supported_features.md
+++ b/docs/source/user_guide/support_matrix/supported_features.md
@@ -36,7 +36,7 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 NO plan/Deprecated: No plan or deprecated by vLLM.

-[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
+[v1_user_guide]: https://docs.vllm.ai/en/latest/usage/v1_guide/
[multimodal]: https://docs.vllm.ai/projects/ascend/en/latest/tutorials/models/Qwen-VL-Dense.html
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[LoRA]: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/lora.html
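
For reference, the value regimes documented in the `dynamic_batch.md` hunk above are set at launch time. The sketch below is a minimal, hedged example, assuming `--SLO_limits_for_dynamic_batch` is passed to `vllm serve` exactly as the feature guide presents it; the model name, port, and the value `50` are illustrative placeholders, not taken from this patch.

```shell
# Minimal sketch, assuming the flag is accepted by `vllm serve` as presented
# in dynamic_batch.md; the model, port, and value 50 are placeholders.
export VLLM_USE_MODELSCOPE=True  # speed up model download, per the tuning guide

vllm serve Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --SLO_limits_for_dynamic_batch 50  # > 0: dynamic batching on; larger values relax the latency limit
```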