diff --git a/docs/source/tutorials/Qwen3-Dense.md b/docs/source/tutorials/Qwen3-Dense.md
index 6717bf11..395f9dce 100644
--- a/docs/source/tutorials/Qwen3-Dense.md
+++ b/docs/source/tutorials/Qwen3-Dense.md
@@ -45,6 +45,7 @@ You can using our official docker image for supporting Qwen3 Dense models.
 Currently, we provide the all-in-one images.[Download images](https://quay.io/repository/ascend/vllm-ascend?tab=tags)
 
 #### Docker Pull (by tag)
+
 ```{code-block} bash
 :substitutions:
 
@@ -53,6 +54,7 @@ docker pull quay.io/ascend/vllm-ascend:|vllm_ascend_version|
 ```
 
 #### Docker run
+
 ```{code-block} bash
 :substitutions:
 
@@ -344,7 +346,7 @@ The configuration compilation_config = { "cudagraph_mode": "FULL_DECODE_ONLY"} i
 ### 8. Asynchronous Scheduling
 
 Asynchronous scheduling is a technique used to optimize inference efficiency. It allows non-blocking task scheduling to improve concurrency and throughput, especially when processing large-scale models.
-This optimization is enabled by setting `--async-scheduling`.
+This optimization is enabled by setting `--async-scheduling`.
 
 ## Optimization Highlights
 