[CI] fix lint (#4888)
Fix lint CI error

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -45,6 +45,7 @@ You can using our official docker image for supporting Qwen3 Dense models.
Currently, we provide all-in-one images. [Download images](https://quay.io/repository/ascend/vllm-ascend?tab=tags)

#### Docker Pull (by tag)

```{code-block} bash
:substitutions:

@@ -53,6 +54,7 @@ docker pull quay.io/ascend/vllm-ascend:|vllm_ascend_version|
```

#### Docker run

```{code-block} bash
:substitutions:

@@ -344,7 +346,7 @@ The configuration compilation_config = { "cudagraph_mode": "FULL_DECODE_ONLY"} i
### 8. Asynchronous Scheduling
Asynchronous scheduling is a technique used to optimize inference efficiency. It allows non-blocking task scheduling to improve concurrency and throughput, especially when processing large-scale models.

This optimization is enabled by setting `--async-scheduling`.

## Optimization Highlights