[Doc][v0.18.0] Fix documentation formatting and improve code examples (#8701)
### What this PR does / why we need it?

This PR fixes various documentation issues and improves code examples throughout the project.

Signed-off-by: MrZ20 <2609716663@qq.com>
@@ -158,6 +158,12 @@ Scheduling optimization:
:substitutions:
# Optimize the operator delivery queue. This affects peak memory usage, and performance may degrade when memory is tight.
export TASK_QUEUE_ENABLE=2
```

or

```{code-block} bash
:substitutions:

# Greatly improves performance for CPU-bound models without affecting performance for NPU-bound models.
export CPU_AFFINITY_CONF=1
@@ -223,7 +223,7 @@ vllm serve Qwen/Qwen3-Embedding-8B --trust-remote-code
```shell
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
export VLLM_USE_MODELSCOPE=true
export VLLM_USE_MODELSCOPE=True
vllm bench serve \
--model Qwen/Qwen3-Embedding-8B \
--backend openai-embeddings \