[Doc] Add qwen3 embedding 8b guide (#1734)
1. Add the tutorials for qwen3-embedding-8b
2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed as of 0.9.2
- vLLM version: v0.9.2
- vLLM main: 5923ab9524
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -24,8 +24,6 @@ import os

 from vllm import LLM

-os.environ["VLLM_USE_V1"] = "1"
-
 model = LLM(model="Qwen/Qwen2-7B-Instruct")
 outputs = model.generate("Hello, how are you?")
 ```
@@ -46,8 +44,6 @@ offline example:
 import os
 from vllm import LLM

-os.environ["VLLM_USE_V1"] = "1"
-
 # TorchAirGraph only works without chunked-prefill for now
 model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True,}})
 outputs = model.generate("Hello, how are you?")
@@ -71,8 +67,6 @@ offline example:
 import os
 from vllm import LLM

-os.environ["VLLM_USE_V1"] = "1"
-
 model = LLM(model="someother_model_weight", enforce_eager=True)
 outputs = model.generate("Hello, how are you?")
 ```
@@ -40,7 +40,6 @@ The following is a simple example of how to use sleep mode.
 from vllm.utils import GiB_bytes


-os.environ["VLLM_USE_V1"] = "1"
 os.environ["VLLM_USE_MODELSCOPE"] = "True"
 os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

@@ -77,7 +76,6 @@ The following is a simple example of how to use sleep mode.

 ```bash
 export VLLM_SERVER_DEV_MODE="1"
-export VLLM_USE_V1="1"
 export VLLM_WORKER_MULTIPROC_METHOD="spawn"
 export VLLM_USE_MODELSCOPE="True"
