diff --git a/docs/source/tutorials/single_npu_multimodal.md b/docs/source/tutorials/single_npu_multimodal.md
index 3b01397..58cfb57 100644
--- a/docs/source/tutorials/single_npu_multimodal.md
+++ b/docs/source/tutorials/single_npu_multimodal.md
@@ -29,6 +29,9 @@ docker run --rm \
 Setup environment variables:
 
 ```bash
+# Use vllm v1 engine
+export VLLM_USE_V1=1
+
 # Load model from ModelScope to speed up download
 export VLLM_USE_MODELSCOPE=True
 
@@ -57,6 +60,7 @@ llm = LLM(
     model=MODEL_PATH,
     max_model_len=16384,
     limit_mm_per_prompt={"image": 10},
+    enforce_eager=True,
 )
 
 sampling_params = SamplingParams(
@@ -103,13 +107,11 @@ outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
 generated_text = outputs[0].outputs[0].text
 
 print(generated_text)
-
 ```
 
 If you run this script successfully, you can see the info shown below:
 
 ```bash
-Processed prompts: 100%|███████████████| 1/1 [00:11<00:00, 11.29s/it, est. speed input: 9.48 toks/s, output: 20.55 toks/s]
 The image displays a logo consisting of two main elements: a stylized geometric design and a pair of text elements.
 
 1. **Geometric Design**: On the left side of the image, there is a blue geometric design that appears to be made up of interconnected shapes. These shapes resemble a network or a complex polygonal structure, possibly hinting at a technological or interconnected theme. The design is monochromatic and uses only blue as its color, which could be indicative of a specific brand or company.
@@ -141,10 +143,15 @@ docker run --rm \
 -v /etc/ascend_install.info:/etc/ascend_install.info \
 -v /root/.cache:/root/.cache \
 -p 8000:8000 \
+-e VLLM_USE_V1=1 \
 -e VLLM_USE_MODELSCOPE=True \
 -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
 -it $IMAGE \
-vllm serve Qwen/Qwen2.5-VL-7B-Instruct --dtype bfloat16 --max_model_len 16384 --max-num-batched-tokens 16384
+vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
+--dtype bfloat16 \
+--max_model_len 16384 \
+--max-num-batched-tokens 16384 \
+--enforce-eager
 ```
 
 :::{note}