# PaddleOCR-VL

## Introduction

PaddleOCR-VL is a state-of-the-art (SOTA), resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model for accurate element recognition.

This document walks through deploying and verifying the model end to end: supported features, environment preparation, single-node deployment, and functional verification.

## Supported Features

See [supported features](../../user_guide/support_matrix/supported_models.md) for the model's feature support matrix.

See the [feature guide](../../user_guide/feature_guide/index.md) for how to configure each feature.

## Environment Preparation

### Model Weight

* `PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)

It is recommended to download the model weights to a local directory (e.g., `./PaddleOCR-VL`) for quick access during deployment.
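The deployment scripts below pass the ModelScope ID `PaddlePaddle/PaddleOCR-VL` as `MODEL_PATH`, while this section recommends a local copy. As a sketch of how the two relate, the hypothetical helper `resolve_model_path` below prefers a local weight directory and otherwise falls back to the ModelScope ID. It assumes the local directory contains a `config.json`, as Hugging Face-format checkpoints usually do; the helper name and the fallback policy are illustrative, not part of vLLM.

```python
from pathlib import Path

# ModelScope ID used by the deployment scripts in this guide.
MODELSCOPE_ID = "PaddlePaddle/PaddleOCR-VL"


def resolve_model_path(local_dir: str = "./PaddleOCR-VL") -> str:
    """Return a value for MODEL_PATH (hypothetical helper).

    Uses the local weight directory if it contains a config.json,
    otherwise falls back to the ModelScope model ID.
    """
    path = Path(local_dir)
    if (path / "config.json").is_file():
        # Local weights found: hand the absolute path to `vllm serve`.
        return str(path.resolve())
    # No local copy: with VLLM_USE_MODELSCOPE=true, vLLM fetches the ID from ModelScope.
    return MODELSCOPE_ID


if __name__ == "__main__":
    print(resolve_model_path())
```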

### Installation

You can use our official docker image to run `PaddleOCR-VL` directly.

Select an image that matches your machine type and start it on your node. For details, see [using docker](../../installation.md#set-up-using-docker).

```{code-block} bash
   :substitutions:
export IMAGE=quay.io/ascend/vllm-ascend:v0.13.0rc1
docker run --rm \
    --name vllm-ascend \
    --shm-size=1g \
    --net=host \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -it $IMAGE bash
```

:::{note}
310P devices are supported starting from v0.15.0rc1. The example above uses `v0.13.0rc1`; on 310P, select the corresponding image of v0.15.0rc1 or later.
:::
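A missing device node or driver path on the host usually surfaces later as a confusing error inside the container. The sketch below checks, before `docker run`, that every host path passed through `--device` and `-v` in the command above exists. The helper name `missing_host_paths` is hypothetical, and `/dev/davinci0` assumes you use NPU 0 as in the example; adjust the list if you mount a different card.

```python
import os
from typing import Iterable, List

# Host paths passed to `docker run` via --device / -v in the command above.
REQUIRED_HOST_PATHS = [
    "/dev/davinci0",  # assumes NPU 0, as in the example command
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/dev/hisi_hdc",
    "/usr/local/dcmi",
    "/usr/local/Ascend/driver/tools/hccn_tool",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver/lib64/",
    "/usr/local/Ascend/driver/version.info",
    "/etc/ascend_install.info",
]


def missing_host_paths(paths: Iterable[str] = REQUIRED_HOST_PATHS) -> List[str]:
    """Return the entries of `paths` that do not exist on this host (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_host_paths()
    if missing:
        print("Missing on host:", *missing, sep="\n  ")
    else:
        print("All device nodes and driver paths are present.")
```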

## Deployment

### Single-node Deployment

#### Single NPU (PaddleOCR-VL)

PaddleOCR-VL supports single-node, single-NPU deployment on the 910B4 and 310P platforms. Follow these steps to start the inference service:

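1. Prepare the model weights: make sure the downloaded weights are stored in the `PaddleOCR-VL` directory. The scripts below load `PaddlePaddle/PaddleOCR-VL` from ModelScope (`VLLM_USE_MODELSCOPE=true`); to use your local copy instead, set `MODEL_PATH` to that directory.
2. Save the script for your device as `deploy.sh` and run it (for example, `sh deploy.sh`):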

:::::{tab-set}
:sync-group: install

::::{tab-item} 910B4
:sync: 910B4

Run the following script to start the vLLM server on a single 910B4 NPU:

```shell
#!/bin/sh
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
export TASK_QUEUE_ENABLE=1
export CPU_AFFINITY_CONF=1
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"

vllm serve ${MODEL_PATH} \
          --max-num-batched-tokens 16384 \
          --served-model-name PaddleOCR-VL-0.9B \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 0 \
          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
          --additional_config '{"enable_cpu_binding":true}' \
          --port 8000
```

::::
::::{tab-item} 310P
:sync: 310P

Run the following script to start the vLLM server on a single 310P NPU:

```shell
#!/bin/sh
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"

vllm serve ${MODEL_PATH} \
          --max_model_len 16384 \
          --served-model-name PaddleOCR-VL-0.9B \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 0 \
          --enforce-eager \
          --dtype float16 \
          --port 8000
```

:::{note}
The `--max_model_len` option is added to prevent errors when generating the attention operator mask on the 310P device.
:::

::::
:::::
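The two scripts differ only in a few flags and environment variables. The hypothetical helper `build_serve_command` below assembles the same `vllm serve` invocation as an argument list plus environment, which can be convenient when launching the server from Python with `subprocess`. It also encodes one constraint seen with this setup: when NPU graph capture is active (the 910B4 script's `cudagraph_mode` setting), `TASK_QUEUE_ENABLE=2` is rejected, so only `0` or `1` are accepted. The 310P script runs with `--enforce-eager`, so the check is skipped there. Treat this as a sketch, not a supported launcher.

```python
import shlex

# Flags shared by both platforms, copied from the scripts above.
COMMON_FLAGS = [
    "--served-model-name", "PaddleOCR-VL-0.9B",
    "--trust-remote-code",
    "--no-enable-prefix-caching",
    "--mm-processor-cache-gb", "0",
    "--port", "8000",
]

# Platform-specific flags, mirroring the 910B4 and 310P tabs.
PLATFORM_FLAGS = {
    "910B4": [
        "--max-num-batched-tokens", "16384",
        "--compilation-config", '{"cudagraph_mode":"FULL_DECODE_ONLY"}',
        "--additional_config", '{"enable_cpu_binding":true}',
    ],
    "310P": [
        "--max_model_len", "16384",  # avoids attention-mask errors on 310P
        "--enforce-eager",
        "--dtype", "float16",
    ],
}

# Extra environment variables exported by each script.
PLATFORM_ENV = {
    "910B4": {
        "TASK_QUEUE_ENABLE": "1",
        "CPU_AFFINITY_CONF": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
    },
    "310P": {},
}


def build_serve_command(platform, model_path="PaddlePaddle/PaddleOCR-VL", env_overrides=None):
    """Return (argv, env) for `vllm serve` on `platform` (hypothetical helper)."""
    if platform not in PLATFORM_FLAGS:
        raise ValueError(f"unsupported platform: {platform!r}")
    env = {"VLLM_USE_MODELSCOPE": "true", **PLATFORM_ENV[platform], **(env_overrides or {})}
    flags = PLATFORM_FLAGS[platform]
    # Graph capture is on unless the server runs eagerly.
    graph_capture = "--enforce-eager" not in flags
    task_queue = env.get("TASK_QUEUE_ENABLE")
    if graph_capture and task_queue is not None and task_queue not in ("0", "1"):
        raise ValueError("TASK_QUEUE_ENABLE must be 0 or 1 during NPU graph capture")
    argv = ["vllm", "serve", model_path] + COMMON_FLAGS + flags
    return argv, env


if __name__ == "__main__":
    argv, env = build_serve_command("910B4")
    print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```

To launch, you could pass the result to `subprocess.Popen(argv, env={**os.environ, **env})`; merging with `os.environ` keeps `PATH` and the CANN variables from the container.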

#### Multiple NPU (PaddleOCR-VL)

Not covered in this guide. The single-NPU deployment described above is recommended.

### Prefill-Decode Disaggregation

Not supported yet.

## Functional Verification

If the service starts successfully, you should see log output similar to the following:

```bash
INFO:     Started server process [87471]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
```
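When scripting the verification, it helps to wait until the server is ready rather than watching the log. A minimal stdlib-only sketch follows; it assumes the vLLM OpenAI-compatible server answers `GET /health` with HTTP 200 once startup completes. The helper name `wait_for_server` and the default timeout are hypothetical.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:8000/health", timeout=600.0, interval=5.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass (hypothetical helper).

    Returns True if the server became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: server not ready yet.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_server()` returns `True` shortly after the "Application startup complete." line appears.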

Once the server is running, you can query it with the OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddleOCR-VL-0.9B",
    messages=messages,
    temperature=0.0,
)
print(f"Generated text: {response.choices[0].message.content}")
```

If the query succeeds, the client prints output similar to the following:

```bash
Generated text: CINNAMON SUGAR
1 x 17,000
17,000
SUB TOTAL
17,000
GRAND TOTAL
17,000
CASH IDR
20,000
CHANGE DUE
3,000
```
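The example above sends a remote image URL. To parse a local file, one option is to embed it as a base64 `data:` URL in the same `image_url` field; this assumes the server accepts data URLs, which vLLM's OpenAI-compatible API supports for images. The helper `build_messages` is hypothetical and reuses the task prompts from the example.

```python
import base64

# Task prompts, copied from the client example above.
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}


def build_messages(image_bytes, task="ocr", mime="image/png"):
    """Build chat messages embedding raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                {"type": "text", "text": TASKS[task]},
            ],
        }
    ]
```

Usage: read the file with `open("receipt.png", "rb")`, then pass `build_messages(f.read(), "table")` as `messages` to `client.chat.completions.create(model="PaddleOCR-VL-0.9B", ...)`. Pick the task key that matches the image content.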

## Offline Inference with vLLM and PP-DocLayoutV2

The example above uses vLLM to run PaddleOCR-VL-0.9B on its own. To get the full document-parsing capability of PaddleOCR-VL, with results consistent with the examples in the official PaddlePaddle documentation, the model is typically combined with the PP-DocLayoutV2 layout-detection model.
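Conceptually, the combination is a two-stage pipeline: the layout model finds regions on the page, and the VLM recognizes each region with a task-specific prompt. The stdlib-only sketch below illustrates that flow with a stubbed recognizer. The `Region` type, the layout labels, and the label-to-prompt mapping are assumptions for illustration; this is not how PaddleOCR implements the pipeline internally.

```python
from dataclasses import dataclass
from typing import Callable, List

# Prompts from the client example; mapping layout labels to them is an assumption.
PROMPT_FOR_LABEL = {
    "text": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}


@dataclass
class Region:
    label: str   # layout class from the detector (hypothetical labels)
    order: int   # reading-order index
    crop: bytes  # cropped image content for this region


def parse_document(regions: List[Region], recognize: Callable[[bytes, str], str]) -> str:
    """Recognize each region with a label-specific prompt and join results in reading order."""
    parts = []
    for region in sorted(regions, key=lambda r: r.order):
        prompt = PROMPT_FOR_LABEL.get(region.label, "OCR:")  # fall back to plain OCR
        parts.append(recognize(region.crop, prompt))
    return "\n\n".join(parts)
```

In a real deployment, `recognize` would be a call to the vLLM server and the regions would come from PP-DocLayoutV2; the `PaddleOCRVL` pipeline handles both for you.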

:::{note}
Use separate virtual environments for vLLM and PP-DocLayoutV2 to avoid dependency conflicts.
:::

:::::{tab-set}
:sync-group: install

::::{tab-item} PaddlePaddle
:sync: paddlepaddle

On the 910B4 device, PP-DocLayoutV2 can run on the PaddlePaddle framework.

1. Pull the PaddlePaddle-compatible CANN image:

```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
```

Start the container using the following command:

```bash
docker run -it --name paddle-npu-dev -v $(pwd):/work \
    --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
```

2. Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)

```bash
python -m pip install paddlepaddle==3.2.0
wget https://paddle-whl.bj.bcebos.com/stable/npu/paddle-custom-npu/paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
python -m pip install paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
python -m pip install -U "paddleocr[doc-parser]"
python -m pip install safetensors
```

:::{note}
If OpenCV fails to load because system libraries are missing, install them:

```bash
apt-get update
apt-get install -y libgl1 libglib2.0-0
```

CANN 8.0.0 is incompatible with some NumPy and OpenCV versions, so installing the following pinned versions is recommended:

```bash
python -m pip install numpy==1.26.4
python -m pip install opencv-python==3.4.18.65
```
:::

::::
::::{tab-item} OM inference
:sync: om

The 310P device supports PP-DocLayoutV2 only through OM (offline model) inference. For the full procedure, see the guide in [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2).

::::
:::::
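For the PaddlePaddle path, CANN 8.0.0 only works with certain NumPy and OpenCV versions, so it can help to confirm the environment matches the pins above before running the pipeline. A minimal sketch using the standard library's `importlib.metadata`; the helper name `check_pins` is hypothetical, and the pin table simply repeats the versions from the install steps.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the PaddlePaddle instructions above.
PINS = {
    "paddlepaddle": "3.2.0",
    "numpy": "1.26.4",
    "opencv-python": "3.4.18.65",
}


def check_pins(pins=PINS):
    """Return {package: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in check_pins().items():
        print(f"{name}: expected {PINS[name]}, found {installed}")
```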

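### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference

This example assumes the vLLM server from the Deployment section is running at `http://localhost:8000/v1`; the pipeline sends element-recognition requests to it through `vl_rec_server_url`.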

```python
from paddleocr import PaddleOCRVL

# Local directory containing the PP-DocLayoutV2 layout-detection model.
doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"

# PP-DocLayoutV2 runs in this process on the NPU; element recognition is
# delegated to the vLLM server listening at vl_rec_server_url.
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
                       vl_rec_server_url="http://localhost:8000/v1",
                       layout_detection_model_name="PP-DocLayoutV2",
                       layout_detection_model_dir=doclayout_model_path,
                       device="npu")

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")

# Save each result as both JSON and Markdown.
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
```
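To process a folder of documents, one approach is to loop over the files and call the same `predict`, `save_to_json`, and `save_to_markdown` methods used above. The sketch assumes `predict` accepts a local file path as well as a URL; the helpers `collect_images` and `run_batch` and the suffix list are hypothetical.

```python
from pathlib import Path
from typing import List

# File types to pick up; this suffix list is an assumption, adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def collect_images(input_dir: str) -> List[Path]:
    """Return image files directly under input_dir, sorted by name (hypothetical helper)."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_batch(pipeline, input_dir: str, output_dir: str = "output") -> int:
    """Run pipeline.predict on every image and save JSON/Markdown per result.

    `pipeline` is the PaddleOCRVL object created above. Returns the number
    of results saved.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for image in collect_images(input_dir):
        for i, res in enumerate(pipeline.predict(str(image))):
            # Name outputs after the source file so results do not overwrite each other.
            res.save_to_json(save_path=str(out / f"{image.stem}_{i}.json"))
            res.save_to_markdown(save_path=str(out / f"{image.stem}_{i}.md"))
            count += 1
    return count
```

Usage: `run_batch(pipeline, "./docs_to_parse")` writes one JSON and one Markdown file per result into `./output`.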
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
+								# PaddleOCR-VL
 								## Introduction
 								PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition.
 								This document provides a detailed workflow for the complete deployment and verification of the model, including supported features, environment preparation, single-node deployment, and functional verification. It is designed to help users quickly complete model deployment and verification.
 								## Supported Features
-												[Doc] fix the nit in docs (#6826)

Refresh the doc, fix the nit in the docs

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2026-02-27 11:50:27 +08:00
+								Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
-												[Doc] fix the nit in docs (#6826)

Refresh the doc, fix the nit in the docs

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2026-02-27 11:50:27 +08:00
+								Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
 								## Environment Preparation
 								### Model Weight
 								* `PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)
 								It is recommended to download the model weights to a local directory (e.g., `./PaddleOCR-VL`) for quick access during deployment.
 								### Installation
-												[main][Docs] Fix spelling errors across documentation (#6649)

Fix various spelling mistakes in the project documentation to improve
clarity and correctness.
- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
											
										
										
											2026-02-10 11:14:57 +08:00
+								You can use our official docker image to run `PaddleOCR-VL` directly.
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
-												[Doc][Misc] Restructure tutorial documentation (#6501)

### What this PR does / why we need it?

This PR refactors the tutorial documentation by restructuring it into
three categories: Models, Features, and Hardware. This improves the
organization and navigation of the tutorials, making it easier for users
to find relevant information.

- The single `tutorials/index.md` is split into three separate index
files:
  - `docs/source/tutorials/models/index.md`
  - `docs/source/tutorials/features/index.md`
  - `docs/source/tutorials/hardwares/index.md`
- Existing tutorial markdown files have been moved into their respective
new subdirectories (`models/`, `features/`, `hardwares/`).
- The main `index.md` has been updated to link to these new tutorial
sections.

This change makes the documentation structure more logical and scalable
for future additions.

### Does this PR introduce _any_ user-facing change?

Yes, this PR changes the structure and URLs of the tutorial
documentation pages. Users following old links to tutorials will
encounter broken links. It is recommended to set up redirects if the
documentation framework supports them.

### How was this patch tested?

These are documentation-only changes. The documentation should be built
and reviewed locally to ensure all links are correct and the pages
render as expected.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2026-02-10 15:03:35 +08:00
+								Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
 								```{code-block} bash
 								   :substitutions:
 								export IMAGE=quay.io/ascend/vllm-ascend:v0.13.0rc1
 								docker run --rm \
 								    --name vllm-ascend \
 								    --shm-size=1g \
 								    --net=host \
 								    --device /dev/davinci0 \
 								    --device /dev/davinci_manager \
 								    --device /dev/devmm_svm \
 								    --device /dev/hisi_hdc \
 								    -v /usr/local/dcmi:/usr/local/dcmi \
 								    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
 								    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
 								    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
 								    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
 								    -v /etc/ascend_install.info:/etc/ascend_install.info \
 								    -v /root/.cache:/root/.cache \
 								    -it $IMAGE bash
 								```
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								:::{note}
 								The 310P device is supported from version 0.15.0rc1. You need to select the corresponding image for installation.
 								:::
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
+								## Deployment
 								### Single-node Deployment
 								#### Single NPU (PaddleOCR-VL)
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								PaddleOCR-VL supports single-node single-card deployment on the 910B4 and 310P platform. Follow these steps to start the inference service:
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
 . Prepare model weights: Ensure the downloaded model weights are stored in the `PaddleOCR-VL` directory.
 . Create and execute the deployment script (save as `deploy.sh`):
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								:::::{tab-set}
 								:sync-group: install
 								::::{tab-item} 910B4
 								:sync: 910B4
 								Run the following script to start the vLLM server on single 910B4:
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
+								```shell
 								#!/bin/sh
 								export VLLM_USE_MODELSCOPE=true
 								export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
-												[Doc]Refresh model tutorial examples and serving commands (#7426)

### What this PR does / why we need it?
Main updates include:
- update model IDs and default model paths in serving / offline
inference examples

- adjust some command snippets and notes for better copy-paste usability

- replace `SamplingParams` argument usage from `max_completion_tokens`
to `max_tokens`（**Offline** inference currently **does not support** the
"max_completion_tokens"）
``` bash
Traceback (most recent call last):
  File "/vllm-workspace/vllm-ascend/qwen-next.py", line 18, in <module>
    sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, max_completion_tokens=32)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Unexpected keyword argument 'max_completion_tokens'
[ERROR] 2026-03-17-09:57:40 (PID:276, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```

- refresh **Qwen3-Omni-30B-A3B-Thinking** recommended environment
variable
``` bash
export HCCL_BUFFSIZE=512
export HCCL_OP_EXPANSION_MODE=AIV
```
``` bash
EZ9999[PID: 25038] 2026-03-17-08:21:12.001.372 (EZ9999):  HCCL_BUFFSIZE is too SMALL, maxBs = 256, h = 2048, 
epWorldSize = 2, localMoeExpertNum = 64, sharedExpertNum = 0, tokenNeedSizeDispatch = 4608, tokenNeedSizeCombine 
= 4096, k = 8, NEEDED_HCCL_BUFFSIZE(((maxBs * tokenNeedSizeDispatch * ep_worldsize * localMoeExpertNum) + 
(maxBs * tokenNeedSizeCombine * (k + sharedExpertNum))) * 2) = 305MB, HCCL_BUFFSIZE=200MB.
[FUNC:CheckWinSize][FILE:moe_distribute_dispatch_v2_tiling.cpp][LINE:984]
```

- fix **Qwen3-reranker** example usage to match the current **pooling
runner** interface and score output access
``` python
model = LLM(
    model=model_name,
    task="score",       # need fix
    hf_overrides={
        "architectures": ["Qwen3ForSequenceClassification"],
        "classifier_from_token": ["no", "yes"],
```
--->
``` python
model = LLM(
    model=model_name,
    runner="pooling",
    hf_overrides={
        "architectures": ["Qwen3ForSequenceClassification"],
        "classifier_from_token": ["no", "yes"],
```

- modify **PaddleOCR-VL**  parameter `TASK_QUEUE_ENABLE` from `2` to `1`
``` bash
(EngineCore_DP0 pid=26273) RuntimeError: NPUModelRunner init failed, error is NPUModelRunner failed, error
 is Do not support TASK_QUEUE_ENABLE = 2 during NPU graph capture, please export TASK_QUEUE_ENABLE=1/0.
```

These changes are needed because several documentation examples had
drifted from the current runtime behavior and recommended invocation
patterns, which could confuse users when following the tutorials
directly.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/4497431df654e46fb1fb5e64bf8611e762ae5d87

Signed-off-by: MrZ20 <2609716663@qq.com>
											
										
										
											2026-03-20 11:34:18 +08:00
+								export TASK_QUEUE_ENABLE=1
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								export CPU_AFFINITY_CONF=1
 								export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
 								vllm serve ${MODEL_PATH} \
 								          --max-num-batched-tokens 16384 \
 								          --served-model-name PaddleOCR-VL-0.9B \
 								          --trust-remote-code \
 								          --no-enable-prefix-caching \
 								          --mm-processor-cache-gb 0 \
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
 								          --additional_config '{"enable_cpu_binding":true}' \
 								          --port 8000
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
+								```
-												[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?
add 310P3 guidance of PaddleOCR-VL model, refresh PaddleOCR-VL.md in the
docs/source/tutorials/

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by CI

- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-02-28 16:03:07 +08:00
+								::::
 								::::{tab-item} 310P
 								:sync: 310P
 								Run the following script to start the vLLM server on single 310P:
 								```shell
 								#!/bin/sh
 								export VLLM_USE_MODELSCOPE=true
 								export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
 								vllm serve ${MODEL_PATH} \
 								          --max_model_len 16384 \
 								          --served-model-name PaddleOCR-VL-0.9B \
 								          --trust-remote-code \
 								          --no-enable-prefix-caching \
 								          --mm-processor-cache-gb 0 \
 								          --enforce-eager \
 								          --dtype float16 \
 								          --port 8000
 								```
 								:::{note}
 								The `--max_model_len` option is added to prevent errors when generating the attention operator mask on the 310P device.
 								:::
 								::::
 								:::::
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
+								#### Multiple NPU (PaddleOCR-VL)
 								Single-node deployment is recommended.
 								### Prefill-Decode Disaggregation
-												[main][Docs] Fix typos across documentation (#6728)

## Summary

Fix typos and improve grammar consistency across 50 documentation files.
 
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
											
										
										
											2026-02-13 15:50:05 +08:00
+								Not supported yet.
-												[Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
											
										
										
											2026-01-09 11:01:25 +08:00
 								## Functional Verification
 								If your service start successfully, you can see the info shown below:
 								```bash
 								INFO:     Started server process [87471]
 								INFO:     Waiting for application startup.
 								INFO:     Application startup complete.
 								```
 								Once your server is started, you can use the OpenAI API client to make queries.
 								```python
 								from openai import OpenAI
 								client = OpenAI(
 								    api_key="EMPTY",
 								    base_url="http://localhost:8000/v1",
 								    timeout=3600
 								)
 								# Task-specific base prompts
 								TASKS = {
 								    "ocr": "OCR:",
 								    "table": "Table Recognition:",
 								    "formula": "Formula Recognition:",
 								    "chart": "Chart Recognition:",
 								}
 								messages = [
 								    {
 								        "role": "user",
 								        "content": [
 								            {
 								                "type": "image_url",
 								                "image_url": {
 								                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
 								                }
 								            },
 								            {
 								                "type": "text",
 								                "text": TASKS["ocr"]
 								            }
 								        ]
 								    }
 								]
 								response = client.chat.completions.create(
 								    model="PaddleOCR-VL-0.9B",
 								    messages=messages,
 								    temperature=0.0,
 								)
 								print(f"Generated text: {response.choices[0].message.content}")
 								```

If the query succeeds, the client prints output similar to the following:

```bash
Generated text: CINNAMON SUGAR
x 17,000
,000
SUB TOTAL
,000
GRAND TOTAL
,000
CASH IDR
,000
CHANGE DUE
,000
```
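
The recognition task is selected by the text prompt alone. To switch to table, formula, or chart recognition, pick another `TASKS` entry. To send a local file instead of a public URL, a common approach is to embed the image as a base64 `data:` URL, which vLLM's OpenAI-compatible server accepts in `image_url` parts.

The sketch below shows this approach. The helper names `image_to_data_url` and `build_messages` are illustrative and are not part of any library.

```python
import base64
import mimetypes
from pathlib import Path

# Same task prompts as in the client example above.
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL for an `image_url` part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(image_url, task="ocr"):
    """Build a single-turn request: one image followed by the task prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": TASKS[task]},
            ],
        }
    ]
```

For example, `client.chat.completions.create(model="PaddleOCR-VL-0.9B", messages=build_messages(image_to_data_url("receipt.png"), "table"), temperature=0.0)` would run table recognition on a hypothetical local `receipt.png`.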

## Offline Inference with vLLM and PP-DocLayoutV2

The example above sends a single image directly to PaddleOCR-VL-0.9B served by vLLM. To use the full document-parsing capability of PaddleOCR-VL, you typically also combine it with the PP-DocLayoutV2 layout model. This also produces results consistent with the examples in the official PaddlePaddle documentation.
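
Conceptually, the layout model locates the elements on a page, such as text blocks, tables, formulas, and charts, along with their reading order. Each element is then recognized by PaddleOCR-VL-0.9B using the matching task prompt.

The sketch below illustrates only that routing step, and the PaddleOCR pipeline shown later in this section handles it for you. The following parts are assumptions, not PP-DocLayoutV2's actual output format:

* the hypothetical `Region` type
* the label names in `LABEL_TO_TASK`
* the fallback to plain OCR

```python
from dataclasses import dataclass

# Task prompts from the client example in the previous section.
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

# Hypothetical layout labels; real PP-DocLayoutV2 label names may differ.
LABEL_TO_TASK = {"table": "table", "formula": "formula", "chart": "chart"}


@dataclass
class Region:
    """One detected layout element (hypothetical structure)."""
    label: str
    bbox: tuple  # (x1, y1, x2, y2) in page pixels
    order: int   # reading-order index assigned by the layout model


def plan_requests(regions):
    """Return (bbox, prompt) pairs in reading order.

    Labels without a dedicated task fall back to plain OCR.
    """
    ordered = sorted(regions, key=lambda r: r.order)
    return [(r.bbox, TASKS[LABEL_TO_TASK.get(r.label, "ocr")]) for r in ordered]
```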

:::{note}
Use separate virtual environments for vLLM and PP-DocLayoutV2 to prevent dependency conflicts.
:::
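
One way to follow this note using only the Python standard library is to create one `venv` environment per component. This is a minimal sketch, and it makes two assumptions:

* The directory names are hypothetical.
* The interpreter path follows the Linux layout (`bin/python`).

Conda environments or separate containers work just as well. The PaddlePaddle container described below is one example.

```python
import venv
from pathlib import Path

# Hypothetical locations; choose any directories you like.
ENV_DIRS = {
    "vllm": Path("envs/vllm"),
    "doclayout": Path("envs/pp-doclayout"),
}


def create_envs(env_dirs=ENV_DIRS, with_pip=True):
    """Create one isolated virtual environment per component.

    Returns a mapping from component name to that environment's Python
    interpreter (Linux layout). Install each component's packages with
    `<interpreter> -m pip install ...` so they never share site-packages.
    """
    interpreters = {}
    for name, path in env_dirs.items():
        venv.create(path, with_pip=with_pip)
        interpreters[name] = path / "bin" / "python"
    return interpreters
```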

:::::{tab-set}
:sync-group: install

::::{tab-item} PaddlePaddle
:sync: paddlepaddle

The 910B4 device supports PP-DocLayoutV2 inference with the PaddlePaddle framework.

1. Pull the PaddlePaddle-compatible CANN image:

```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84
```

Start the container with the following command:

```bash
docker run -it --name paddle-npu-dev -v $(pwd):/work \
    --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
```
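
The `$(uname -m)` substitution selects the image variant that matches the host CPU architecture. If you script the setup in Python, the sketch below builds the same image tag and the same `docker run` arguments as a list for `subprocess.run`. The function names are hypothetical.

The sketch also assumes that an `x86_64` image is published under the same naming scheme. Only the `aarch64` tag appears explicitly in this guide, and the `paddle_custom_npu` wheel used in the next step is the aarch64 build.

```python
import platform

REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu"
# The x86_64 variant is an assumption; only aarch64 is named explicitly above.
SUPPORTED_ARCHES = ("aarch64", "x86_64")


def paddle_npu_image(machine=None):
    """Return the image reference for this host, mirroring `$(uname -m)`."""
    machine = machine or platform.machine()
    if machine not in SUPPORTED_ARCHES:
        raise ValueError(f"unsupported architecture: {machine}")
    return f"{REGISTRY}:cann800-ubuntu20-npu-910b-base-{machine}-gcc84"


def docker_run_args(workdir, devices="0,1,2,3,4,5,6,7", machine=None):
    """Build the `docker run` command shown above as an argument list."""
    return [
        "docker", "run", "-it", "--name", "paddle-npu-dev",
        "-v", f"{workdir}:/work",
        "--privileged", "--network=host", "--shm-size=128G", "-w=/work",
        "-v", "/usr/local/Ascend/driver:/usr/local/Ascend/driver",
        "-v", "/usr/local/bin/npu-smi:/usr/local/bin/npu-smi",
        "-v", "/usr/local/dcmi:/usr/local/dcmi",
        # No shell quoting is needed when arguments are passed as a list.
        "-e", f"ASCEND_RT_VISIBLE_DEVICES={devices}",
        paddle_npu_image(machine), "/bin/bash",
    ]
```

For example, `subprocess.run(docker_run_args(os.getcwd()), check=True)` would start the same interactive container as the shell command.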

2. Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR):

```bash
python -m pip install paddlepaddle==3.2.0
# The custom NPU plugin wheel below is built for Python 3.10 (cp310) on aarch64.
wget https://paddle-whl.bj.bcebos.com/stable/npu/paddle-custom-npu/paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
python -m pip install paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
python -m pip install -U "paddleocr[doc-parser]"
python -m pip install safetensors
```
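
Before moving on, you may want to confirm that the packages above are installed at the expected versions. The sketch below uses the standard `importlib.metadata` module. Two assumptions apply:

* The distribution name `paddle-custom-npu` is inferred from the wheel filename.
* `paddleocr` and `safetensors` are checked only for presence, because the commands above do not pin their versions.

```python
from importlib import metadata

# Versions pinned by the install commands above; None means "any version".
EXPECTED = {
    "paddlepaddle": "3.2.0",
    "paddle-custom-npu": "3.2.0",  # distribution name inferred from the wheel file
    "paddleocr": None,
    "safetensors": None,
}


def check_packages(expected=EXPECTED, lookup=metadata.version):
    """Return a list of problems; an empty list means everything matches."""
    problems = []
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if want is not None and have != want:
            problems.append(f"{name}: found {have}, expected {want}")
    return problems
```

Inside the container, `print(check_packages() or "all packages OK")` gives a quick summary.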

:::{note}
The system libraries that OpenCV depends on may be missing from the container. Install them with:

```bash
apt-get update
apt-get install -y libgl1 libglib2.0-0
```

CANN 8.0.0 is not compatible with some NumPy and OpenCV versions, so install the following pinned versions:

```bash
python -m pip install numpy==1.26.4
python -m pip install opencv-python==3.4.18.65
```
:::
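
Both problems above usually surface as an `ImportError` when OpenCV is imported. The following sketch is illustrative and is not an official diagnostic. It tries `import cv2` and maps common messages to the fixes described in the note:

* A missing `libGL.so.1` or `libgthread-2.0.so.0` points to the system packages.
* A message mentioning NumPy suggests a NumPy/OpenCV version mismatch.

The matched substrings are assumptions based on typical error text.

```python
import importlib


def diagnose_cv2(import_module=importlib.import_module):
    """Try to import OpenCV and return a one-line diagnosis."""
    try:
        cv2 = import_module("cv2")
    except ImportError as exc:
        msg = str(exc)
        # Shared objects provided by the libgl1 / libglib2.0-0 system packages.
        if "libGL" in msg or "libgthread" in msg:
            return f"missing system library ({msg}): install libgl1 and libglib2.0-0"
        # Typical symptom of an OpenCV build that does not match the NumPy version.
        if "numpy" in msg.lower():
            return f"NumPy/OpenCV mismatch ({msg}): install numpy==1.26.4 and opencv-python==3.4.18.65"
        return f"OpenCV import failed: {msg}"
    return f"OpenCV {cv2.__version__} imported successfully"
```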

::::

::::{tab-item} OM inference
:sync: om

The 310P device supports PP-DocLayoutV2 inference only through an OM (offline model). For the full procedure, see the guide in [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2).
::::
:::::
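
### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference

Run the following in the PaddlePaddle environment prepared above while the vLLM server is running:

```python
from paddleocr import PaddleOCRVL

# Local directory containing the PP-DocLayoutV2 weights.
doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"

# Layout detection runs locally on the NPU; element recognition is delegated
# to the vLLM server started earlier (OpenAI-compatible endpoint on port 8000).
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
                       vl_rec_server_url="http://localhost:8000/v1",
                       layout_detection_model_name="PP-DocLayoutV2",
                       layout_detection_model_dir=doclayout_model_path,
                       device="npu")

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")

# Save each result as both JSON and Markdown.
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
```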
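
The loop above writes one `output_{i}.md` file per result. A multi-page input, such as a PDF if your PaddleOCR version accepts one, therefore produces several files. The following standard-library sketch stitches them into a single Markdown document in page order. It makes the following assumptions:

* Each file was written at exactly the path passed to `save_to_markdown`.
* The `---` page separator is an arbitrary choice.
* Relative image links keep working only if the merged file is written to the same directory.

```python
import re
from pathlib import Path

PAGE_FILE = re.compile(r"output_(\d+)\.md")


def page_files(directory="."):
    """Return output_<i>.md files sorted by page index (output_10 follows output_9)."""
    found = []
    for path in Path(directory).iterdir():
        match = PAGE_FILE.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def merge_markdown(paths, out_path, separator="\n\n---\n\n"):
    """Concatenate per-page Markdown files into a single document."""
    parts = [Path(p).read_text(encoding="utf-8").strip() for p in paths]
    Path(out_path).write_text(separator.join(parts) + "\n", encoding="utf-8")
    return Path(out_path)
```

For example, `merge_markdown(page_files(), "document.md")` combines all pages in the current directory.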