[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)

### What this PR does / why we need it?

Add 310P3 guidance for the PaddleOCR-VL model and refresh `PaddleOCR-VL.md` in `docs/source/tutorials/`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
2026-02-28 16:03:07 +08:00
parent 3cc8bf15da
commit 81fb7d5779
1 changed files with 66 additions and 7 deletions
--- a/docs/source/tutorials/models/PaddleOCR-VL.md
+++ b/docs/source/tutorials/models/PaddleOCR-VL.md
@@ -47,21 +47,36 @@ docker run --rm \
    -it $IMAGE bash
 ```

+:::{note}
+The 310P device is supported starting from version 0.15.0rc1. Select the image that corresponds to your device when installing.
+:::
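
To decide which image and which deployment tab applies to a host, the chip family can be read from the output of `npu-smi info`. The helper below is a hypothetical sketch (`detect_ascend_chip` is not part of vLLM Ascend), and it assumes that the chip name, such as `910B4` or `310P3`, appears somewhere in that output.

```shell
#!/bin/bash
# Hypothetical helper: classify the Ascend chip family from `npu-smi info` text.
# Assumption: the chip name (e.g. "910B4" or "310P3") appears in the output.
detect_ascend_chip() {
  local info
  info=$(cat)   # read the npu-smi output from stdin
  case "$info" in
    *310P*) echo "310P" ;;
    *910B*) echo "910B" ;;
    *)      echo "unknown"; return 1 ;;
  esac
}

# Usage on a host with the Ascend driver installed:
#   npu-smi info | detect_ascend_chip
```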
+
 ## Deployment

 ### Single-node Deployment

 #### Single NPU (PaddleOCR-VL)

-PaddleOCR-VL supports single-node single-card deployment on the 910B4 platform. Follow these steps to start the inference service:
+PaddleOCR-VL supports single-node single-card deployment on the 910B4 and 310P platforms. Follow these steps to start the inference service:

 1. Prepare model weights: Ensure the downloaded model weights are stored in the `PaddleOCR-VL` directory.
 2. Create and execute the deployment script (save as `deploy.sh`):

+:::::{tab-set}
+:sync-group: install
+
+::::{tab-item} 910B4
+:sync: 910B4
+
+Run the following script to start the vLLM server on a single 910B4:
+
 ```shell
 #!/bin/sh
 export VLLM_USE_MODELSCOPE=true
 export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
+export TASK_QUEUE_ENABLE=2
+export CPU_AFFINITY_CONF=1
+export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"

 vllm serve ${MODEL_PATH} \
          --max-num-batched-tokens 16384 \
@@ -69,9 +84,40 @@ vllm serve ${MODEL_PATH} \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 0 \
-          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'
+          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
+          --additional_config '{"enable_cpu_binding":true}' \
+          --port 8000
 ```

+::::
+::::{tab-item} 310P
+:sync: 310P
+
+Run the following script to start the vLLM server on a single 310P:
+
+```shell
+#!/bin/sh
+export VLLM_USE_MODELSCOPE=true
+export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
+
+vllm serve ${MODEL_PATH} \
+          --max_model_len 16384 \
+          --served-model-name PaddleOCR-VL-0.9B \
+          --trust-remote-code \
+          --no-enable-prefix-caching \
+          --mm-processor-cache-gb 0 \
+          --enforce-eager \
+          --dtype float16 \
+          --port 8000
+```
+
+:::{note}
+The `--max_model_len` option is added to prevent errors when generating the attention operator mask on the 310P device.
+:::
+
+::::
+:::::
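
Both scripts pass `--port 8000`, and loading the model can take a while, so a launch script may want to wait until the server answers before sending requests. A minimal sketch, assuming `curl` is installed and polling the OpenAI-compatible `/v1/models` endpoint exposed by `vllm serve`; `wait_for_vllm` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: poll the vLLM server until it answers or a timeout expires.
# Assumes curl is installed and the OpenAI-compatible /v1/models endpoint.
wait_for_vllm() {
  local host="${1:-127.0.0.1}"
  local port="${2:-8000}"
  local timeout_s="${3:-600}"
  local deadline=$(( $(date +%s) + timeout_s ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # -s: silent, -f: fail on HTTP errors, so only a 2xx reply counts as ready;
    # --max-time keeps a single probe from hanging past the deadline
    if curl -sf --max-time 5 "http://${host}:${port}/v1/models" > /dev/null; then
      echo "vLLM is ready on ${host}:${port}"
      return 0
    fi
    sleep 2
  done
  echo "vLLM did not become ready within ${timeout_s}s" >&2
  return 1
}

# Usage, after starting deploy.sh in another shell:
#   wait_for_vllm 127.0.0.1 8000 600
```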
+
 #### Multiple NPU (PaddleOCR-VL)

 Single-node deployment is recommended.
@@ -156,12 +202,18 @@ CHANGE DUE
 In the above example, we demonstrated how to use vLLM to run inference with the PaddleOCR-VL-0.9B model. Typically, we also need to integrate the PP-DocLayoutV2 model to fully unleash the capabilities of the PaddleOCR-VL model, making it more consistent with the examples provided by the official PaddlePaddle documentation.

 :::{note}
-Use separate virtual environments for VLLM and PPdoclayoutV2 to prevent dependency conflicts.
+Use separate virtual environments for vLLM and PP-DocLayoutV2 to prevent dependency conflicts.
 :::
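
A minimal sketch of that separation using Python's built-in `venv` module. The helper `create_isolated_envs` and the directory names `vllm-env` and `paddle-env` are hypothetical; any extra arguments are passed straight to `python3 -m venv`.

```shell
#!/bin/bash
# Hypothetical helper: create two isolated virtual environments so that the
# vLLM stack and the PP-DocLayoutV2 stack never share site-packages.
create_isolated_envs() {
  local base="$1"; shift
  local name
  for name in vllm-env paddle-env; do
    # Remaining arguments (e.g. --without-pip) are forwarded to venv
    python3 -m venv "$@" "$base/$name" || return 1
  done
}

# Usage:
#   create_isolated_envs "$HOME/venvs"
#   source "$HOME/venvs/vllm-env/bin/activate"    # before running deploy.sh
#   source "$HOME/venvs/paddle-env/bin/activate"  # before installing PaddlePaddle/PaddleOCR
```

Activating one environment per shell keeps `pip install` commands from one stack out of the other.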

-### Pull the PaddlePaddle-compatible CANN image
+:::::{tab-set}
+:sync-group: install

-Obtaining Ascend Images from PaddlePaddle:
+::::{tab-item} PaddlePaddle
+:sync: paddlepaddle
+
+The 910B4 device supports inference using the PaddlePaddle framework.
+
+1. Pull the PaddlePaddle-compatible CANN image

 ```bash
 docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
@@ -179,7 +231,7 @@ docker run -it --name paddle-npu-dev -v $(pwd):/work \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
 ```

-### Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+2. Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)

 ```bash
 python -m pip install paddlepaddle==3.2.0
@@ -204,7 +256,14 @@ python -m pip install numpy==1.26.4
 python -m pip install opencv-python==3.4.18.65
 ```
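
Because these steps pin exact versions, a quick check inside the PaddlePaddle environment can confirm that pip did not resolve a different one. A sketch, assuming Python 3.8+ for `importlib.metadata`; `check_pin` is a hypothetical helper, and the interpreter can be overridden through a `PYTHON` variable (also an assumption of this sketch).

```shell
#!/bin/bash
# Hypothetical helper: confirm that an installed distribution matches a pinned version.
# Uses importlib.metadata from the Python standard library (Python 3.8+).
check_pin() {
  local pkg="$1" want="$2" have py="${PYTHON:-python3}"
  have=$("$py" -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$pkg" 2>/dev/null) || {
    echo "$pkg: not installed" >&2
    return 1
  }
  if [ "$have" = "$want" ]; then
    echo "$pkg: $have (ok)"
  else
    echo "$pkg: $have, expected $want" >&2
    return 1
  fi
}

# Usage inside the PaddlePaddle environment, with pins from the steps above:
#   check_pin paddlepaddle 3.2.0
#   check_pin numpy 1.26.4
#   check_pin opencv-python 3.4.18.65
```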

-:::
+::::
+::::{tab-item} OM inference
+:sync: om
+
+The 310P device supports only OM model inference. For details about the process, see the guide in [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2).
+
+::::
+:::::

 ### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference