From 98c788a65ae7bdc982b8f5088bcefc4f4c716945 Mon Sep 17 00:00:00 2001
From: zyz111222 <2542678383@qq.com>
Date: Fri, 9 Jan 2026 11:01:25 +0800
Subject: [PATCH] [Doc] add PaddleOCR-VL tutorials guide (#5556)

### What this PR does / why we need it?
1. Add `PaddleOCR-VL.md` under `docs/source/tutorials/`
2. Add a PaddleOCR-VL entry to `docs/source/tutorials/index.md`
3. Add PaddleOCR-VL to the matrix in `docs/source/user_guide/support_matrix/supported_models.md`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: zouyizhou
---
 docs/source/tutorials/PaddleOCR-VL.md         | 227 ++++++++++++++++++
 docs/source/tutorials/index.md                |   1 +
 .../support_matrix/supported_models.md        |   1 +
 3 files changed, 229 insertions(+)
 create mode 100644 docs/source/tutorials/PaddleOCR-VL.md

diff --git a/docs/source/tutorials/PaddleOCR-VL.md b/docs/source/tutorials/PaddleOCR-VL.md
new file mode 100644
index 00000000..8891b161
--- /dev/null
+++ b/docs/source/tutorials/PaddleOCR-VL.md
@@ -0,0 +1,227 @@
+# PaddleOCR-VL
+
+## Introduction
+
+PaddleOCR-VL is a state-of-the-art (SOTA), resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition.
+
+This document walks through deploying and verifying the model end to end: supported features, environment preparation, single-node deployment, and functional verification.
+
+## Supported Features
+
+Refer to [supported features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html) to get the model's supported feature matrix.
+
+Refer to [feature guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/index.html) for how to configure each feature.
+
+## Environment Preparation
+
+### Model Weight
+
+* `PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)
+
+It is recommended to download the model weights to a local directory (e.g., `./PaddleOCR-VL`) for quick access during deployment.
+
+### Installation
+
+You can use our official Docker image to run `PaddleOCR-VL` directly.
+
+Select an image based on your machine type and start the container on your node; see [using docker](../installation.md#set-up-using-docker).
+
+```{code-block} bash
+   :substitutions:
+export IMAGE=quay.io/ascend/vllm-ascend:v0.13.0rc1
+docker run --rm \
+    --name vllm-ascend \
+    --shm-size=1g \
+    --net=host \
+    --device /dev/davinci0 \
+    --device /dev/davinci_manager \
+    --device /dev/devmm_svm \
+    --device /dev/hisi_hdc \
+    -v /usr/local/dcmi:/usr/local/dcmi \
+    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
+    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
+    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
+    -v /etc/ascend_install.info:/etc/ascend_install.info \
+    -v /root/.cache:/root/.cache \
+    -it $IMAGE bash
+```
+
+## Deployment
+
+### Single-node Deployment
+
+#### Single NPU (PaddleOCR-VL)
+
+PaddleOCR-VL supports single-node, single-card deployment on the 910B4 platform. Follow these steps to start the inference service:
+
+1. Prepare model weights: store the downloaded weights in the `PaddleOCR-VL` directory. The script below uses the ModelScope ID `PaddlePaddle/PaddleOCR-VL`; to serve local weights, set `MODEL_PATH` to that directory instead.
+2. Create and run the deployment script (save it as `deploy.sh`):
+
+```shell
+#!/bin/sh
+export VLLM_USE_MODELSCOPE=true
+export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
+
+vllm serve ${MODEL_PATH} \
+    --max-num-batched-tokens 16384 \
+    --served-model-name PaddleOCR-VL-0.9B \
+    --trust-remote-code \
+    --no-enable-prefix-caching \
+    --mm-processor-cache-gb 0 \
+    --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'
+```
+
+#### Multiple NPU (PaddleOCR-VL)
+
+Single-node deployment is recommended.
+
+### Prefill-Decode Disaggregation
+
+Not supported yet.
+
+## Functional Verification
+
+If the service starts successfully, you will see output like the following:
+
+```bash
+INFO:     Started server process [87471]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+```
+
+Once the server is running, you can query it with the OpenAI API client.
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://localhost:8000/v1",
+    timeout=3600
+)
+
+# Task-specific base prompts
+TASKS = {
+    "ocr": "OCR:",
+    "table": "Table Recognition:",
+    "formula": "Formula Recognition:",
+    "chart": "Chart Recognition:",
+}
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
+                }
+            },
+            {
+                "type": "text",
+                "text": TASKS["ocr"]
+            }
+        ]
+    }
+]
+
+response = client.chat.completions.create(
+    model="PaddleOCR-VL-0.9B",
+    messages=messages,
+    temperature=0.0,
+)
+print(f"Generated text: {response.choices[0].message.content}")
+```
+
+If the query succeeds, the client prints output like the following:
+
+```bash
+Generated text: CINNAMON SUGAR
+1 x 17,000
+17,000
+SUB TOTAL
+17,000
+GRAND TOTAL
+17,000
+CASH IDR
+20,000
+CHANGE DUE
+3,000
+```
+
+## Offline Inference with vLLM and PP-DocLayoutV2
+
+The example above shows how to run inference on PaddleOCR-VL-0.9B with vLLM. In practice, PaddleOCR-VL is usually paired with the PP-DocLayoutV2 model to use its full capabilities, which also matches the examples in the official PaddlePaddle documentation.
+
+:::{note}
+Use separate virtual environments for vLLM and PP-DocLayoutV2 to prevent dependency conflicts.
+:::
+
+### Pull the PaddlePaddle-compatible CANN image
+
+Pull the Ascend image provided by PaddlePaddle:
+
+```bash
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
+```
+
+Start the container using the following command:
+
+```bash
+docker run -it --name paddle-npu-dev -v $(pwd):/work \
+    --privileged --network=host --shm-size=128G -w=/work \
+    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+    -v /usr/local/dcmi:/usr/local/dcmi \
+    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
+```
+
+### Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+
+```bash
+python -m pip install paddlepaddle==3.2.0
+wget https://paddle-whl.bj.bcebos.com/stable/npu/paddle-custom-npu/paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
+pip install paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
+python -m pip install -U "paddleocr[doc-parser]"
+pip install safetensors
+```
+
+:::{note}
+The system libraries that OpenCV depends on may be missing. If so, install them:
+
+```bash
+apt-get update
+apt-get install -y libgl1 libglib2.0-0
+```
+
+CANN-8.0.0 does not support some versions of NumPy and OpenCV. It is recommended to install the versions below.
+
+```bash
+python -m pip install numpy==1.26.4
+python -m pip install opencv-python==3.4.18.65
+```
+
+:::
+
+### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference
+
+```python
+from paddleocr import PaddleOCRVL
+
+doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"
+
+pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
+                       vl_rec_server_url="http://localhost:8000/v1",
+                       layout_detection_model_name="PP-DocLayoutV2",
+                       layout_detection_model_dir=doclayout_model_path,
+                       device="npu")
+
+output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
+
+for i, res in enumerate(output):
+    res.save_to_json(save_path=f"output_{i}.json")
+    res.save_to_markdown(save_path=f"output_{i}.md")
+```
diff --git a/docs/source/tutorials/index.md b/docs/source/tutorials/index.md
index 5b6d63f3..de54e2aa 100644
--- a/docs/source/tutorials/index.md
+++ b/docs/source/tutorials/index.md
@@ -21,6 +21,7 @@ DeepSeek-V3.1.md
 DeepSeek-V3.2.md
 DeepSeek-R1.md
 Kimi-K2-Thinking
+PaddleOCR-VL
 pd_colocated_mooncake_multi_instance
 pd_disaggregation_mooncake_single_node
 pd_disaggregation_mooncake_multi_node
diff --git a/docs/source/user_guide/support_matrix/supported_models.md b/docs/source/user_guide/support_matrix/supported_models.md
index 3821e113..c229e2d8 100644
--- a/docs/source/user_guide/support_matrix/supported_models.md
+++ b/docs/source/user_guide/support_matrix/supported_models.md
@@ -76,6 +76,7 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | Phi-3-Vision/Phi-3.5-Vision | ✅ | || A2/A3 |||||||||||||||||
 | Gemma3 | ✅ | || A2/A3 |||||||||||||||||
 | Llama3.2 | ✅ | || A2/A3 |||||||||||||||||
+| PaddleOCR-VL | ✅ | || A2/A3 |||||||||||||||||
 | Llama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
 | Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |||||||||||||||||||
 | Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |||||||||||||||||||