[Doc] add PaddleOCR-VL tutorials guide (#5556)
### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in `docs/source/tutorials/index.md`
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI
- vLLM version: v0.13.0
- vLLM main:
7157596103
Signed-off-by: zouyizhou <zouyizhou@huawei.com>
This commit is contained in:
227
docs/source/tutorials/PaddleOCR-VL.md
Normal file
227
docs/source/tutorials/PaddleOCR-VL.md
Normal file
@@ -0,0 +1,227 @@
|
||||
# PaddleOCR-VL
|
||||
|
||||
## Introduction
|
||||
|
||||
PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition.
|
||||
|
||||
This document provides a detailed workflow for the complete deployment and verification of the model, including supported features, environment preparation, single-node deployment, and functional verification. It is designed to help users quickly complete model deployment and verification.
|
||||
|
||||
## Supported Features
|
||||
|
||||
Refer to [supported features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html) to get the model's supported feature matrix.
|
||||
|
||||
Refer to [feature guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/index.html) to get the feature's configuration.
|
||||
|
||||
## Environment Preparation
|
||||
|
||||
### Model Weight
|
||||
|
||||
* `PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)
|
||||
|
||||
It is recommended to download the model weights to a local directory (e.g., `./PaddleOCR-VL`) for quick access during deployment.
|
||||
|
||||
### Installation
|
||||
|
||||
You can using our official docker image to run `PaddleOCR-VL` directly.
|
||||
|
||||
Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
|
||||
|
||||
```{code-block} bash
|
||||
:substitutions:
|
||||
export IMAGE=quay.io/ascend/vllm-ascend:v0.13.0rc1
|
||||
docker run --rm \
|
||||
--name vllm-ascend \
|
||||
--shm-size=1g \
|
||||
--net=host \
|
||||
--device /dev/davinci0 \
|
||||
--device /dev/davinci_manager \
|
||||
--device /dev/devmm_svm \
|
||||
--device /dev/hisi_hdc \
|
||||
-v /usr/local/dcmi:/usr/local/dcmi \
|
||||
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
|
||||
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
|
||||
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
|
||||
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
|
||||
-v /etc/ascend_install.info:/etc/ascend_install.info \
|
||||
-v /root/.cache:/root/.cache \
|
||||
-it $IMAGE bash
|
||||
```
|
||||
|
||||
## Deployment
|
||||
|
||||
### Single-node Deployment
|
||||
|
||||
#### Single NPU (PaddleOCR-VL)
|
||||
|
||||
PaddleOCR-VL supports single-node single-card deployment on the 910B4 platform. Follow these steps to start the inference service:
|
||||
|
||||
1. Prepare model weights: Ensure the downloaded model weights are stored in the `PaddleOCR-VL` directory.
|
||||
2. Create and execute the deployment script (save as `deploy.sh`):
|
||||
|
||||
```shell
|
||||
#!/bin/sh
|
||||
export VLLM_USE_MODELSCOPE=true
|
||||
export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
|
||||
|
||||
vllm serve ${MODEL_PATH} \
|
||||
--max-num-batched-tokens 16384 \
|
||||
--served-model-name PaddleOCR-VL-0.9B \
|
||||
--trust-remote-code \
|
||||
--no-enable-prefix-caching \
|
||||
--mm-processor-cache-gb 0 \
|
||||
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'
|
||||
```
|
||||
|
||||
#### Multiple NPU (PaddleOCR-VL)
|
||||
|
||||
Single-node deployment is recommended.
|
||||
|
||||
### Prefill-Decode Disaggregation
|
||||
|
||||
Not supported yet
|
||||
|
||||
## Functional Verification
|
||||
|
||||
If your service start successfully, you can see the info shown below:
|
||||
|
||||
```bash
|
||||
INFO: Started server process [87471]
|
||||
INFO: Waiting for application startup.
|
||||
INFO: Application startup complete.
|
||||
```
|
||||
|
||||
Once your server is started, you can use the OpenAI API client to make queries.
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
api_key="EMPTY",
|
||||
base_url="http://localhost:8000/v1",
|
||||
timeout=3600
|
||||
)
|
||||
|
||||
# Task-specific base prompts
|
||||
TASKS = {
|
||||
"ocr": "OCR:",
|
||||
"table": "Table Recognition:",
|
||||
"formula": "Formula Recognition:",
|
||||
"chart": "Chart Recognition:",
|
||||
}
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "text",
|
||||
"text": TASKS["ocr"]
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="PaddleOCR-VL-0.9B",
|
||||
messages=messages,
|
||||
temperature=0.0,
|
||||
)
|
||||
print(f"Generated text: {response.choices[0].message.content}")
|
||||
```
|
||||
|
||||
If you query the server successfully, you can see the info shown below (client):
|
||||
|
||||
```bash
|
||||
Generated text: CINNAMON SUGAR
|
||||
1 x 17,000
|
||||
17,000
|
||||
SUB TOTAL
|
||||
17,000
|
||||
GRAND TOTAL
|
||||
17,000
|
||||
CASH IDR
|
||||
20,000
|
||||
CHANGE DUE
|
||||
3,000
|
||||
```
|
||||
|
||||
## Offline Inference with vLLM and PP-DocLayoutV2
|
||||
|
||||
In the above example, we demonstrated how to use vLLM to infer the PaddleOCR-VL-0.9B model. Typically, we also need to integrate the PP-DocLayoutV2 model to fully unleash the capabilities of the PaddleOCR-VL model, making it more consistent with the examples provided by the official PaddlePaddle documentation.
|
||||
|
||||
:::{note}
|
||||
Use separate virtual environments for VLLM and PPdoclayoutV2 to prevent dependency conflicts.
|
||||
:::
|
||||
|
||||
### Pull the PaddlePaddle-compatible CANN image
|
||||
|
||||
Obtaining Ascend Images from PaddlePaddle:
|
||||
|
||||
```bash
|
||||
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
|
||||
```
|
||||
|
||||
Start the container using the following command:
|
||||
|
||||
```bash
|
||||
docker run -it --name paddle-npu-dev -v $(pwd):/work \
|
||||
--privileged --network=host --shm-size=128G -w=/work \
|
||||
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
|
||||
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
|
||||
-v /usr/local/dcmi:/usr/local/dcmi \
|
||||
-e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
|
||||
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
|
||||
```
|
||||
|
||||
### Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
|
||||
|
||||
```bash
|
||||
python -m pip install paddlepaddle==3.2.0
|
||||
wget https://paddle-whl.bj.bcebos.com/stable/npu/paddle-custom-npu/paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
|
||||
pip install paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
|
||||
python -m pip install -U "paddleocr[doc-parser]"
|
||||
pip install safetensors
|
||||
```
|
||||
|
||||
:::{note}
|
||||
The OpenCV component may be missing:
|
||||
|
||||
```bash
|
||||
apt-get update
|
||||
apt-get install -y libgl1 libglib2.0-0
|
||||
```
|
||||
|
||||
CANN-8.0.0 does not support some versions of NumPy and OpenCV. It is recommended to install the specified versions.
|
||||
|
||||
```bash
|
||||
python -m pip install numpy==1.26.4
|
||||
python -m pip install opencv-python==3.4.18.65
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference
|
||||
|
||||
```python
|
||||
from paddleocr import PaddleOCRVL
|
||||
|
||||
doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"
|
||||
|
||||
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
|
||||
vl_rec_server_url="http://localhost:8000/v1",
|
||||
layout_detection_model_name="PP-DocLayoutV2",
|
||||
layout_detection_model_dir=doclayout_model_path,
|
||||
device="npu")
|
||||
|
||||
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
|
||||
|
||||
for i, res in enumerate(output):
|
||||
res.save_to_json(save_path=f"output_{i}.json")
|
||||
res.save_to_markdown(save_path=f"output_{i}.md")
|
||||
```
|
||||
@@ -21,6 +21,7 @@ DeepSeek-V3.1.md
|
||||
DeepSeek-V3.2.md
|
||||
DeepSeek-R1.md
|
||||
Kimi-K2-Thinking
|
||||
PaddleOCR-VL
|
||||
pd_colocated_mooncake_multi_instance
|
||||
pd_disaggregation_mooncake_single_node
|
||||
pd_disaggregation_mooncake_multi_node
|
||||
|
||||
@@ -76,6 +76,7 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
|
||||
| Phi-3-Vision/Phi-3.5-Vision | ✅ | || A2/A3 |||||||||||||||||
|
||||
| Gemma3 | ✅ | || A2/A3 |||||||||||||||||
|
||||
| Llama3.2 | ✅ | || A2/A3 |||||||||||||||||
|
||||
| PaddleOCR-VL | ✅ | || A2/A3 |||||||||||||||||
|
||||
| Llama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
|
||||
| Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |||||||||||||||||||
|
||||
| Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |||||||||||||||||||
|
||||
|
||||
Reference in New Issue
Block a user