[Doc] add 310P3 guidance of PaddleOCR-VL (#6837)
### What this PR does / why we need it?
Adds 310P3 guidance for the PaddleOCR-VL model and refreshes `PaddleOCR-VL.md` in `docs/source/tutorials/`.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
by CI
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1
---------
Signed-off-by: zouyizhou <zouyizhou@huawei.com>
@@ -47,21 +47,36 @@ docker run --rm \
 -it $IMAGE bash
 ```
 
+:::{note}
+The 310P device is supported from version 0.15.0rc1. You need to select the corresponding image for installation.
+:::
+
 ## Deployment
 
 ### Single-node Deployment
 
 #### Single NPU (PaddleOCR-VL)
 
-PaddleOCR-VL supports single-node single-card deployment on the 910B4 platform. Follow these steps to start the inference service:
+PaddleOCR-VL supports single-node single-card deployment on the 910B4 and 310P platform. Follow these steps to start the inference service:
 
 1. Prepare model weights: Ensure the downloaded model weights are stored in the `PaddleOCR-VL` directory.
 2. Create and execute the deployment script (save as `deploy.sh`):
 
+:::::{tab-set}
+:sync-group: install
+
+::::{tab-item} 910B4
+:sync: 910B4
+
+Run the following script to start the vLLM server on single 910B4:
+
 ```shell
 #!/bin/sh
 export VLLM_USE_MODELSCOPE=true
 export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
+export TASK_QUEUE_ENABLE=2
+export CPU_AFFINITY_CONF=1
+export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
 
 vllm serve ${MODEL_PATH} \
 --max-num-batched-tokens 16384 \
@@ -69,9 +84,40 @@ vllm serve ${MODEL_PATH} \
 --trust-remote-code \
 --no-enable-prefix-caching \
 --mm-processor-cache-gb 0 \
---compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'
+--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
+--additional_config '{"enable_cpu_binding":true}' \
+--port 8000
 ```
 
+::::
+::::{tab-item} 310P
+:sync: 310P
+
+Run the following script to start the vLLM server on single 310P:
+
+```shell
+#!/bin/sh
+export VLLM_USE_MODELSCOPE=true
+export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
+
+vllm serve ${MODEL_PATH} \
+--max_model_len 16384 \
+--served-model-name PaddleOCR-VL-0.9B \
+--trust-remote-code \
+--no-enable-prefix-caching \
+--mm-processor-cache-gb 0 \
+--enforce-eager \
+--dtype float16 \
+--port 8000
+```
+
+:::{note}
+The `--max_model_len` option is added to prevent errors when generating the attention operator mask on the 310P device.
+:::
+
+::::
+:::::
+
 #### Multiple NPU (PaddleOCR-VL)
 
 Single-node deployment is recommended.
@@ -156,12 +202,18 @@ CHANGE DUE
 In the above example, we demonstrated how to use vLLM to infer the PaddleOCR-VL-0.9B model. Typically, we also need to integrate the PP-DocLayoutV2 model to fully unleash the capabilities of the PaddleOCR-VL model, making it more consistent with the examples provided by the official PaddlePaddle documentation.
 
 :::{note}
-Use separate virtual environments for VLLM and PPdoclayoutV2 to prevent dependency conflicts.
+Use separate virtual environments for VLLM and PP-DocLayoutV2 to prevent dependency conflicts.
 :::
 
-### Pull the PaddlePaddle-compatible CANN image
+:::::{tab-set}
+:sync-group: install
 
-Obtaining Ascend Images from PaddlePaddle:
+::::{tab-item} PaddlePaddle
+:sync: paddlepaddle
+
+The 910B4 device supports inference using the PaddlePaddle framework.
+
+1. Pull the PaddlePaddle-compatible CANN image
 
 ```bash
 docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
@@ -179,7 +231,7 @@ docker run -it --name paddle-npu-dev -v $(pwd):/work \
 ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
 ```
 
-### Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+2. Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 
 ```bash
 python -m pip install paddlepaddle==3.2.0
@@ -204,7 +256,14 @@ python -m pip install numpy==1.26.4
 python -m pip install opencv-python==3.4.18.65
 ```
 
-:::
+::::
+::::{tab-item} OM inference
+:sync: om
+
+The 310P device supports only the OM model inference. For details about the process, see the guide provided in [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2).
+
+::::
+:::::
 
 ### Using vLLM as the backend, combined with PP-DocLayoutV2 for offline inference
 
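Both deployment scripts in this diff start `vllm serve` on port 8000, and the 310P script serves the model as `PaddleOCR-VL-0.9B`. The sketch below shows one way the documented launch could be smoke-tested by hand. It is not part of the PR: the `localhost` host, the helper names, and the retry interval are assumptions for illustration, and the endpoints used are the `/health` and `/v1/models` routes of vLLM's OpenAI-compatible server.

```shell
#!/bin/sh
# Hypothetical smoke-test helpers; HOST is an assumption, PORT matches deploy.sh.
HOST="${HOST:-localhost}"
PORT="${PORT:-8000}"

# Build the URLs of the health and model-listing endpoints.
health_url() { printf 'http://%s:%s/health' "$HOST" "$PORT"; }
models_url() { printf 'http://%s:%s/v1/models' "$HOST" "$PORT"; }

# Poll /health until the server answers or the retry budget runs out.
# $1: number of attempts (default 60), with 5 seconds between attempts.
wait_for_server() {
    retries="${1:-60}"
    while [ "$retries" -gt 0 ]; do
        if curl -sf "$(health_url)" >/dev/null 2>&1; then
            return 0
        fi
        retries=$((retries - 1))
        sleep 5
    done
    return 1
}

# Print the served model list as JSON; the served model name should appear in it.
list_models() { curl -sf "$(models_url)"; }

# Usage, after starting deploy.sh in another shell:
#   wait_for_server 60 && list_models
```

Polling `/health` before listing models avoids a false failure while the weights are still loading, which can take a while on first start.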