[CI] Add e2e test framework and doctest (#730)

### What this PR does / why we need it?
Add quickstart doctest CI

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
- CI passed
- Run `/vllm-ascend/tests/e2e/run_doctests.sh`
Related: https://github.com/vllm-project/vllm-ascend/issues/725
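
For readers without the repository at hand, the idea of a doctest runner can be sketched in Python as below. This is an illustrative sketch only, not the contents of `run_doctests.sh`: the names `run_doctests` and `failed`, the `*.sh` glob, and the `bash -e` default are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def run_doctests(doctest_dir, pattern="*.sh", interpreter=("bash", "-e")):
    """Run every matching script in name order and return {script name: exit code}.

    Hypothetical helper: the pattern and interpreter defaults are assumptions,
    not taken from the actual test framework.
    """
    results = {}
    for script in sorted(Path(doctest_dir).glob(pattern)):
        # Each script runs in its own process so one failure does not stop the rest.
        proc = subprocess.run([*interpreter, str(script)])
        results[script.name] = proc.returncode
    return results


def failed(results):
    """Return the names of scripts that exited with a non-zero status."""
    return [name for name, code in results.items() if code != 0]
```

Under these assumptions, `failed(run_doctests("tests/e2e/doctest"))` would list the doctest scripts that did not pass, and a CI job could exit non-zero when that list is not empty.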

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Yikun Jiang
2025-05-14 09:27:54 +08:00
committed by GitHub
parent 857f489cbf
commit 59e02502b1
5 changed files with 243 additions and 3 deletions

@@ -68,6 +68,7 @@ The default workdir is `/workspace`, vLLM and vLLM Ascend code are placed in `/v
You can use Modelscope mirror to speed up download:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```bash
export VLLM_USE_MODELSCOPE=true
```
@@ -81,6 +82,7 @@ With vLLM installed, you can start generating texts for list of input prompts (i
Try to run below Python script directly or use `python3` shell to generate texts:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```python
from vllm import LLM, SamplingParams
@@ -108,6 +110,7 @@ vLLM can also be deployed as a server that implements the OpenAI API protocol. R
the following command to start the vLLM server with the
[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```bash
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
@@ -125,12 +128,14 @@ Congratulations, you have successfully started the vLLM server!
You can query the list the models:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```bash
curl http://localhost:8000/v1/models | python3 -m json.tool
```
You can also query the model with input prompts:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
@@ -145,10 +150,10 @@ curl http://localhost:8000/v1/completions \
vLLM is serving as background process, you can use `kill -2 $VLLM_PID` to stop the background process gracefully,
it's equal to `Ctrl-C` to stop foreground vLLM process:
<!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->
```bash
-ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep
-VLLM_PID=`ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep | awk '{print $2}'`
-kill -2 $VLLM_PID
+VLLM_PID=$(pgrep -f "vllm serve")
+kill -2 "$VLLM_PID"
```
You will see output as below: