[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable the pymarkdown check.
2. Enable the Python `__init__.py` check for vllm and vllm-ascend.
3. Clean up the code.
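
As an illustration of what an `__init__.py` check can look like, here is a minimal sketch. It is not the script this PR adds (that script is not shown here); the function name `find_missing_init` and the rule "any directory holding `.py` files must also hold `__init__.py`" are assumptions made for the example.

```python
from pathlib import Path


def find_missing_init(root: Path) -> list[Path]:
    """Return directories under root that hold .py files but no __init__.py.

    Hypothetical helper for illustration; the real lint rule may differ.
    """
    missing = []
    # Check the root itself, then every nested directory in sorted order.
    candidates = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    for directory in candidates:
        has_python = any(
            child.suffix == ".py"
            for child in directory.iterdir()
            if child.is_file()
        )
        if has_python and not (directory / "__init__.py").is_file():
            missing.append(directory)
    return missing
```

A lint hook could call `find_missing_init(Path("vllm_ascend"))` (the directory name is assumed) and fail when the returned list is non-empty.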
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main: 29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -3,4 +3,4 @@
:::{toctree}
:caption: Accuracy Report
:maxdepth: 1
:::
:::

@@ -65,6 +65,7 @@ pip install gradio plotly evalscope
## 3. Run gsm8k accuracy test using EvalScope

You can `evalscope eval` run gsm8k accuracy test:

```
evalscope eval \
--model Qwen/Qwen2.5-7B-Instruct \
@@ -98,6 +99,7 @@ pip install evalscope[perf] -U
### Basic usage

You can use `evalscope perf` run perf test:

```
evalscope perf \
--url "http://localhost:8000/v1/chat/completions" \
@@ -111,7 +113,7 @@ evalscope perf \

### Output results

After 1-2 mins, the output is as shown below:
After 1-2 mins, the output is as shown below:

```shell
Benchmarking summary:

@@ -1,7 +1,7 @@
# Using lm-eval
This document will guide you have a accuracy testing using [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness).

## 1. Run docker container
## 1. Run docker container

You can run docker container on a single NPU:

@@ -36,6 +36,7 @@ Install lm-eval in the container.
```bash
pip install lm-eval
```

Run the following command:

```

@@ -1,4 +1,4 @@
# Using OpenCompass
# Using OpenCompass
This document will guide you have a accuracy testing using [OpenCompass](https://github.com/open-compass/opencompass).

## 1. Online Serving
@@ -29,7 +29,9 @@ docker run --rm \
-it $IMAGE \
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
```

If your service start successfully, you can see the info shown below:

```
INFO: Started server process [6873]
INFO: Waiting for application startup.
@@ -37,6 +39,7 @@ INFO: Application startup complete.
```

Once your server is started, you can query the model with input prompts in new terminal:

```
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \