[Lint]Style: reformat markdown files via markdownlint (#5884)
### What this PR does / why we need it?
Reformat markdown files via markdownlint.
- vLLM version: v0.13.0
- vLLM main: bde38c11df
---------
Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
@@ -1,8 +1,11 @@
# Using AISBench

This document guides you through accuracy testing with [AISBench](https://gitee.com/aisbench/benchmark/tree/master). AISBench provides accuracy and performance evaluation for many datasets.
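The steps below assume AISBench itself is already installed; a minimal install sketch, assuming the standard clone-and-install flow from the AISBench repository linked above (not part of this diff):

```bash
# Assumed install flow; check the AISBench README for the authoritative steps
git clone https://gitee.com/aisbench/benchmark.git
cd benchmark
pip3 install -e ./
```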

## Online Server

### 1. Start the vLLM server

You can run a docker container to start the vLLM server on a single NPU:

```{code-block} bash
@@ -44,7 +47,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 35000 &

The vLLM server has started successfully if you see logs like the following:

- ```
+ ```shell
INFO: Started server process [9446]
INFO: Waiting for application startup.
INFO: Application startup complete.
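# (sketch, assumed) once these logs appear, you can sanity-check the endpoint
# from another shell, e.g.: curl -s http://localhost:8000/v1/models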
@@ -220,7 +223,7 @@ ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_pro

After each dataset execution, you can find the results in saved files such as `outputs/default/20250628_151326`; an example follows:

- ```
+ ```shell
20250628_151326/
├── configs # Combined configuration file for model tasks, dataset tasks, and result presentation tasks
│   └── 20250628_151326_29317.py
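# (sketch) the listing is truncated by this hunk; sibling directories such as
# logs/, predictions/ and summary/ are assumed from OpenCompass-style layouts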
@@ -276,7 +279,7 @@ ais_bench --models vllm_api_stream_chat --datasets textvqa_gen_base64 --summariz

After execution, you can find the results in saved files; an example follows:

- ```
+ ```shell
20251031_070226/
|-- configs # Combined configuration file for model tasks, dataset tasks, and result presentation tasks
|   `-- 20251031_070226_122485.py
@@ -34,7 +34,7 @@ vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240

If the vLLM server has started successfully, you will see information like the following:

- ```
+ ```shell
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -42,7 +42,7 @@ INFO: Application startup complete.

Once your server is started, you can query the model with input prompts in a new terminal:

- ```
+ ```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
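    "model": "Qwen/Qwen2.5-7B-Instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
}'
# (sketch) the hunk truncates the request body above; these field values are
# assumed from vLLM's standard completions example, not taken from this commit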
@@ -67,7 +67,7 @@ pip install gradio plotly evalscope

You can use `evalscope eval` to run GSM8K for accuracy testing:

- ```
+ ```shell
evalscope eval \
--model Qwen/Qwen2.5-7B-Instruct \
--api-url http://localhost:8000/v1 \
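# (sketch) the hunk truncates the command here; typical remaining flags,
# assumed from evalscope's service-evaluation usage rather than this commit:
#   --api-key EMPTY --eval-type service --datasets gsm8k --limit 10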
@@ -101,7 +101,7 @@ pip install evalscope[perf] -U

You can use `evalscope perf` to run performance testing:

- ```
+ ```shell
evalscope perf \
--url "http://localhost:8000/v1/chat/completions" \
--parallel 5 \
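# (sketch) the hunk truncates the command here; a perf run typically also sets
# --model, --number (total requests) and --api openai; flag names assumed from
# evalscope's documentation rather than this commit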
@@ -1,8 +1,11 @@
# Using lm-eval

This document guides you through accuracy testing with [lm-eval][1].

## Online Server

### 1. Start the vLLM server

You can run a docker container to start the vLLM server on a single NPU:

```{code-block} bash
@@ -34,7 +37,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 4096 &

The vLLM server has started successfully if you see logs like the following:

- ```
+ ```shell
INFO: Started server process [9446]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -44,7 +47,7 @@ INFO: Application startup complete.

You can query the model with input prompts:

- ```
+ ```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
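    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "prompt": "The future of AI is",
    "max_tokens": 7,
    "temperature": 0
}'
# (sketch) the hunk truncates the request body above; values are assumed (this
# section serves Qwen/Qwen2.5-0.5B-Instruct per the earlier hunk header), not
# taken from this commit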
@@ -71,7 +74,7 @@ curl http://localhost:8000/v1/completions \

The output format matches the following:

- ```
+ ```json
{
"id": "cmpl-2f678e8bdf5a4b209a3f2c1fa5832e25",
"object": "text_completion",
@@ -108,7 +111,7 @@ pip install lm-eval[api]

Run the following command:

- ```
+ ```shell
# Only test gsm8k dataset in this demo
lm_eval \
--model local-completions \
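# (sketch) the hunk truncates the command; it typically continues with, e.g.
# (values assumed rather than taken from this commit):
#   --model_args model=Qwen/Qwen2.5-0.5B-Instruct,base_url=http://localhost:8000/v1/completions \
#   --tasks gsm8k --batch_size 1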
@@ -119,7 +122,7 @@ lm_eval \

After 30 minutes, the output is as shown below:

- ```
+ ```shell
The markdown format results are as below:

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
@@ -130,6 +133,7 @@ The markdown format results is as below:
```

## Offline Server

### 1. Run docker container

You can run a docker container on a single NPU:
@@ -161,6 +165,7 @@ docker run --rm \
```

### 2. Run GSM8K using lm-eval for accuracy testing

Install lm-eval in the container:

```bash
@@ -170,7 +175,7 @@ pip install lm-eval

Run the following command:

- ```
+ ```shell
# Only test gsm8k dataset in this demo
lm_eval \
--model vllm \
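# (sketch) the hunk truncates the command; the vllm backend typically
# continues with, e.g. (values assumed rather than taken from this commit):
#   --model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,max_model_len=4096 \
#   --tasks gsm8k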
@@ -181,7 +186,7 @@ lm_eval \

After 1 to 2 minutes, the output is shown below:

- ```
+ ```shell
The markdown format results are as below:

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
@@ -1,4 +1,5 @@
# Using OpenCompass

This document guides you through accuracy testing with [OpenCompass](https://github.com/open-compass/opencompass).

## 1. Online Server
@@ -33,7 +34,7 @@ vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240

The vLLM server has started successfully if you see information like the following:

- ```
+ ```shell
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -41,7 +42,7 @@ INFO: Application startup complete.

Once your server is started, you can query the model with input prompts in a new terminal:

- ```
+ ```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
@@ -53,6 +54,7 @@ curl http://localhost:8000/v1/completions \
```

## 2. Run C-Eval using OpenCompass for accuracy testing

Install OpenCompass and configure the environment variables in the container:

```bash
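# (sketch) the hunk truncates this block; a typical OpenCompass install,
# assumed from the upstream README rather than this commit:
git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .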
@@ -107,13 +109,13 @@ models = [

Run the following command:

- ```
+ ```shell
python3 run.py opencompass/configs/eval_vllm_ascend_demo.py --debug
```

After 1 to 2 minutes, the output is shown below:

- ```
+ ```shell
The markdown format results are as below:

| dataset | version | metric | mode | Qwen2.5-7B-Instruct-vLLM-API |