[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?
Reformat markdown files via markdownlint.
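For anyone re-running the cleanup locally, a minimal sketch of the invocation; the tool (markdownlint-cli2), glob, and flags are assumptions, since the PR does not record the exact command:

```shell
# Assumed invocation, not taken from this PR: apply every auto-fixable
# markdownlint rule in place across the repository's markdown files.
npx markdownlint-cli2 --fix '**/*.md'

# Re-run without --fix to list anything that still needs a manual edit.
npx markdownlint-cli2 '**/*.md'
```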

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
SILONG ZENG authored on 2026-01-15 09:06:01 +08:00; committed by GitHub
parent 96edd4673f
commit 4811ba62e0
75 changed files with 711 additions and 308 deletions
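Most of the churn below follows two markdownlint rules: MD040 (fenced-code-language), which adds a language tag such as `shell` or `json` to bare code fences, and MD022 (blanks-around-headings), which inserts blank lines around headings and likely accounts for the grown line counts in the hunk headers. A sketch for listing such violations, assuming markdownlint-cli2 and an illustrative glob:

```shell
# Print violations without fixing; each result looks like
#   docs/file.md:12 MD040/fenced-code-language Fenced code blocks should have a language specified
npx markdownlint-cli2 'docs/**/*.md' 2>&1 | grep -E 'MD040|MD022'
```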

---

@@ -1,8 +1,11 @@
# Using AISBench
This document guides you through conducting accuracy testing with [AISBench](https://gitee.com/aisbench/benchmark/tree/master). AISBench provides accuracy and performance evaluation for many datasets.
## Online Server
### 1. Start the vLLM server
You can run a docker container to start the vLLM server on a single NPU:
```{code-block} bash
@@ -44,7 +47,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 35000 &
The vLLM server has started successfully if you see logs like the following:
-```
+```shell
INFO: Started server process [9446]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -220,7 +223,7 @@ ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_pro
After each dataset execution, you can get the results from saved files such as `outputs/default/20250628_151326`; an example follows:
-```
+```shell
20250628_151326/
├── configs # Combined configuration file for model tasks, dataset tasks, and result presentation tasks
│ └── 20250628_151326_29317.py
@@ -276,7 +279,7 @@ ais_bench --models vllm_api_stream_chat --datasets textvqa_gen_base64 --summariz
After execution, you can get the results from saved files; an example follows:
-```
+```shell
20251031_070226/
|-- configs # Combined configuration file for model tasks, dataset tasks, and result presentation tasks
| `-- 20251031_070226_122485.py

---

@@ -34,7 +34,7 @@ vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
If the vLLM server has started successfully, you will see information like the following:
-```
+```shell
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -42,7 +42,7 @@ INFO: Application startup complete.
Once your server is started, you can query the model with input prompts in a new terminal:
-```
+```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
@@ -67,7 +67,7 @@ pip install gradio plotly evalscope
You can use `evalscope eval` to run GSM8K for accuracy testing:
-```
+```shell
evalscope eval \
--model Qwen/Qwen2.5-7B-Instruct \
--api-url http://localhost:8000/v1 \
@@ -101,7 +101,7 @@ pip install evalscope[perf] -U
You can use `evalscope perf` to run perf testing:
-```
+```shell
evalscope perf \
--url "http://localhost:8000/v1/chat/completions" \
--parallel 5 \

---

@@ -1,8 +1,11 @@
# Using lm-eval
This document guides you through conducting accuracy testing with [lm-eval][1].
## Online Server
### 1. Start the vLLM server
You can run a docker container to start the vLLM server on a single NPU:
```{code-block} bash
@@ -34,7 +37,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 4096 &
The vLLM server has started successfully if you see logs like the following:
-```
+```shell
INFO: Started server process [9446]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -44,7 +47,7 @@ INFO: Application startup complete.
You can query the model with input prompts:
-```
+```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
@@ -71,7 +74,7 @@ curl http://localhost:8000/v1/completions \
The output format matches the following:
-```
+```json
{
"id": "cmpl-2f678e8bdf5a4b209a3f2c1fa5832e25",
"object": "text_completion",
@@ -108,7 +111,7 @@ pip install lm-eval[api]
Run the following command:
-```
+```shell
# Only test gsm8k dataset in this demo
lm_eval \
--model local-completions \
@@ -119,7 +122,7 @@ lm_eval \
After 30 minutes, the output is as shown below:
-```
+```shell
The markdown format results is as below:
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
@@ -130,6 +133,7 @@ The markdown format results is as below:
```
## Offline Server
### 1. Run docker container
You can run docker container on a single NPU:
@@ -161,6 +165,7 @@ docker run --rm \
```
### 2. Run GSM8K using lm-eval for accuracy testing
Install lm-eval in the container:
```bash
@@ -170,7 +175,7 @@ pip install lm-eval
Run the following command:
-```
+```shell
# Only test gsm8k dataset in this demo
lm_eval \
--model vllm \
@@ -181,7 +186,7 @@ lm_eval \
After 1 to 2 minutes, the output is shown below:
-```
+```shell
The markdown format results is as below:
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|

---

@@ -1,4 +1,5 @@
# Using OpenCompass
This document guides you through conducting accuracy testing with [OpenCompass](https://github.com/open-compass/opencompass).
## 1. Online Server
@@ -33,7 +34,7 @@ vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
The vLLM server has started successfully if you see information like the following:
-```
+```shell
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
@@ -41,7 +42,7 @@ INFO: Application startup complete.
Once your server is started, you can query the model with input prompts in a new terminal:
-```
+```shell
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
@@ -53,6 +54,7 @@ curl http://localhost:8000/v1/completions \
```
## 2. Run C-Eval using OpenCompass for accuracy testing
Install OpenCompass and configure the environment variables in the container:
```bash
@@ -107,13 +109,13 @@ models = [
Run the following command:
-```
+```shell
python3 run.py opencompass/configs/eval_vllm_ascend_demo.py --debug
```
After 1 to 2 minutes, the output is shown below:
-```
+```shell
The markdown format results is as below:
| dataset | version | metric | mode | Qwen2.5-7B-Instruct-vLLM-API |