[Doc] Update doc (#3836)
### What this PR does / why we need it?
Update doc
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -2,7 +2,7 @@
 This document will guide you through model inference stress testing and accuracy testing using [EvalScope](https://github.com/modelscope/evalscope).
-## 1. Online serving
+## 1. Online server
 You can run a docker container to start the vLLM server on a single NPU:
@@ -32,7 +32,7 @@ docker run --rm \
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```
-If your service start successfully, you can see the info shown below:
+If the vLLM server started successfully, you can see information like the following:
 ```
 INFO: Started server process [6873]
@@ -40,7 +40,7 @@ INFO: Waiting for application startup.
 INFO: Application startup complete.
 ```
-Once your server is started, you can query the model with input prompts in new terminal:
+Once your server is started, you can query the model with input prompts in a new terminal:
 ```
 curl http://localhost:8000/v1/completions \
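As background for the `curl` query above, here is a minimal Python sketch of the same request against the OpenAI-compatible `/v1/completions` endpoint; the model name, prompt, and response values are illustrative examples, not taken from this PR:

```python
import json

# Hypothetical request body for the OpenAI-compatible /v1/completions
# endpoint; the model name and prompt are illustrative, not from this PR.
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0,
}
body = json.dumps(payload)  # equivalent of the JSON passed to curl -d

# A completions response has roughly this shape (values made up):
sample_response = json.loads(
    '{"object": "text_completion", '
    '"choices": [{"index": 0, "text": " Paris.", "finish_reason": "stop"}]}'
)
# The generated text is under choices[0].text.
print(sample_response["choices"][0]["text"].strip())
```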
@@ -55,7 +55,7 @@ curl http://localhost:8000/v1/completions \
 ## 2. Install EvalScope using pip
-You can install EvalScope by using:
+You can install EvalScope as follows:
 ```bash
 python3 -m venv .venv-evalscope
@@ -63,9 +63,9 @@ source .venv-evalscope/bin/activate
 pip install gradio plotly evalscope
 ```
-## 3. Run gsm8k accuracy test using EvalScope
+## 3. Run GSM8K using EvalScope for accuracy testing
-You can `evalscope eval` run gsm8k accuracy test:
+You can use `evalscope eval` to run GSM8K for accuracy testing:
 ```
 evalscope eval \
@@ -77,7 +77,7 @@ evalscope eval \
 --limit 10
 ```
-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:
 ```shell
 +---------------------+-----------+-----------------+----------+-------+---------+---------+
@@ -87,7 +87,7 @@ After 1-2 mins, the output is as shown below:
 +---------------------+-----------+-----------------+----------+-------+---------+---------+
 ```
-See more detail in: [EvalScope doc - Model API Service Evaluation](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html#model-api-service-evaluation).
+See more detail in [EvalScope doc - Model API Service Evaluation](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html#model-api-service-evaluation).
 ## 4. Run model inference stress testing using EvalScope
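As background on what the GSM8K score in the table above measures, here is a simplified Python sketch of exact-match accuracy; this is not EvalScope's actual implementation, and the sample completions and gold answers are made up:

```python
import re

def extract_answer(completion):
    """Pull the last number from a completion such as 'The answer is 72'.
    A simplified stand-in for real answer extraction."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def exact_match_accuracy(completions, golds):
    """Fraction of completions whose extracted answer equals the gold one."""
    hits = sum(extract_answer(c) == g for c, g in zip(completions, golds))
    return hits / len(golds)

# Made-up completions and gold answers for illustration:
completions = [
    "Natalia sold 48 + 24 = 72 clips in total. The answer is 72",
    "The answer is 11",
    "I am not sure.",
]
golds = ["72", "10", "5"]
print(exact_match_accuracy(completions, golds))
```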
@@ -99,7 +99,7 @@ pip install evalscope[perf] -U
 ### Basic usage
-You can use `evalscope perf` run perf test:
+You can use `evalscope perf` to run perf testing:
 ```
 evalscope perf \
@@ -114,7 +114,7 @@ evalscope perf \
 ### Output results
-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:
 ```shell
 Benchmarking summary:
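The percentile rows in the stress-test report are plain order statistics over per-request measurements; a minimal sketch (with made-up latency numbers) of how such percentiles can be computed:

```python
import math

# Made-up per-request latencies in seconds; a real run measures these.
latencies = [0.21, 0.25, 0.27, 0.30, 0.31, 0.35, 0.40, 0.44, 0.52, 0.90]

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p%
    of the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

for p in (50, 90, 99):
    print(f"P{p}: {percentile(latencies, p):.2f}s")
```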
@@ -173,4 +173,4 @@ Percentile results:
 +------------+----------+---------+-------------+--------------+---------------+----------------------+
 ```
-See more detail in: [EvalScope doc - Model Inference Stress Testing](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html#basic-usage).
+See more detail in [EvalScope doc - Model Inference Stress Testing](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html#basic-usage).
@@ -1,8 +1,8 @@
 # Using lm-eval
-This document will guide you have a accuracy testing using [lm-eval][1].
+This document guides you through accuracy testing using [lm-eval][1].
 ## Online Server
-### 1. start the vLLM server
+### 1. Start the vLLM server
 You can run a docker container to start the vLLM server on a single NPU:
 ```{code-block} bash
@@ -32,7 +32,7 @@ docker run --rm \
 vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 4096 &
 ```
-Started the vLLM server successfully,if you see log as below:
+If you see logs like the following, the vLLM server started successfully:
 ```
 INFO: Started server process [9446]
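Instead of watching for the startup lines above, you can probe readiness programmatically; a small Python sketch using vLLM's `/health` health-check route (the polling loop is only an illustration):

```python
import time
import urllib.error
import urllib.request

def server_ready(url, timeout=1.0):
    """Return True if the endpoint answers HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Illustrative polling loop; port 8000 matches the server started above.
# while not server_ready("http://localhost:8000/health"):
#     time.sleep(2)
```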
@@ -40,9 +40,9 @@ INFO: Waiting for application startup.
 INFO: Application startup complete.
 ```
-### 2. Run gsm8k accuracy test using lm-eval
+### 2. Run GSM8K using lm-eval for accuracy testing
-You can query result with input prompts:
+You can query the result with input prompts:
 ```
 curl http://localhost:8000/v1/completions \
@@ -99,7 +99,7 @@ The output format matches the following:
 }
 ```
-Install lm-eval in the container.
+Install lm-eval in the container:
 ```bash
 export HF_ENDPOINT="https://hf-mirror.com"
@@ -117,7 +117,7 @@ lm_eval \
 --output_path ./
 ```
-After 30 mins, the output is as shown below:
+After 30 minutes, the output is as shown below:
 ```
 The markdown format results is as below:
@@ -160,8 +160,8 @@ docker run --rm \
 /bin/bash
 ```
-### 2. Run gsm8k accuracy test using lm-eval
-Install lm-eval in the container.
+### 2. Run GSM8K using lm-eval for accuracy testing
+Install lm-eval in the container:
 ```bash
 export HF_ENDPOINT="https://hf-mirror.com"
@@ -179,7 +179,7 @@ lm_eval \
 --batch_size auto
 ```
-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:
 ```
 The markdown format results is as below:
@@ -191,9 +191,9 @@ Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 ```
-## Use offline Datasets
+## Use Offline Datasets
-Take gsm8k(single dataset) and mmlu(multi-subject dataset) as examples, and you can see more from [here][2].
+Take GSM8K (single dataset) and MMLU (multi-subject dataset) as examples, and you can see more from [here][2].
 ```bash
 # set HF_DATASETS_OFFLINE when using offline datasets
@@ -207,7 +207,7 @@ cd lm_eval/tasks/gsm8k
 cd lm_eval/tasks/mmlu/default
 ```
-set [gsm8k.yaml][3] as follows:
+Set [gsm8k.yaml][3] as follows:
 ```yaml
 tag:
@@ -232,7 +232,7 @@ training_split: train
 fewshot_split: train
 test_split: test
 doc_to_text: 'Q: {{question}}
-A(Please follow the summarize the result at the end with the format of "The answer is xxx", where xx is the result.):'
+A(Please summarize the result at the end in the format "The answer is xxx", where xxx is the result.):'
 doc_to_target: "{{answer}}" #" {{answer.split('### ')[-1].rstrip()}}"
 metric_list:
 - metric: exact_match
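For intuition, the `doc_to_text` template above is filled in once per question; this hypothetical Python sketch uses plain string substitution (lm-eval itself renders Jinja templates) with a shortened stand-in template:

```python
# Shortened stand-in for the doc_to_text template in gsm8k.yaml.
DOC_TO_TEXT = "Q: {{question}}\nA:"

def render_prompt(doc):
    # Plain substitution standing in for lm-eval's Jinja templating.
    return DOC_TO_TEXT.replace("{{question}}", doc["question"])

print(render_prompt({"question": "What is 6 times 7?"}))
```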
@@ -270,7 +270,7 @@ metadata:
 version: 3.0
 ```
-set [_default_template_yaml][4] as follows:
+Set [_default_template_yaml][4] as follows:
 ```yaml
 # set dataset_path according to the downloaded dataset
@@ -1,7 +1,7 @@
 # Using OpenCompass
-This document will guide you have a accuracy testing using [OpenCompass](https://github.com/open-compass/opencompass).
+This document guides you through accuracy testing using [OpenCompass](https://github.com/open-compass/opencompass).
-## 1. Online Serving
+## 1. Online Server
 You can run a docker container to start the vLLM server on a single NPU:
@@ -31,7 +31,7 @@ docker run --rm \
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```
-If your service start successfully, you can see the info shown below:
+If the vLLM server started successfully, you can see information like the following:
 ```
 INFO: Started server process [6873]
@@ -39,7 +39,7 @@ INFO: Waiting for application startup.
 INFO: Application startup complete.
 ```
-Once your server is started, you can query the model with input prompts in new terminal:
+Once your server is started, you can query the model with input prompts in a new terminal:
 ```
 curl http://localhost:8000/v1/completions \
@@ -52,8 +52,8 @@ curl http://localhost:8000/v1/completions \
 }'
 ```
-## 2. Run ceval accuracy test using OpenCompass
-Install OpenCompass and configure the environment variables in the container.
+## 2. Run C-Eval using OpenCompass for accuracy testing
+Install OpenCompass and configure the environment variables in the container:
 ```bash
 # Pin Python 3.10 due to:
@@ -65,7 +65,7 @@ export DATASET_SOURCE=ModelScope
 git clone https://github.com/open-compass/opencompass.git
 ```
-Add `opencompass/configs/eval_vllm_ascend_demo.py` with the following content:
+Add the following content to `opencompass/configs/eval_vllm_ascend_demo.py`:
 ```python
 from mmengine.config import read_base
@@ -111,7 +111,7 @@ Run the following command:
 python3 run.py opencompass/configs/eval_vllm_ascend_demo.py --debug
 ```
-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:
 ```
 The markdown format results is as below: