[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?
Update doc

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -2,7 +2,7 @@
 This document will guide you have model inference stress testing and accuracy testing using [EvalScope](https://github.com/modelscope/evalscope).

-## 1. Online serving
+## 1. Online server

 You can run docker container to start the vLLM server on a single NPU:
@@ -31,7 +31,7 @@ docker run --rm \
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```

-If your service start successfully, you can see the info shown below:
+If the vLLM server is started successfully, you can see information shown below:

 ```
 INFO: Started server process [6873]
@@ -39,7 +39,7 @@ INFO: Waiting for application startup.
 INFO: Application startup complete.
 ```

-Once your server is started, you can query the model with input prompts in new terminal:
+Once your server is started, you can query the model with input prompts in a new terminal:

 ```
 curl http://localhost:8000/v1/completions \
@@ -54,7 +54,7 @@ curl http://localhost:8000/v1/completions \
 ## 2. Install EvalScope using pip

-You can install EvalScope by using:
+You can install EvalScope as follows:

 ```bash
 python3 -m venv .venv-evalscope
@@ -62,9 +62,9 @@ source .venv-evalscope/bin/activate
 pip install gradio plotly evalscope
 ```

-## 3. Run gsm8k accuracy test using EvalScope
+## 3. Run GSM8K using EvalScope for accuracy testing

-You can `evalscope eval` run gsm8k accuracy test:
+You can use `evalscope eval` to run GSM8K for accuracy testing:

 ```
 evalscope eval \
@@ -76,7 +76,7 @@ evalscope eval \
 --limit 10
 ```

-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:

 ```shell
 +---------------------+-----------+-----------------+----------+-------+---------+---------+
@@ -86,7 +86,7 @@ After 1-2 mins, the output is as shown below:
 +---------------------+-----------+-----------------+----------+-------+---------+---------+
 ```

-See more detail in: [EvalScope doc - Model API Service Evaluation](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html#model-api-service-evaluation).
+See more detail in [EvalScope doc - Model API Service Evaluation](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html#model-api-service-evaluation).

 ## 4. Run model inference stress testing using EvalScope
@@ -98,7 +98,7 @@ pip install evalscope[perf] -U
 ### Basic usage

-You can use `evalscope perf` run perf test:
+You can use `evalscope perf` to run perf testing:

 ```
 evalscope perf \
@@ -113,7 +113,7 @@ evalscope perf \
 ### Output results

-After 1-2 mins, the output is as shown below:
+After 1 to 2 minutes, the output is shown below:

 ```shell
 Benchmarking summary:
@@ -172,4 +172,4 @@ Percentile results:
 +------------+----------+---------+-------------+--------------+---------------+----------------------+
 ```

-See more detail in: [EvalScope doc - Model Inference Stress Testing](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html#basic-usage).
+See more detail in [EvalScope doc - Model Inference Stress Testing](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html#basic-usage).
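The hunks above touch the steps for starting `vllm serve` and then querying it with `curl` from a new terminal. Below is a minimal sketch of automating that check before running `evalscope eval` or `evalscope perf`; it is not part of this PR or of the vllm-ascend docs. It assumes the server from the doc is reachable at `http://localhost:8000`, serves `Qwen/Qwen2.5-7B-Instruct`, and exposes vLLM's OpenAI-compatible `/v1/models` and `/v1/completions` routes. The function names `wait_for_server`, `completion_payload`, and `smoke_test` are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch, not from the vllm-ascend docs: wait for a vLLM
# OpenAI-compatible server, then send one completion request.
# Assumptions: server at http://localhost:8000 serving
# Qwen/Qwen2.5-7B-Instruct. All function names are hypothetical.

BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="${MODEL:-Qwen/Qwen2.5-7B-Instruct}"

# Poll GET <url>/v1/models every 5 seconds until curl succeeds
# (-f makes HTTP 4xx/5xx count as failure), giving up once the
# accumulated sleep time reaches <timeout> seconds.
# Usage: wait_for_server <url> [timeout_seconds, default 600]
wait_for_server() {
  local url="$1" timeout="${2:-600}" waited=0
  until curl -sf -o /dev/null --max-time 5 "$url/v1/models"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server at $url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "server at $url is ready"
}

# Print a JSON body for /v1/completions.
# No escaping is done: model and prompt must not contain " or \.
# Usage: completion_payload <model> <prompt> [max_tokens, default 16]
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %s, "temperature": 0}' \
    "$1" "$2" "${3:-16}"
}

# Send one greedy (temperature 0) completion request and print the raw JSON reply.
smoke_test() {
  curl -sf "$BASE_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "$MODEL" "The future of AI is" 16)"
}
```

To use it, save the block to a file (for example, a hypothetical `check_vllm.sh`), `source` it, and run `wait_for_server "$BASE_URL" 600 && smoke_test` before starting an EvalScope run, so the evaluation does not begin against a server that is still loading the model.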