What this PR does / why we need it?

This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes ensure the documentation is professional, accurate, and easy for users to follow.

Does this PR introduce any user-facing change?

No, this PR contains documentation-only updates.

How was this patch tested?

The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
# Using OpenCompass
This document guides you through accuracy testing with [OpenCompass](https://github.com/open-compass/opencompass).
## 1. Online Server
You can run a Docker container to start the vLLM server on a single NPU:
```{code-block} bash
:substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend \
--shm-size=1g \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-e VLLM_USE_MODELSCOPE=True \
-e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-it $IMAGE \
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
```
If you see output like the following, the vLLM server has started successfully:
```shell
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
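If you are scripting the setup, you can also wait for readiness programmatically instead of watching the logs. Below is a minimal sketch, assuming the `-p 8000:8000` mapping from the command above and vLLM's standard `/health` endpoint:

```python
# Optional: block until the vLLM server reports healthy.
# Assumes the -p 8000:8000 port mapping from the docker run command above.
import time

import requests

def wait_for_server(url: str = "http://localhost:8000/health", timeout: float = 300.0) -> None:
    """Poll the /health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                print("vLLM server is ready")
                return
        except requests.ConnectionError:
            pass  # server still starting up
        time.sleep(5)
    raise TimeoutError(f"server did not become ready within {timeout}s")

wait_for_server()
```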
Once the server is started, you can query the model with input prompts in a new terminal:
```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 7,
        "temperature": 0
    }'
```
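Equivalently, you can send the same request from Python. Here is a minimal sketch using the official `openai` SDK (an extra `pip install openai` is assumed; the key is a placeholder because the server does not validate it by default):

```python
# Query the OpenAI-compatible /v1/completions endpoint served by vLLM.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM does not check the key by default
)

completion = client.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt="The future of AI is",
    max_tokens=7,
    temperature=0,
)
print(completion.choices[0].text)
```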
## 2. Run C-Eval using OpenCompass for accuracy testing
Install OpenCompass and configure the environment variables in the container:
```bash
# Pin Python 3.10 due to:
# https://github.com/open-compass/opencompass/issues/1976
conda create -n opencompass python=3.10
conda activate opencompass
pip install opencompass modelscope[framework]
export DATASET_SOURCE=ModelScope
git clone https://github.com/open-compass/opencompass.git
```
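Before writing the evaluation config, you can quickly confirm that the installation succeeded and the dataset source is set. This is a hypothetical sanity check, not part of OpenCompass itself:

```python
# Sanity check: OpenCompass is importable and DATASET_SOURCE points at ModelScope.
import os
from importlib.metadata import version

print("opencompass", version("opencompass"))  # raises PackageNotFoundError if missing
assert os.environ.get("DATASET_SOURCE") == "ModelScope", (
    "export DATASET_SOURCE=ModelScope before running the evaluation"
)
```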
Add the following content to `opencompass/configs/eval_vllm_ascend_demo.py`:
```python
from mmengine.config import read_base
from opencompass.models import OpenAISDK

with read_base():
    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets

# Only test ceval-computer_network dataset in this demo
datasets = ceval_datasets[:1]

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)

models = [
    dict(
        abbr='Qwen2.5-7B-Instruct-vLLM-API',
        type=OpenAISDK,
        key='EMPTY',  # API key
        openai_api_base='http://127.0.0.1:8000/v1',
        path='Qwen/Qwen2.5-7B-Instruct',
        tokenizer_path='Qwen/Qwen2.5-7B-Instruct',
        rpm_verbose=True,
        meta_template=api_meta_template,
        query_per_second=1,
        max_out_len=1024,
        max_seq_len=4096,
        temperature=0.01,
        batch_size=8,
        retry=3,
    )
]
```
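The `datasets = ceval_datasets[:1]` line is why only `ceval-computer_network` runs: `ceval_datasets` is a list of per-subject configs, and the demo keeps just the first one. To evaluate more subjects, widen the slice. A short sketch, assuming (as the demo's own config suggests) that each entry carries an `abbr` key:

```python
# Evaluate the first five C-Eval subjects instead of one:
datasets = ceval_datasets[:5]

# Or run the full C-Eval suite (much slower):
# datasets = ceval_datasets

# Each entry is a config dict; print the abbreviations to see what will run:
for ds in datasets:
    print(ds['abbr'])
```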
Run the following command:
```shell
python3 run.py opencompass/configs/eval_vllm_ascend_demo.py --debug
```
After 1 to 2 minutes, you should see output like the following:
```shell
The markdown format results are as below:

| dataset | version | metric | mode | Qwen2.5-7B-Instruct-vLLM-API |
|----- | ----- | ----- | ----- | -----|
| ceval-computer_network | db9ce2 | accuracy | gen | 68.42 |
```
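Besides the console summary, OpenCompass also persists predictions and summary tables to disk, by default under an `outputs/` directory next to where you ran the command (a default worth verifying against the OpenCompass docs for your version). A small sketch to locate the newest summary:

```python
# Find the most recent OpenCompass summary directory (assumes the default
# ./outputs layout; adjust the root path if you configured a custom work_dir).
from pathlib import Path

summaries = sorted(Path("outputs").rglob("summary"), key=lambda p: p.stat().st_mtime)
print(summaries[-1] if summaries else "no completed runs found yet")
```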
For more usage, see the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/index.html).