herizhen 0d1424d81a [Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)

What this PR does / why we need it?
This pull request performs a comprehensive cleanup of the vLLM Ascend
documentation. It fixes numerous typos, grammatical errors, and phrasing
issues across community guidelines, developer documents, hardware
tutorials, and feature guides. Key improvements include correcting
hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code
examples (removing duplicate flags and trailing commas), and improving
the clarity of technical explanations. These changes are necessary to
ensure the documentation is professional, accurate, and easy for users
to follow.

Does this PR introduce any user-facing change?
No, this PR contains documentation-only updates.

How was this patch tested?
The changes were manually reviewed for accuracy and grammatical
correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>

2026-04-09 15:37:57 +08:00

Using OpenCompass

This document describes how to run accuracy tests with OpenCompass.

1. Online Server

You can run a Docker container to start the vLLM server on a single NPU:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend \
--shm-size=1g \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-e VLLM_USE_MODELSCOPE=True \
-e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-it $IMAGE \
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240

The vLLM server has started successfully if you see output like the following:

INFO:     Started server process [6873]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
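
If you script the evaluation, you may prefer to wait for the server programmatically instead of watching the log. The following is a minimal sketch, assuming the server listens on `localhost:8000` as in the `docker run` command above and that the OpenAI-compatible `/v1/models` route answers with HTTP 200 once startup is complete. `probe_models_endpoint` and `wait_until_ready` are hypothetical helper names, not part of vLLM.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_models_endpoint(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_models_endpoint,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready()` with the defaults blocks for up to ten minutes. The timeout and polling interval are arbitrary choices that you may need to adjust for larger models.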

Once the server is running, you can query the model with an input prompt from a new terminal:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 7,
        "temperature": 0
    }'
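
A similar request can be sent from Python using only the standard library. This is a sketch, not an official client: `build_completion_request` and `complete` are hypothetical helpers, and the payload uses `max_tokens`, the length limit defined for the OpenAI-style `/v1/completions` route. The response is assumed to follow the usual `choices[0].text` layout.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "Qwen/Qwen2.5-7B-Instruct",
    base_url: str = "http://localhost:8000",
    max_tokens: int = 7,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server from the previous step running, `complete("The future of AI is")` returns the short continuation generated by the model.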

2. Run C-Eval using OpenCompass for accuracy testing

Install OpenCompass and configure the environment variables in the container:

# Pin Python 3.10 due to:
# https://github.com/open-compass/opencompass/issues/1976
conda create -n opencompass python=3.10
conda activate opencompass
pip install opencompass "modelscope[framework]"
export DATASET_SOURCE=ModelScope
git clone https://github.com/open-compass/opencompass.git

Add the following content to opencompass/configs/eval_vllm_ascend_demo.py:

from mmengine.config import read_base
from opencompass.models import OpenAISDK

with read_base():
    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets

# Only test the ceval-computer_network dataset in this demo
datasets = ceval_datasets[:1]

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)

models = [
    dict(
        abbr='Qwen2.5-7B-Instruct-vLLM-API',
        type=OpenAISDK,
        key='EMPTY',  # Placeholder API key
        openai_api_base='http://127.0.0.1:8000/v1',  # vLLM server started in step 1
        path='Qwen/Qwen2.5-7B-Instruct',
        tokenizer_path='Qwen/Qwen2.5-7B-Instruct',
        rpm_verbose=True,
        meta_template=api_meta_template,
        query_per_second=1,
        max_out_len=1024,
        max_seq_len=4096,
        temperature=0.01,
        batch_size=8,
        retry=3,
    )
]
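
The slice `ceval_datasets[:1]` in the config depends on list order. A more explicit alternative is to filter by name. The sketch below assumes that each entry in `ceval_datasets` is a dict with an `abbr` key such as `ceval-computer_network`. `select_by_abbr` is a hypothetical helper, and the stand-in list replaces the real import so that the snippet runs on its own.

```python
from typing import Iterable


def select_by_abbr(datasets: list[dict], wanted: Iterable[str]) -> list[dict]:
    """Keep only the dataset configs whose 'abbr' is in `wanted`."""
    wanted_set = set(wanted)
    selected = [d for d in datasets if d.get("abbr") in wanted_set]
    missing = wanted_set - {d["abbr"] for d in selected}
    if missing:
        # Fail loudly instead of silently evaluating nothing.
        raise KeyError(f"Unknown dataset abbr(s): {sorted(missing)}")
    return selected


# Stand-in for the imported ceval_datasets list (illustrative entries only).
ceval_datasets_demo = [
    {"abbr": "ceval-computer_network"},
    {"abbr": "ceval-operating_system"},
]
datasets = select_by_abbr(ceval_datasets_demo, ["ceval-computer_network"])
```

In the real config you would pass `ceval_datasets` instead of the stand-in list. Raising on an unknown name is a design choice that catches typos before the evaluation starts.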

Run the following command from the root of the cloned opencompass repository, where run.py is located:

python3 run.py opencompass/configs/eval_vllm_ascend_demo.py --debug

After 1 to 2 minutes, you should see output like the following:

The markdown format results are as below:

| dataset | version | metric | mode | Qwen2.5-7B-Instruct-vLLM-API |
|----- | ----- | ----- | ----- | -----|
| ceval-computer_network | db9ce2 | accuracy | gen | 68.42 |
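
To compare runs automatically, you can read the score out of this Markdown table. The following is a minimal sketch that parses the table shown above; `parse_markdown_table` is a hypothetical helper, and it assumes the summary keeps this simple pipe-delimited layout.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited Markdown table into one dict per data row."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # Skip anything that is not a table row.
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row, whose cells contain only '-', ':' or spaces.
    body = [
        r for r in body
        if not all(c and set(c) <= set("-: ") for c in r)
    ]
    return [dict(zip(header, r)) for r in body]


summary = """
| dataset | version | metric | mode | Qwen2.5-7B-Instruct-vLLM-API |
|----- | ----- | ----- | ----- | -----|
| ceval-computer_network | db9ce2 | accuracy | gen | 68.42 |
"""

results = parse_markdown_table(summary)
accuracy = float(results[0]["Qwen2.5-7B-Instruct-vLLM-API"])
```

Here `accuracy` holds the value from the last column of the table, which is keyed by the `abbr` set in the model config.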

For more usage details, see the OpenCompass documentation.
