# Using lm-eval

This document guides you through accuracy testing with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness).

## 1. Run the Docker container

You can run the Docker container on a single NPU:

```{code-block} bash
   :substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-e VLLM_USE_MODELSCOPE=True \
-e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-it $IMAGE \
/bin/bash
```
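
Before starting the container, it can help to confirm that the device nodes passed with `--device` actually exist on the host. The following is a minimal sketch of such a pre-flight check; `check_devices` is a hypothetical helper written for this guide and is not part of vllm-ascend or the Ascend toolchain.

```bash
# Hypothetical pre-flight helper: succeed only if every given device node exists.
check_devices() {
  local dev missing=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "missing device node: $dev" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the same device nodes that the docker run command passes with --device.
# DEVICE falls back to /dev/davinci7 (the value used above) if it is unset.
if check_devices "${DEVICE:-/dev/davinci7}" /dev/davinci_manager /dev/devmm_svm /dev/hisi_hdc; then
  echo "all device nodes present"
else
  echo "fix the missing nodes (or DEVICE) before starting the container"
fi
```

If a node is reported missing, adjust `DEVICE` to one of the `/dev/davinci[0-7]` nodes that exists on your host.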

## 2. Run the ceval accuracy test using lm-eval

Install lm-eval in the container.

```bash
pip install lm-eval
```

Run the following command:

```bash
# This demo only tests the ceval-valid_computer_network task
lm_eval \
  --model vllm \
  --model_args pretrained=Qwen/Qwen2.5-7B-Instruct,max_model_len=4096,block_size=4,tensor_parallel_size=1 \
  --tasks ceval-valid_computer_network \
  --batch_size 8
```

After 1-2 minutes, the output looks like this:

```
The markdown format results is as below:

|           Tasks            |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|----------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|ceval-valid_computer_network|      2|none  |     0|acc     |↑  |0.6842|±  |0.1096|
|                            |       |none  |     0|acc_norm|↑  |0.6842|±  |0.1096|

```
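
To use the result in a script, you may want to pull the `acc` value out of the table and compare it with a pass bar. The sketch below assumes the table layout shown above, where the metric name is the fifth column and the value is the seventh. `extract_metric` and the `0.60` threshold are hypothetical and chosen only for illustration; they are not defined by lm-eval or vllm-ascend.

```bash
# Hypothetical helper: read an lm-eval markdown results table on stdin and
# print the Value column of the first row whose Metric column matches $1.
extract_metric() {
  local metric="$1"
  awk -F'|' -v m="$metric" '
    { name = $6; gsub(/[ \t]/, "", name) }   # field 6 is the Metric column
    name == m { val = $8; gsub(/[ \t]/, "", val); print val; exit }
  '
}

# Sample rows copied from the output above.
results='|ceval-valid_computer_network|      2|none  |     0|acc     |↑  |0.6842|±  |0.1096|
|                            |       |none  |     0|acc_norm|↑  |0.6842|±  |0.1096|'

acc=$(printf '%s\n' "$results" | extract_metric acc)
threshold=0.60   # hypothetical pass bar, not a project-defined value

# Use awk for the comparison because the shell has no floating-point arithmetic.
if awk -v v="$acc" -v t="$threshold" 'BEGIN { exit !(v >= t) }'; then
  echo "PASS: acc=$acc >= $threshold"
else
  echo "FAIL: acc=$acc < $threshold"
fi
```

Matching on the exact metric name keeps `acc` from also matching the `acc_norm` row.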

For more usage information, see the [lm-eval docs](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/README.md).