From 94bf9c379e7b9a39f23d00c2c4b16af384ea15de Mon Sep 17 00:00:00 2001 From: hfadzxy <59153331+hfadzxy@users.noreply.github.com> Date: Tue, 1 Apr 2025 23:43:51 +0800 Subject: [PATCH] [Doc]Add developer guide for using lm-eval (#456) ### What this PR does / why we need it? Add developer guide for using lm-eval ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test manually --------- Signed-off-by: hfadzxy Signed-off-by: Yikun Jiang Co-authored-by: Yikun Jiang --- .../developer_guide/evaluation/index.md | 1 + .../evaluation/using_lm_eval.md | 62 +++++++++++++++++++ 2 files changed, 63 insertions(+) create mode 100644 docs/source/developer_guide/evaluation/using_lm_eval.md diff --git a/docs/source/developer_guide/evaluation/index.md b/docs/source/developer_guide/evaluation/index.md index 03f1551..12364c3 100644 --- a/docs/source/developer_guide/evaluation/index.md +++ b/docs/source/developer_guide/evaluation/index.md @@ -4,4 +4,5 @@ :caption: Accuracy :maxdepth: 1 using_opencompass +using_lm_eval ::: \ No newline at end of file diff --git a/docs/source/developer_guide/evaluation/using_lm_eval.md b/docs/source/developer_guide/evaluation/using_lm_eval.md new file mode 100644 index 0000000..1523314 --- /dev/null +++ b/docs/source/developer_guide/evaluation/using_lm_eval.md @@ -0,0 +1,62 @@ +# Using lm-eval +This document guides you through accuracy testing with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness). +## 1. 
Run the docker container

You can run the docker container on a single NPU:

```{code-block} bash
 :substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-e VLLM_USE_MODELSCOPE=True \
-e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-it $IMAGE \
/bin/bash
```

## 2. Run the ceval accuracy test using lm-eval
Install lm-eval in the container:

```bash
pip install lm-eval
```
Then run the following command:

```bash
# Only test the ceval-valid_computer_network dataset in this demo
lm_eval \
  --model vllm \
  --model_args pretrained=Qwen/Qwen2.5-7B-Instruct,max_model_len=4096,block_size=4,tensor_parallel_size=1 \
  --tasks ceval-valid_computer_network \
  --batch_size 8
```

After 1-2 minutes, the output looks like this:

```
The markdown format results is as below:

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|----------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|ceval-valid_computer_network| 2|none | 0|acc |↑ |0.6842|± |0.1096|
| | |none | 0|acc_norm|↑ |0.6842|± |0.1096|

```

See the [lm-eval docs](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/README.md) for more usage details.