### What this PR does / why we need it?
This PR fixes the accuracy test related to
https://github.com/vllm-project/vllm-ascend/pull/2073. Users can now
perform accuracy tests on multiple models at once and generate
separate report files by running:
```bash
cd ~/vllm-ascend
pytest -sv ./tests/e2e/models/test_lm_eval_correctness.py \
--config-list-file ./tests/e2e/models/configs/accuracy.txt
```
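As a rough illustration of what a list file passed via `--config-list-file` could drive, the sketch below reads such a file and resolves each entry to a config path. The file format (one config filename per line, relative to the list file's directory, with blank lines and `#` comments skipped) and the helper name `read_config_list` are assumptions made for this example, not taken from the test code in this PR.

```python
from pathlib import Path


def read_config_list(list_file: str) -> list[Path]:
    """Return the config paths named in a list file.

    Assumption (hypothetical format): one config filename per line,
    relative to the directory that contains the list file. Blank lines
    and lines starting with '#' are skipped.
    """
    list_path = Path(list_file).resolve()
    base = list_path.parent
    configs = []
    for line in list_path.read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        configs.append(base / name)
    return configs
```

Under these assumptions, each returned path could serve as one test parameter, so that every model config produces its own report file.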
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
<img width="1648" height="511" alt="image"
src="https://github.com/user-attachments/assets/1757e3b8-a6b7-44e5-b701-80940dc756cd"
/>
- vLLM version: v0.10.0
- vLLM main: 766bc8162c
---------
Signed-off-by: Icey <1790571317@qq.com>
25 lines
1.2 KiB
Markdown
# {{ model_name }}

**vLLM Version**: vLLM: {{ vllm_version }} ([{{ vllm_commit[:7] }}](https://github.com/vllm-project/vllm/commit/{{ vllm_commit }})),
**vLLM Ascend Version**: {{ vllm_ascend_version }} ([{{ vllm_ascend_commit[:7] }}](https://github.com/vllm-project/vllm-ascend/commit/{{ vllm_ascend_commit }}))
**Software Environment**: CANN: {{ cann_version }}, PyTorch: {{ torch_version }}, torch-npu: {{ torch_npu_version }}
**Hardware Environment**: Atlas A2 Series
**Datasets**: {{ datasets }}
**Parallel Mode**: TP
**Execution Mode**: ACLGraph

**Command**:

```bash
export MODEL_ARGS={{ model_args }}
lm_eval --model {{ model_type }} --model_args $MODEL_ARGS --tasks {{ datasets }} \
--apply_chat_template {{ apply_chat_template }} --fewshot_as_multiturn {{ fewshot_as_multiturn }} {% if num_fewshot is defined and num_fewshot != "N/A" %} --num_fewshot {{ num_fewshot }} {% endif %} \
--limit {{ limit }} --batch_size {{ batch_size }}
```

| Task | Metric | Value | Stderr |
|-----------------------|-------------|----------:|-------:|
{% for row in rows -%}
| {{ row.task.rjust(23) }} | {{ row.metric.rjust(15) }} |{{ row.value }} | ± {{ "%.4f" | format(row.stderr | float) }} |
{% endfor %}
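
To make the row expression in the table loop easier to read, here is a plain-Python sketch that mirrors it: the task is right-justified to 23 characters, the metric to 15, and the standard error is formatted to four decimal places. The function name `format_row` and the sample values are hypothetical placeholders for illustration, not measured results.

```python
def format_row(task: str, metric: str, value, stderr) -> str:
    """Build one Markdown table row the way the template's loop body does.

    Note: Jinja's `float` filter falls back to 0.0 on invalid input,
    whereas Python's float() raises ValueError; this sketch does not
    reproduce that fallback.
    """
    stderr_text = "%.4f" % float(stderr)
    return f"| {task.rjust(23)} | {metric.rjust(15)} |{value} | ± {stderr_text} |"


# Placeholder values for illustration only, not measured results.
print(format_row("example_task", "example_metric", 0.5, "0.01234"))
```

Because the padding is applied inside the cells, it only affects how the raw Markdown lines up; the rendered table looks the same regardless of cell width.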