[Doc] Update Qwen model accuracy report

2025-12-10 17:55:27 +08:00
parent ec935627cb
commit bd66cfa6c2
3 changed files with 55 additions and 0 deletions
--- a/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-32B.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-32B.md
@@ -0,0 +1,19 @@
+# Qwen2.5-32B
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP4
+
+```text
+-----------+--------------------------+------------------+------+--------+---------+
+| Dataset   | Metric                   | Subset           | Num  | Score  | Cat.0   |
+-----------+--------------------------+------------------+------+--------+---------+
+| gsm8k     | mean_acc                 | main             | 1319 | 0.9158 | default |
+| humaneval | pass@1                   | openai_humaneval |  164 | 0.878  | default |
+| ifeval    | mean_prompt_level_strict | default          |  541 | 0.8059 | default |
+| ifeval    | mean_inst_level_strict   | default          |  541 | 0.8765 | default |
+| ifeval    | mean_prompt_level_loose  | default          |  541 | 0.8262 | default |
+| ifeval    | mean_inst_level_loose    | default          |  541 | 0.8916 | default |
+-----------+--------------------------+------------------+------+--------+---------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B-coder.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B-coder.md
@@ -0,0 +1,16 @@
+# Qwen3-30B-A3B-coder
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP4
+
+```text
+-----------------+-------------+--------------------+------+--------+---------+
+| Dataset         | Metric      | Subset             | Num  | Score  | Cat.0   |
+-----------------+-------------+--------------------+------+--------+---------+
+| gsm8k           | mean_acc    | main               | 1319 | 0.9272 | default |
+| humaneval       | pass@1      | openai_humaneval   | 164  | 0.9146 | default |
+| live_code_bench | pass@1      | release_latest     | 714  | 0.5644 | default |
+-----------------+-------------+--------------------+------+--------+---------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B.md
@@ -0,0 +1,20 @@
+# Qwen3-8B
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP1
+
+```text
+-----------+--------------------------+--------------------+------+--------+---------+
+| Dataset   | Metric                   | Subset             | Num  | Score  | Cat.0   |
+-----------+--------------------------+--------------------+------+--------+---------+
+| gsm8k     | mean_acc                 | main               | 1319 | 0.9143 | default |
+| humaneval | pass@1                   | openai_humaneval   | 164  | 0.8049 | default |
+| ifeval    | mean_prompt_level_strict | default            | 541  | 0.8503 | default |
+| ifeval    | mean_inst_level_strict   | default            | 541  | 0.8971 | default |
+| ifeval    | mean_prompt_level_loose  | default            | 541  | 0.8762 | default |
+| ifeval    | mean_inst_level_loose    | default            | 541  | 0.9165 | default |
+| math_500  | mean_acc                 | Level 1            | 43   | 0.907  | default |
+-----------+--------------------------+--------------------+------+--------+---------+
+```