diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-32B.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-32B.md
new file mode 100644
index 0000000..ff15c37
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-32B.md
@@ -0,0 +1,19 @@
+# Qwen2.5-32B
+
+* vLLM Version: vLLM: 0.10.1.1 , vLLM-KunLun Version: v0.10.1.1
+* Software Environment:OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode:TP4
+
+```bash
++-----------+--------------------------+------------------+------+--------+---------+
+| Dataset   | Metric                   | Subset           | Num  | Score  | Cat.0   |
++-----------+--------------------------+------------------+------+--------+---------+
+| gsm8k     | mean_acc                 | main             | 1319 | 0.9158 | default |
+| humaneval | pass@1                   | openai_humaneval | 164  | 0.878  | default |
+| ifeval    | mean_prompt_level_strict | default          | 541  | 0.8059 | default |
+| ifeval    | mean_inst_level_strict   | default          | 541  | 0.8765 | default |
+| ifeval    | mean_prompt_level_loose  | default          | 541  | 0.8262 | default |
+| ifeval    | mean_inst_level_loose    | default          | 541  | 0.8916 | default |
++-----------+--------------------------+------------------+------+--------+---------+
+```
\ No newline at end of file
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B-coder.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B-coder.md
new file mode 100644
index 0000000..7368e7b
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B-coder.md
@@ -0,0 +1,16 @@
+# Qwen3-30B-A3B-coder
+
+* vLLM Version: vLLM: 0.10.1.1 , vLLM-KunLun Version: v0.10.1.1
+* Software Environment:OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode:TP4
+
+```bash
++-----------------+-------------+--------------------+------+--------+---------+
+| Dataset         | Metric      | Subset             | Num  | Score  | Cat.0   |
++-----------------+-------------+--------------------+------+--------+---------+
+| gsm8k           | mean_acc    | main               | 1319 | 0.9272 | default |
+| humaneval       | pass@1      | openai_humaneval   | 164  | 0.9146 | default |
+| live_code_bench | pass@1      | release_latest     | 714  | 0.5644 | default |
++-----------------+-------------+--------------------+------+--------+---------+
+```
\ No newline at end of file
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B.md
new file mode 100644
index 0000000..55340c2
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B.md
@@ -0,0 +1,20 @@
+# Qwen3-8B
+
+* vLLM Version: vLLM: 0.10.1.1 , vLLM-KunLun Version: v0.10.1.1
+* Software Environment:OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode:TP1
+
+```bash
++-----------+--------------------------+--------------------+------+--------+---------+
+| Dataset   | Metric                   | Subset             | Num  | Score  | Cat.0   |
++-----------+--------------------------+--------------------+------+--------+---------+
+| gsm8k     | mean_acc                 | main               | 1319 | 0.9143 | default |
+| humaneval | pass@1                   | openai_humaneval   | 164  | 0.8049 | default |
+| ifeval    | mean_prompt_level_strict | default            | 541  | 0.8503 | default |
+| ifeval    | mean_inst_level_strict   | default            | 541  | 0.8971 | default |
+| ifeval    | mean_prompt_level_loose  | default            | 541  | 0.8762 | default |
+| ifeval    | mean_inst_level_loose    | default            | 541  | 0.9165 | default |
+| math_500  | mean_acc                 | Level 1            | 43   | 0.907  | default |
++-----------+--------------------------+--------------------+------+--------+---------+
+```
\ No newline at end of file