diff --git a/docs/source/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.md b/docs/source/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.md
new file mode 100644
index 0000000..68d4369
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.md
@@ -0,0 +1,20 @@
+# deepseek-ai/DeepSeek-V2-Lite
+
+- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
+- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
+- **Hardware Environment**: Atlas A2 Series
+- **Parallel mode**: TP2
+- **Execution mode**: Eager
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=deepseek-ai/DeepSeek-V2-Lite,tensor_parallel_size=2,dtype=auto,trust_remote_code=True,max_model_len=4096,enforce_eager=True'
+lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k \
+  --batch_size auto
+```
+
+| Task                  | Metric      |     Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| gsm8k | exact_match,strict-match | ✅0.3813 | ± 0.0134 |
+| gsm8k | exact_match,flexible-extract | ✅0.3836 | ± 0.0134 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
new file mode 100644
index 0000000..6ceff53
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
@@ -0,0 +1,19 @@
+# Qwen/Qwen2.5-VL-7B-Instruct
+
+- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
+- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
+- **Hardware Environment**: Atlas A2 Series
+- **Parallel mode**: TP1
+- **Execution mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen2.5-VL-7B-Instruct,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=8192'
+lm_eval --model vllm-vlm --model_args $MODEL_ARGS --tasks mmmu_val \
+  --apply_chat_template True --fewshot_as_multiturn True --batch_size auto
+```
+
+| Task                  | Metric      |     Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| mmmu_val | acc,none | ✅0.52 | ± 0.0162 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md
new file mode 100644
index 0000000..d170936
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md
@@ -0,0 +1,21 @@
+# Qwen/Qwen3-30B-A3B
+
+- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
+- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
+- **Hardware Environment**: Atlas A2 Series
+- **Parallel mode**: TP2 + EP
+- **Execution mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen3-30B-A3B,tensor_parallel_size=2,dtype=auto,trust_remote_code=False,max_model_len=4096,gpu_memory_utilization=0.6,enable_expert_parallel=True'
+lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k,ceval-valid \
+  --num_fewshot 5 --batch_size auto
+```
+
+| Task                  | Metric      |     Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| gsm8k | exact_match,strict-match | ✅0.8923 | ± 0.0085 |
+| gsm8k | exact_match,flexible-extract | ✅0.8506 | ± 0.0098 |
+| ceval-valid | acc,none | ✅0.8358 | ± 0.0099 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md
new file mode 100644
index 0000000..0649ee6
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md
@@ -0,0 +1,21 @@
+# Qwen/Qwen3-8B-Base
+
+- **vLLM Version**: vLLM: 0.10.1.1 ([1da94e6](https://github.com/vllm-project/vllm/commit/1da94e6)), **vLLM Ascend Version**: v0.10.1rc1 ([7e16b4a](https://github.com/vllm-project/vllm-ascend/commit/7e16b4a))
+- **Software Environment**: **CANN**: 8.2.RC1, **PyTorch**: 2.7.1, **torch-npu**: 2.7.1.dev20250724
+- **Hardware Environment**: Atlas A2 Series
+- **Parallel mode**: TP1
+- **Execution mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen3-8B-Base,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=4096'
+lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k,ceval-valid \
+  --apply_chat_template True --fewshot_as_multiturn True --num_fewshot 5 --batch_size auto
+```
+
+| Task                  | Metric      |     Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| gsm8k | exact_match,strict-match | ✅0.8271 | ± 0.0104 |
+| gsm8k | exact_match,flexible-extract | ✅0.8294 | ± 0.0104 |
+| ceval-valid | acc,none | ✅0.815 | ± 0.0103 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/index.md b/docs/source/developer_guide/evaluation/accuracy_report/index.md
index 0ed0a18..59f7f23 100644
--- a/docs/source/developer_guide/evaluation/accuracy_report/index.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/index.md
@@ -3,4 +3,8 @@
 :::{toctree}
 :caption: Accuracy Report
 :maxdepth: 1
+DeepSeek-V2-Lite
+Qwen2.5-VL-7B-Instruct
+Qwen3-30B-A3B
+Qwen3-8B-Base
 :::
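Each report above records a metric value together with its standard error, so a downstream consumer (for example a CI accuracy gate rerunning these commands) can treat value ± a few stderr as an acceptance band rather than requiring an exact match. A minimal sketch of such a check — the `EXPECTED` dict (values copied from the Qwen3-8B-Base table), the `within_tolerance` helper, and the 3-sigma band are illustrative assumptions, not part of this patch:

```python
# Illustrative accuracy gate (not part of the patch): compare a freshly
# measured lm_eval metric against a report's recorded value +/- stderr.

# (task, metric) -> (value, stderr), taken from the Qwen3-8B-Base table above.
EXPECTED = {
    ("gsm8k", "exact_match,strict-match"): (0.8271, 0.0104),
    ("gsm8k", "exact_match,flexible-extract"): (0.8294, 0.0104),
    ("ceval-valid", "acc,none"): (0.815, 0.0103),
}

def within_tolerance(task: str, metric: str, measured: float,
                     n_sigma: float = 3.0) -> bool:
    """True if `measured` lies within n_sigma standard errors of the report value."""
    value, stderr = EXPECTED[(task, metric)]
    return abs(measured - value) <= n_sigma * stderr
```

A 3-sigma band is a common, if arbitrary, choice: wide enough to absorb run-to-run sampling variance at these sample sizes, narrow enough to flag a real regression.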