Initial commit for vLLM-Kunlun Plugin

This commit is contained in:
dongxinyu03
2025-12-10 12:05:39 +08:00
commit c728e52505
131 changed files with 28816 additions and 0 deletions


@@ -0,0 +1,271 @@
## Operator accuracy test
### torch_xray
torch_xray is an operator precision analysis tool. It can dump module-level inputs and outputs for precision comparison and automatically generate operator unit tests.
#### 1. Download and install
**python3.10:**
bos:/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp310-cp310-linux_x86_64.whl
https://su.bcebos.com/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp310-cp310-linux_x86_64.whl
**python3.8:**
bos:/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp38-cp38-linux_x86_64.whl
https://su.bcebos.com/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp38-cp38-linux_x86_64.whl
Note: use the same installation package in every environment where the tool is used.
#### 2. Usage
##### Dump module-level inputs and outputs and compare their precision
The following sample snippet, added to `execute_model` in the vLLM framework, dumps the inputs and outputs of the vision module so that their errors can be compared.
```python
from torch_xray import PrecisionDebugger

def execute_model(
    self,
    scheduler_output: "SchedulerOutput",
    intermediate_tensors: Optional[IntermediateTensors] = None,
) -> Union[ModelRunnerOutput, AsyncModelRunnerOutput, IntermediateTensors]:
    # dump_path: path where the dump results are stored
    # rank:      ranks that need to be dumped
    # step:      inference steps to dump; setting it to 1 is sufficient
    # model:     module to be dumped; must be of type nn.Module
    debugger = PrecisionDebugger(
        dump_path="dump-vision",
        hook_name="dump",
        rank=[0],
        step=[1],
        model=self.model.visual,
        dump_torch_api=False,
    )
    debugger.start()
    # ... (rest of execute_model)
```
An h5 file and a csv file are generated in the results directory.
```bash
-rw-r--r-- 1 root root 471231309 Oct 31 13:12 globalrank-0_localrank-0.h5
-rw-r--r-- 1 root root 71 Oct 31 13:11 globalrank-0_localrank-0_summary.csv
```
##### Data processing
```bash
summary xxx.h5 sum.txt
```
The generated h5 file is processed using the summary command to generate a txt file in which the results are presented in tabular form.
```bash
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
| Index | Step | Rank | Module | Min | Max | Mean | Std |
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
| 0 | 1 | 0 | patch_embed.proj.Conv3d.0.forward_params.weight | -0.0776367 | 0.0795898 | 6.8e-06 | 0.0072608 |
| 1 | 1 | 0 | patch_embed.proj.Conv3d.0.forward_params.bias | -3.046875 | 2.953125 | 0.0113748 | 0.3257138 |
| 2 | 1 | 0 | patch_embed.proj.Conv3d.0.forward_input.0 | -0.7490234 | 0.7021484 | 0.3302804 | 0.2339017 |
| 3 | 1 | 0 | patch_embed.proj.Conv3d.0.forward_output.0 | -4.0078125 | 5.1210938 | 0.0147052 | 0.3815643 |
| 4 | 1 | 0 | pos_embed.Embedding.0.forward_params.weight | -13.8125 | 20.25 | 0.0010043 | 0.2428094 |
| 5 | 1 | 0 | pos_embed.Embedding.0.forward_input.0 | 0.0 | 2303.0 | 1153.9191895 | 714.594360 |
| 6 | 1 | 0 | pos_embed.Embedding.0.forward_output.0 | -13.8125 | 20.25 | 0.0007552 | 0.2643428 |
| 7 | 1 | 0 | rotary_pos_emb.Qwen2_5_VisionRotaryEmbedding.0.forward... | 0.0 | 25.0 | 1.7337022 | 3.9271674 |
| 8 | 1 | 0 | blocks.0.norm1.LayerNorm.0.forward_params.weight | -0.5351562 | 3.140625 | 0.4660275 | 0.7907906 |
| 9 | 1 | 0 | blocks.0.norm1.LayerNorm.0.forward_params.bias | -2.359375 | 2.921875 | 0.0013793 | 0.1879374 |
| 10 | 1 | 0 | blocks.0.norm1.LayerNorm.0.forward_input.0 | -15.65625 | 20.21875 | 0.0155256 | 0.4382802 |
| 11 | 1 | 0 | blocks.0.norm1.LayerNorm.0.forward_output.0 | -6.1640625 | 6.7460938 | 0.0006746 | 0.2708515 |
| 12 | 1 | 0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_params.bias | -6.125 | 6.1875 | -0.0292423 | 0.8602651 |
| 13 | 1 | 0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_input.0 | -6.1640625 | 6.7460938 | 0.0006746 | 0.2708515 |
| 14 | 1 | 0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_output.0 | -6.5859375 | 7.6171875 | -0.0125549 | 1.0678084 |
| 15 | 1 | 0 | blocks.0.attn.proj.RowParallelLinear.0.forward_params... | -3.578125 | 3.203125 | -0.0043617 | 0.4846557 |
| 16 | 1 | 0 | blocks.0.attn.proj.RowParallelLinear.0.forward_input.0 | -1.9130859 | 1.4375 | 0.0005577 | 0.0947055 |
| 17 | 1 | 0 | blocks.0.attn.proj.RowParallelLinear.0.forward_output.0 | -9.109375 | 7.3867188 | -0.0034284 | 0.4465481 |
| 18 | 1 | 0 | blocks.0.norm2.LayerNorm.1.forward_params.weight | -0.1376953 | 14.5625 | 1.9166113 | 3.017405 |
| 19 | 1 | 0 | blocks.0.norm2.LayerNorm.1.forward_params.bias | -1.6328125 | 3.84375 | 0.0062865 | 0.2443586 |
| 20 | 1 | 0 | blocks.0.norm2.LayerNorm.1.forward_input.0 | -8.5859375 | 11.109375 | 0.0120974 | 0.4243064 |
| 21 | 1 | 0 | blocks.0.norm2.LayerNorm.1.forward_output.0 | -12.015625 | 14.265625 | -0.0012364 | 0.4973041 |
| 22 | 1 | 0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar... | -9.4375 | 0.7304688 | -2.4200516 | 1.6754951 |
| 23 | 1 | 0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar... | -12.015625 | 14.265625 | -0.0012364 | 0.4973041 |
| 24 | 1 | 0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar... | -12.59375 | 13.0625 | -2.1465943 | 1.8433502 |
| 25 | 1 | 0 | blocks.0.mlp.act_fn.GELU.0.forward_input.0 | -12.59375 | 13.0625 | -2.1465943 | 1.8433502 |
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
```
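The Min, Max, Mean, and Std columns are per-tensor summary statistics. As a rough illustration only (this is not torch_xray's implementation, and the population form of the standard deviation is an assumption), the same quantities can be computed for a flat list of values. `tensor_stats` is a hypothetical helper name.

```python
import statistics

def tensor_stats(values):
    """Return (min, max, mean, std) for a flat sequence of floats.

    Assumption: population standard deviation; the summary tool may differ.
    """
    return (
        min(values),
        max(values),
        statistics.fmean(values),
        statistics.pstdev(values),
    )

if __name__ == "__main__":
    print(tensor_stats([-1.0, 0.0, 1.0, 2.0]))
```

Comparing these four numbers between an XPU dump and a GPU dump gives a quick first check before running the full comparison.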
##### Accuracy Comparison
```bash
# The results are stored in result.csv
compare xpu.h5 gpu.h5 result.csv
```
The `compare` command processes the H5 files generated on the GPU and XPU and produces a CSV file. Download this CSV file to your local machine and open it with Excel; the result looks similar to the table below.
If you encounter a "no matched keys" problem, refer to the instructions at the end of this document for a solution.
##### Example of results
```bash
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
| Index | Status | Module (Bench/Target) | Cosine | RMSE | IsClose (%) | Max Err (t) | GtNum |
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
| 0 | | patch_embed.proj.Conv3d.0.forward_params.weight | 1 | 0 | 100 | 0 | 0 |
| 1 | | patch_embed.proj.Conv3d.0.forward_params.bias | 1 | 0 | 100 | 0 | 0 |
| 2 | | patch_embed.proj.Conv3d.0.forward_input.0 | 1 | 0 | 100 | 0 | 0 |
| 3 | | patch_embed.proj.Conv3d.0.forward_output.0 | 1 | 9.90E-06 | 100 | 0.001953 | 267 |
| 4 | | pos_embed.Embedding.0.forward_params.weight | 1 | 0 | 100 | 0 | 0 |
| 5 | | pos_embed.Embedding.0.forward_input.0 | 1 | 0 | 100 | 0 | 0 |
| 6 | | pos_embed.Embedding.0.forward_output.0 | 1 | 0 | 100 | 0 | 0 |
| 7 | | rotary_pos_emb.Qwen2_5_VisionRotaryEmbedding.0.forward... | 1 | 0 | 100 | 0 | 0 |
| 8 | | blocks.0.norm1.LayerNorm.0.forward_params.weight | 1 | 0 | 100 | 0 | 0 |
| 9 | | blocks.0.norm1.LayerNorm.0.forward_params.bias | 1 | 0 | 100 | 0 | 0 |
| 10 | | blocks.0.norm1.LayerNorm.0.forward_input.0 | 1 | 1.14E-05 | 100 | 0.00390625 | 216 |
| 11 | | blocks.0.norm1.LayerNorm.0.forward_output.0 | 1 | 1.84E-05 | 99.98 | 0.0078125 | 1585 |
| 12 | | blocks.0.attn.qkv.QKVParallelLinear.0.forward_params.bias | 1 | 0 | 100 | 0 | 0 |
| 13 | | blocks.0.attn.qkv.QKVParallelLinear.0.forward_input.0 | 1 | 1.84E-05 | 99.98 | 0.0078125 | 1585 |
| 14 | | blocks.0.attn.qkv.QKVParallelLinear.0.forward_output.0 | 1 | 0.0002776 | 99.53 | 0.00390625 | 119074 |
| 15 | | blocks.0.attn.proj.RowParallelLinear.0.forward_params... | 1 | 0 | 100 | 0 | 0 |
| 16 | | blocks.0.attn.proj.RowParallelLinear.0.forward_input.0 | 1 | 3.40E-05 | 99.07 | 0.0012207 | 52482 |
| 17 | | blocks.0.attn.proj.RowParallelLinear.0.forward_output.0 | 1 | 0.0001283 | 99.07 | 0.00390625 | 50591 |
| 18 | | blocks.0.norm2.LayerNorm.1.forward_params.weight | 1 | 0 | 100 | 0 | 0 |
| 19 | | blocks.0.norm2.LayerNorm.1.forward_params.bias | 1 | 0 | 100 | 0 | 0 |
| 20 | | blocks.0.norm2.LayerNorm.1.forward_input.0 | 1 | 0.0001437 | 99.01 | 0.0039062 | 31376 |
| 21 | Fail | blocks.0.norm2.LayerNorm.1.forward_output.0 | 1 | 0.0002779 | 98.72 | 0.015625 | 40770 |
| 22 | | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... | 1 | 0 | 100 | 0 | 0 |
| 23 | Fail | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... | 1 | 0.0002779 | 98.72 | 0.015625 | 40770 |
| 24 | | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... | 1 | 0.000779 | 98.67 | 0.0078125 | 196313 |
| 25 | | blocks.0.mlp.act_fn.GELU.0.forward_input.0 | 1 | 0.000779 | 98.67 | 0.0078125 | 196313 |
| 26 | | blocks.0.mlp.act_fn.GELU.0.forward_output.0 | 1 | 0.0001012 | 98.08 | 0.0039062 | 153508 |
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
```
Generally, the main focus is on Min Err/Max Err.
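Instead of opening result.csv in Excel, failing rows can also be filtered with a short script. This is a sketch that assumes the CSV header uses the column names shown in the table above (`Status`, `Module (Bench/Target)`, `Max Err (t)`); check the header of your own file first. `failed_rows` is a hypothetical helper, and the inline sample is abbreviated from the table.

```python
import csv
import io

def failed_rows(csv_file):
    """Yield rows whose Status column is 'Fail' (column name is assumed)."""
    for row in csv.DictReader(csv_file):
        if (row.get("Status") or "").strip() == "Fail":
            yield row

# Abbreviated sample in the assumed layout.
# For a real run, pass open("result.csv", newline="") instead.
SAMPLE = """Index,Status,Module (Bench/Target),Cosine,RMSE,IsClose (%),Max Err (t),GtNum
20,,blocks.0.norm2.LayerNorm.1.forward_input.0,1,0.0001437,99.01,0.0039062,31376
21,Fail,blocks.0.norm2.LayerNorm.1.forward_output.0,1,0.0002779,98.72,0.015625,40770
"""

if __name__ == "__main__":
    for row in failed_rows(io.StringIO(SAMPLE)):
        print(row["Index"], row["Module (Bench/Target)"], row["Max Err (t)"])
```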
##### Indicator Explanation
To be improved...
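Until this section is filled in, the column names suggest the following conventional definitions. The sketch below is an assumption based on the names alone, not torch_xray's actual formulas. The tolerances `rtol` and `atol` are illustrative values, and GtNum is left out because its definition is not given here.

```python
import math

def compare_metrics(bench, target, rtol=1e-3, atol=1e-3):
    """Conventional error metrics between two equal-length float sequences.

    rtol/atol are illustrative defaults, not torch_xray's thresholds.
    """
    if not bench or len(bench) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    dot = sum(b * t for b, t in zip(bench, target))
    norm_b = math.sqrt(sum(b * b for b in bench))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Cosine similarity; two all-zero inputs are treated as identical.
    cosine = dot / (norm_b * norm_t) if norm_b and norm_t else float(norm_b == norm_t)
    diffs = [abs(b - t) for b, t in zip(bench, target)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Element counts as "close" if |b - t| <= atol + rtol * |b|.
    close = sum(d <= atol + rtol * abs(b) for d, b in zip(diffs, bench))
    return {
        "cosine": cosine,
        "rmse": rmse,
        "isclose_pct": 100.0 * close / len(diffs),
        "max_err": max(diffs),
    }

if __name__ == "__main__":
    print(compare_metrics([1.0, 2.0], [1.0, 2.5]))
```

Cosine stays near 1 even when individual elements drift, which is why the element-wise columns are usually more informative for spotting a problematic module.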
#### Dump operators and run unit tests
```bash
X_DEBUG=0x102  # Trace operator name, argument shapes, dtype, and data_range.
X_DEDUP=True   # Remove duplicates based on shape and dtype.
X_DUMP_NUM=0   # Default value; no tensor data is saved. Setting it to n means that n parameters are randomly selected from each operator and their actual values are saved.
```
Below is a sample code snippet that dumps information such as the size and dtype of the forward operator of Qwen3_VisionTransformer. During runtime, an xray_debug directory will be automatically created in the current directory to store the dump results.
```python
from torch_xray import begin_dump, end_dump
# ...

class Qwen3_VisionTransformer(nn.Module):

    def __init__(
        self,
        vision_config: Qwen3VLVisionConfig,
        norm_eps: float = 1e-6,
        quant_config: Optional[QuantizationConfig] = None,
        prefix: str = "",
        use_data_parallel: bool = False,
    ) -> None:
        super().__init__()
        self.hidden_size = vision_config.hidden_size
        # ...

    def forward(
        self,
        x: torch.Tensor,
        grid_thw: list[list[int]],
    ) -> torch.Tensor:
        # Start dump
        # X_DEBUG=0x102: trace operator name, argument shapes, dtype, and data_range
        # X_DEDUP=True:  remove duplicates based on shape and dtype
        # X_DUMP_NUM:    defaults to 0 (no tensor data is saved); n saves the actual
        #                values of n randomly selected parameters of each operator
        begin_dump(X_DEBUG=0x102, X_DEDUP=True, X_DUMP_NUM=5)
        hidden_states = x.to(device=self.device, dtype=self.dtype)
        hidden_states = self.patch_embed(hidden_states)
        # ...
        # End dump
        end_dump(clear_context=True)
        return hidden_states
```
The resulting directory layout:
```bash
xray_debug/
└── proc_xxx/       # Per-process results
    ├── dump/       # The dumped tensors
    └── dump.json   # Information needed to generate unit tests, such as input/output size and dtype
```
##### Generate unit tests
`jprof --cpu_init --blacklist --factory=load dump.json`
This creates a pytests directory in the current directory to store the generated unit tests.
##### Run unit tests
On the GPU side, you only need to copy the pytests directory generated on the XPU and run it.
Because the unit tests locate the dumped tensors through relative paths by default, this step must be performed in the xray_debug/ directory.
```bash
# detail_compare_path stores the unit test results.
pytest --detail_compare_path=./xxx.csv proc_xxx/pytests/ --seed 42
```
##### Results Comparison
```bash
# After obtaining two result CSV files, compare them and generate result.csv.
summary_diff_check ./xpu.csv ./gpu.csv ./result.csv
```
##### Example of results
```bash
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
| name | op_name | dtype | shape | min-val | max-val | is_pass | xpu_max | gpu_max |
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
| 00004-aten | aten.linspace.default | torch.float | [10] | 0 | 47 | pass | 0 | 1.91E-06 |
| 00005-aten | aten.linspace.default | torch.float | [26] | 0 | 47 | pass | 0 | 0 |
| 00027-aten | aten.add.Tensor | torch.int64 | [10, 26] | 0 | 0 | pass | 0 | 0 |
| 00028-aten | aten.add.Tensor | torch.int64 | [10, 26] | 0 | 0 | pass | 0 | 0 |
| 00037-aten | aten.add.Tensor | torch.float | [260, 1152] | -29.09375 | 33.75 | pass | 0 | 0 |
| 00038-aten | aten.add.Tensor | torch.float | [260, 1152] | -27.1875 | 37.625 | pass | 0 | 0 |
| 00047-aten | aten.add.Tensor | torch.float | [260, 1152] | -28.98438 | 42.34375 | pass | 0 | 0 |
| 00082-aten | aten.sub.Tensor | torch.int32 | [1] | 0 | 0 | pass | 0 | 0 |
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
```
The main focus is on the values of gpu_1e-1, xpu_1e-1, etc., which represent the number of elements whose error between the gpu/xpu result and the cpu result exceeds the order of 1e-n. This serves as the primary basis for determining whether there is a problem with the operator's precision.
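The counting described above can be sketched as follows. The helper name `count_exceeding` and the choice of thresholds 1e-1 through 1e-4 are assumptions for illustration; the exact set of columns produced by `summary_diff_check` is not listed here.

```python
def count_exceeding(result, cpu_reference, max_order=4):
    """Count elements whose absolute error vs. the CPU reference exceeds 1e-n.

    Returns a dict keyed "1e-1", "1e-2", ... for n = 1..max_order.
    """
    if len(result) != len(cpu_reference):
        raise ValueError("result and reference must have the same length")
    errors = [abs(r - c) for r, c in zip(result, cpu_reference)]
    return {
        f"1e-{n}": sum(e > 10.0 ** -n for e in errors)
        for n in range(1, max_order + 1)
    }

if __name__ == "__main__":
    cpu = [0.0, 1.0, 2.0, 3.0]
    xpu = [0.0, 1.002, 2.05, 3.5]
    print(count_exceeding(xpu, cpu))
```

Running the same function on the GPU result and on the XPU result, each against the CPU reference, yields the paired counts that the comparison is based on.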
#### Supplement
##### Working around module-name mismatches between Kunlun (XPU) and GPU dumps that prevent diff calculation
```bash
#
blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward_params.bias
#
blocks.0.mlp.linear_fc1.ColumnParalleLinear.forward_params.bias
```
As shown in the example above, the module names dumped on the GPU and on the XPU often differ for various reasons, so the compare command cannot match them directly.
```python
for step in steps:  # (['/'] for group creation order h5py >= 3.10.0)
    # Original name-matching logic, commented out:
    # for bench_key, target_key in get_matched_names(
    #     list(dump_ben[str(step)].keys()),
    #     list(dump_tar[str(step)].keys()),
    #     fuzzy_match,
    # ):
    # Pair the keys by position instead:
    for bench_key, target_key in zip(
        list(dump_ben[str(step)].keys()),
        list(dump_tar[str(step)].keys()),
    ):
        ...  # loop body unchanged
```
Modify torch_xray/compare/compare.py as shown above to skip the `get_matched_names` step. With this change, entries are compared pairwise in dump order even if the module names differ, so a compare result is still produced. However, it is crucial that the GPU and XPU dumps contain the same number of rows in the same order.
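Because positional pairing silently misaligns everything after a missing entry, it can help to check the key lists before comparing. The following is a standalone sketch, not part of torch_xray: `pair_keys_in_order` is a hypothetical helper, and the `min_similarity` threshold is an illustrative value. It refuses to pair lists of different length and uses the standard-library `difflib` to flag pairs whose names look unrelated.

```python
from difflib import SequenceMatcher

def pair_keys_in_order(bench_keys, target_keys, min_similarity=0.8):
    """Pair dump keys by position and flag pairs whose names look unrelated.

    min_similarity is an illustrative threshold, not a torch_xray setting.
    """
    if len(bench_keys) != len(target_keys):
        raise ValueError(
            f"entry counts differ: {len(bench_keys)} vs {len(target_keys)}"
        )
    pairs, suspicious = [], []
    for bench_key, target_key in zip(bench_keys, target_keys):
        pairs.append((bench_key, target_key))
        ratio = SequenceMatcher(None, bench_key, target_key).ratio()
        if ratio < min_similarity:
            suspicious.append((bench_key, target_key, ratio))
    return pairs, suspicious

if __name__ == "__main__":
    bench = ["blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward_params.bias"]
    target = ["blocks.0.mlp.linear_fc1.ColumnParalleLinear.forward_params.bias"]
    pairs, suspicious = pair_keys_in_order(bench, target)
    print(len(pairs), "pairs,", len(suspicious), "suspicious")
```

A non-empty `suspicious` list is a hint that the two dumps diverge in structure and that the positional comparison should not be trusted from that row onward.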


@@ -0,0 +1,240 @@
## Overall accuracy test
### EvalScope
#### 1. Download and install
EvalScope supports use in Python environments. Users can install EvalScope via pip or from source code. Here are examples of both installation methods:
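```bash
# Option 1: install from PyPI (quotes keep the shell from expanding the brackets)
pip install 'evalscope[perf]' -U
# Option 2: install from source
git clone https://github.com/modelscope/evalscope.git
cd evalscope
pip install -e '.[perf]'
```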
#### 2. Dataset preparation script
```python
import os

from evalscope.collections import CollectionSchema, DatasetInfo, WeightedSampler
from evalscope.utils.io_utils import dump_jsonl_data

schema = CollectionSchema(
    name="VL-Test",
    datasets=[
        CollectionSchema(
            name="PureText",
            weight=1,
            datasets=[
                DatasetInfo(
                    name="mmlu_pro",
                    weight=1,
                    task_type="exam",
                    tags=["en"],
                    args={"few_shot_num": 0},
                ),
                DatasetInfo(
                    name="ifeval",
                    weight=1,
                    task_type="instruction",
                    tags=["en"],
                    args={"few_shot_num": 0},
                ),
                DatasetInfo(
                    name="gsm8k",
                    weight=1,
                    task_type="math",
                    tags=["en"],
                    args={"few_shot_num": 0},
                ),
            ],
        ),
        CollectionSchema(
            name="Vision",
            weight=2,
            datasets=[
                DatasetInfo(
                    name="math_vista",
                    weight=1,
                    task_type="math",
                    tags=["en"],
                    args={"few_shot_num": 0},
                ),
                DatasetInfo(
                    name="mmmu_pro",
                    weight=1,
                    task_type="exam",
                    tags=["en"],
                    args={"few_shot_num": 0},
                ),
            ],
        ),
    ],
)

# Sample 1000 items from the mixed collection
mixed_data = WeightedSampler(schema).sample(1000)

# Create the output directory if it does not exist, then write the samples to a JSONL file
output_path = "outputs/vl_test.jsonl"
os.makedirs(os.path.dirname(output_path), exist_ok=True)
dump_jsonl_data(mixed_data, output_path)
```
Dataset composition visualization:
```
┌───────────────────────────────────────┐
│ VL-Test (1000 samples) │
├─────────────────┬─────────────────────┤
│ PureText │ Vision │
│ (333 samples) │ (667 samples) │
├─────────────────┼─────────────────────┤
│ • mmlu_pro │ • math_vista │
│ • ifeval │ • mmmu_pro │
│ • gsm8k │ │
└─────────────────┴─────────────────────┘
```
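The 333/667 split follows from applying the group weights (1 : 2) to 1000 samples, with each group then divided among its equally weighted datasets. Below is a small sketch of that arithmetic using a plain nested dictionary rather than EvalScope's classes. `expected_counts` and `VL_TEST` are hypothetical names, and the function returns fractional expected counts; how `WeightedSampler` rounds them is not covered here.

```python
def expected_counts(schema, total):
    """Split `total` across a nested {name: (weight, children_or_None)} tree."""
    weight_sum = sum(weight for weight, _ in schema.values())
    counts = {}
    for name, (weight, children) in schema.items():
        share = total * weight / weight_sum
        if children is None:
            counts[name] = share  # leaf dataset
        else:
            counts.update(expected_counts(children, share))  # nested group
    return counts

# Mirrors the weights in the schema above.
VL_TEST = {
    "PureText": (1, {"mmlu_pro": (1, None), "ifeval": (1, None), "gsm8k": (1, None)}),
    "Vision": (2, {"math_vista": (1, None), "mmmu_pro": (1, None)}),
}

if __name__ == "__main__":
    for name, count in expected_counts(VL_TEST, 1000).items():
        print(f"{name}: {count:.1f}")
```

Changing a weight in `VL_TEST` shows how the per-dataset sample budget shifts before an actual sampling run is started.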
#### 3. Test
```python
from evalscope import TaskConfig, run_task
from evalscope.constants import EvalType

task_cfg = TaskConfig(
    model="Qwen2.5-VL-7B-Instruct",
    api_url="http://localhost:8804/v1",
    api_key="EMPTY",
    eval_type=EvalType.SERVICE,
    datasets=[
        "data_collection",
    ],
    dataset_args={
        "data_collection": {
            "local_path": "../outputs/vl_test.jsonl",
        }
    },
    eval_batch_size=5,
    generation_config={
        "max_tokens": 30000,  # Maximum number of generated tokens; set a large value to avoid truncated output.
        "temperature": 0.6,  # Sampling temperature (recommended value from the Qwen report)
        "top_p": 0.95,  # Top-p sampling (recommended value from the Qwen report)
        "top_k": 20,  # Top-k sampling (recommended value from the Qwen report)
        "n": 1,  # Number of responses generated per request
        "repetition_penalty": 1.0,  # 1.0 = penalty disabled, >1.0 = repetition is penalized.
    },
)
run_task(task_cfg=task_cfg)
```
Parameter Tuning Guide:
| Parameter | Current value | Effect | Adjustment suggestions |
| ----------------- | ------------- | ---------------------------------------- | -------------------------------------------------------- |
| `temperature` | 0.6 | Control output diversity | Math problems ↓ 0.3 / Creative writing ↑ 0.9 |
| `top_p` | 0.95 | Filtering low-probability tokens | Reduce "nonsense" |
| `eval_batch_size` | 5 | Number of requests processed in parallel | With sufficient video memory, it can be increased to 10. |
Run the test:
```bash
#!/bin/bash
# ========================================
# Step 1: Set the log file path
# ========================================
LOG_FILE="accuracy_$(date +%Y%m%d_%H%M).log"
# ========================================
# Step 2: Execute the Python script and capture all output
# Meaning of 2>&1:
# - 2 represents standard error output (stderr)
# - >& represents redirection and merging
# - 1 represents standard output (stdout)
# Function: Merges error messages into standard output as well.
# ========================================
python accuracy.py 2>&1 | tee "$LOG_FILE"
# ========================================
# Step 3: Check the execution status
# ${PIPESTATUS[0]} is the exit code of the first command (python) in the pipeline
# ========================================
EXIT_CODE=${PIPESTATUS[0]}
if [ "$EXIT_CODE" -eq 0 ]; then
    echo "✅ Evaluation completed! Log saved to: $LOG_FILE"
else
    echo "❌ Evaluation failed! Exit code: $EXIT_CODE. Please check the log: $LOG_FILE"
fi
```
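Where a Python driver is preferred over Bash, the same pattern (merge stderr into stdout, echo to the console, write a log, return the exit code) can be sketched with the standard library. `run_and_tee` is a hypothetical helper name, not part of EvalScope.

```python
import subprocess
import sys

def run_and_tee(command, log_path):
    """Run `command`, echoing merged stdout/stderr to the console and a log file.

    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8") as log_file, subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
    ) as process:
        for line in process.stdout:
            sys.stdout.write(line)
            log_file.write(line)
    # Leaving the `with` block waits for the process, so returncode is set.
    return process.returncode
```

For example, `run_and_tee([sys.executable, "accuracy.py"], "accuracy.log")` returns 0 on success, which corresponds to the `${PIPESTATUS[0]}` check in the shell script.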
#### 4. Common problem fixes
##### 4.1 NLTK resource missing fix
```bash
Resource punkt_tab not found.
```
Solution: save the following script as fix_nltk.py.
```python
import os

import nltk

# Step 1: Set the download path (select a writable directory)
download_dir = "/workspace/myenv/nltk_data"
os.makedirs(download_dir, exist_ok=True)
# Step 2: Add the directory to the NLTK data search path
nltk.data.path.append(download_dir)
# Step 3: Download the necessary resource
print("🔽 Start downloading punkt_tab resource...")
try:
    nltk.download("punkt_tab", download_dir=download_dir)
    print("✅ Download successful!")
except Exception as e:
    print(f"❌ Download failed: {e}")
    print("💡 Alternative: Download manually from GitHub")
    print(
        "   URL: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip"
    )
```
Run the fix:
```bash
# Activate environment
source /workspace/myenv/bin/activate
# Run the repair script
python fix_nltk.py
# Rerun the test
bash run_accuracy_test.sh
```
#### 5. Results
```bash
+-------------+---------------------+--------------+---------------+-------+
| task_type | metric | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
| exam | acc | mmmu_pro | 0.521 | 334 |
| math | acc | math_vista | 0.6066 | 333 |
| exam | acc | mmlu_pro | 0.5405 | 111 |
| instruction | prompt_level_strict | ifeval | 0.6937 | 111 |
| math | acc | gsm8k | 0.8288 | 111 |
+-------------+---------------------+--------------+---------------+-------+
```
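To condense the table into one number, the per-dataset scores can be weighted by their sample counts. This is only a rough summary, since it mixes different metrics (`acc` and `prompt_level_strict`); the values are copied from the table above, and `weighted_average` is a hypothetical helper.

```python
RESULTS = [
    # (dataset_name, average_score, count) copied from the results table
    ("mmmu_pro", 0.521, 334),
    ("math_vista", 0.6066, 333),
    ("mmlu_pro", 0.5405, 111),
    ("ifeval", 0.6937, 111),
    ("gsm8k", 0.8288, 111),
]

def weighted_average(rows):
    """Sample-count-weighted mean of the per-dataset scores."""
    total = sum(count for _, _, count in rows)
    return sum(score * count for _, score, count in rows) / total

if __name__ == "__main__":
    print(f"overall: {weighted_average(RESULTS):.4f}")
```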


@@ -0,0 +1,10 @@
# Accuracy
This document details the accuracy testing methods for vllm-kunlun and the analysis of the results.
:::{toctree}
:caption: Accuracy
:maxdepth: 1
accuracy_server
accuracy_kernel
:::


@@ -0,0 +1,18 @@
# GLM-4.5-Air
* vLLM Version: vLLM: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
* Hardware Environment: KunLun P800
* Parallel mode: TP8
```bash
+-------------+----------+---------------+---------+-----+--------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+-------------+----------+---------------+---------+-----+--------+---------+
| GLM-4.5-Air | math_500 | AveragePass@1 | Level 1 | 43 | 0.9302 | default |
| GLM-4.5-Air | math_500 | AveragePass@1 | Level 2 | 90 | 0.9222 | default |
| GLM-4.5-Air | math_500 | AveragePass@1 | Level 3 | 105 | 0.8762 | default |
| GLM-4.5-Air | math_500 | AveragePass@1 | Level 4 | 128 | 0.8984 | default |
| GLM-4.5-Air | math_500 | AveragePass@1 | Level 5 | 134 | 0.8955 | default |
+-------------+----------+---------------+---------+-----+--------+---------+
```


@@ -0,0 +1,18 @@
# GLM-4.5
* vLLM Version: vLLM: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
* Hardware Environment: KunLun P800
* Parallel mode: TP8
```bash
+---------+----------+---------------+---------+-----+--------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+---------+----------+---------------+---------+-----+--------+---------+
| GLM-4.5 | math_500 | AveragePass@1 | Level 1 | 43 | 0.9302 | default |
| GLM-4.5 | math_500 | AveragePass@1 | Level 2 | 90 | 0.8111 | default |
| GLM-4.5 | math_500 | AveragePass@1 | Level 3 | 105 | 0.7143 | default |
| GLM-4.5 | math_500 | AveragePass@1 | Level 4 | 128 | 0.6172 | default |
| GLM-4.5 | math_500 | AveragePass@1 | Level 5 | 134 | 0.5149 | default |
+---------+----------+---------------+---------+-----+--------+---------+
```


@@ -0,0 +1,18 @@
# InternVL3_5-30B-A3B
* vLLM Version: vLLM: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
* Hardware Environment: KunLun P800
* Parallel mode: TP8
```
+-------------+---------------------+--------------+---------------+-------+
| task_type | metric | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
| exam | acc | mmmu_pro | 0.5449 | 334 |
| math | acc | math_vista | 0.6847 | 333 |
| exam | acc | mmlu_pro | 0.6126 | 111 |
| instruction | prompt_level_strict | ifeval | 0.7658 | 111 |
| math | acc | gsm8k | 0.9369 | 111 |
+-------------+---------------------+--------------+---------------+-------+
```


@@ -0,0 +1,18 @@
# Qwen2.5-VL-7B-Instruct
* vLLM Version: vLLM: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
* Hardware Environment: KunLun P800
* Parallel mode: TP1
```
+-------------+---------------------+--------------+---------------+-------+
| task_type | metric | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
| exam | acc | mmmu_pro | 0.521 | 334 |
| math | acc | math_vista | 0.6066 | 333 |
| exam | acc | mmlu_pro | 0.5405 | 111 |
| instruction | prompt_level_strict | ifeval | 0.6937 | 111 |
| math | acc | gsm8k | 0.8288 | 111 |
+-------------+---------------------+--------------+---------------+-------+
```


@@ -0,0 +1,10 @@
# Accuracy Report
:::{toctree}
:caption: Accuracy Report
:maxdepth: 1
Qwen2.5-VL-7B-Instruct
InternVL3_5-30B-A3B
GLM-4.5
GLM-4.5-Air
:::


@@ -0,0 +1,8 @@
# Accuracy
:::{toctree}
:caption: Accuracy
:maxdepth: 1
accuracy/index
accuracy_report/index
:::