Initial commit for vLLM-Kunlun Plugin

2025-12-10 12:05:39 +08:00
commit c728e52505
131 changed files with 28816 additions and 0 deletions
--- a/docs/source/developer_guide/evaluation/accuracy/accuracy_kernel.md
+++ b/docs/source/developer_guide/evaluation/accuracy/accuracy_kernel.md
@@ -0,0 +1,271 @@
+## Operator accuracy test
+
+### torch_xray
+
+torch_xray is an operator precision analysis tool that can dump module-level input-output precision comparisons and automatically construct operator unit tests.
+
+#### 1. Download and install
+
+**python3.10:**
+
+bos:/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp310-cp310-linux_x86_64.whl
+
+<https://su.bcebos.com/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp310-cp310-linux_x86_64.whl>
+
+**python3.8:**
+
+bos:/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp38-cp38-linux_x86_64.whl
+
+<https://su.bcebos.com/klx-sdk-release-public/xpytorch/dev_kl3/torch_xray/latest/torch_xray-999.9.9-cp38-cp38-linux_x86_64.whl>
+
+Note that the same torch_xray installation package must be used in every environment where the tool is run.
+
+#### 2. Usage
+
+##### Dump module-level inputs and outputs and compare their precision
+
+Below is a sample code snippet used to dump the input and output of the vision module and compare the errors in the vllm framework.
+
+```python
+from torch_xray import PrecisionDebugger
+
+def execute_model(
+    self,
+    scheduler_output: "SchedulerOutput",
+    intermediate_tensors: Optional[IntermediateTensors] = None,
+) -> Union[ModelRunnerOutput, AsyncModelRunnerOutput, IntermediateTensors]:
+    # dump_path: directory where the dump results are stored
+    # rank: ranks to dump
+    # step: steps to dump; for inference, 1 is sufficient
+    # model: the module to dump; must be an nn.Module
+    debugger = PrecisionDebugger(
+        dump_path="dump-vision",
+        hook_name="dump",
+        rank=[0],
+        step=[1],
+        model=self.model.visual,
+        dump_torch_api=False,
+    )
+    debugger.start()
+    ...  # the rest of execute_model is unchanged
+```
+
+The dump writes an h5 file and a csv file to the results directory.
+
+```bash
+-rw-r--r-- 1 root root 471231309 Oct 31 13:12 globalrank-0_localrank-0.h5
+-rw-r--r-- 1 root root        71 Oct 31 13:11 globalrank-0_localrank-0_summary.csv
+```
+
+##### Data processing
+
+```bash
+summary xxx.h5 sum.txt
+```
+
+The generated h5 file is processed using the summary command to generate a txt file in which the results are presented in tabular form.
+
+```bash
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
+| Index | Step | Rank | Module                                                    |         Min |         Max |         Mean |         Std |
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
+|     0 |    1 |    0 | patch_embed.proj.Conv3d.0.forward_params.weight           | -0.0776367  | 0.0795898   |      6.8e-06 | 0.0072608   |
+|     1 |    1 |    0 | patch_embed.proj.Conv3d.0.forward_params.bias             | -3.046875   | 2.953125    |    0.0113748 | 0.3257138   |
+|     2 |    1 |    0 | patch_embed.proj.Conv3d.0.forward_input.0                 | -0.7490234  | 0.7021484   |    0.3302804 | 0.2339017   |
+|     3 |    1 |    0 | patch_embed.proj.Conv3d.0.forward_output.0                | -4.0078125  | 5.1210938   |    0.0147052 | 0.3815643   |
+|     4 |    1 |    0 | pos_embed.Embedding.0.forward_params.weight               | -13.8125    | 20.25       |    0.0010043 | 0.2428094   |
+|     5 |    1 |    0 | pos_embed.Embedding.0.forward_input.0                     |        0.0  | 2303.0      | 1153.9191895 | 714.594360  |
+|     6 |    1 |    0 | pos_embed.Embedding.0.forward_output.0                    | -13.8125    | 20.25       |    0.0007552 | 0.2643428   |
+|     7 |    1 |    0 | rotary_pos_emb.Qwen2_5_VisionRotaryEmbedding.0.forward... |        0.0  | 25.0        |    1.7337022 | 3.9271674   |
+|     8 |    1 |    0 | blocks.0.norm1.LayerNorm.0.forward_params.weight          | -0.5351562  | 3.140625    |    0.4660275 | 0.7907906   |
+|     9 |    1 |    0 | blocks.0.norm1.LayerNorm.0.forward_params.bias            | -2.359375   | 2.921875    |    0.0013793 | 0.1879374   |
+|    10 |    1 |    0 | blocks.0.norm1.LayerNorm.0.forward_input.0                | -15.65625   | 20.21875    |    0.0155256 | 0.4382802   |
+|    11 |    1 |    0 | blocks.0.norm1.LayerNorm.0.forward_output.0               | -6.1640625  | 6.7460938   |    0.0006746 | 0.2708515   |
+|    12 |    1 |    0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_params.bias | -6.125      | 6.1875      |   -0.0292423 | 0.8602651   |
+|    13 |    1 |    0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_input.0     | -6.1640625  | 6.7460938   |    0.0006746 | 0.2708515   |
+|    14 |    1 |    0 | blocks.0.attn.qkv.QKVParallelLinear.0.forward_output.0    | -6.5859375  | 7.6171875   |   -0.0125549 | 1.0678084   |
+|    15 |    1 |    0 | blocks.0.attn.proj.RowParallelLinear.0.forward_params...  | -3.578125   | 3.203125    |   -0.0043617 | 0.4846557   |
+|    16 |    1 |    0 | blocks.0.attn.proj.RowParallelLinear.0.forward_input.0    | -1.9130859  | 1.4375      |    0.0005577 | 0.0947055   |
+|    17 |    1 |    0 | blocks.0.attn.proj.RowParallelLinear.0.forward_output.0   | -9.109375   | 7.3867188   |   -0.0034284 | 0.4465481   |
+|    18 |    1 |    0 | blocks.0.norm2.LayerNorm.1.forward_params.weight          | -0.1376953  | 14.5625     |    1.9166113 | 3.017405    |
+|    19 |    1 |    0 | blocks.0.norm2.LayerNorm.1.forward_params.bias            | -1.6328125  | 3.84375     |    0.0062865 | 0.2443586   |
+|    20 |    1 |    0 | blocks.0.norm2.LayerNorm.1.forward_input.0                | -8.5859375  | 11.109375   |    0.0120974 | 0.4243064   |
+|    21 |    1 |    0 | blocks.0.norm2.LayerNorm.1.forward_output.0               | -12.015625  | 14.265625   |   -0.0012364 | 0.4973041   |
+|    22 |    1 |    0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar...  | -9.4375     | 0.7304688   |   -2.4200516 | 1.6754951   |
+|    23 |    1 |    0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar...  | -12.015625  | 14.265625   |   -0.0012364 | 0.4973041   |
+|    24 |    1 |    0 | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forwar...  | -12.59375   | 13.0625     |   -2.1465943 | 1.8433502   |
+|    25 |    1 |    0 | blocks.0.mlp.act_fn.GELU.0.forward_input.0                | -12.59375   | 13.0625     |   -2.1465943 | 1.8433502   |
+-------+------+------+-----------------------------------------------------------+-------------+-------------+--------------+-------------+
+```
+
+##### Accuracy Comparison
+
+```bash
+# The results are stored in result.csv
+compare xpu.h5 gpu.h5 result.csv
+```
+
+The `compare` command processes the H5 files generated on the GPU and XPU and produces a CSV file. Download this CSV file to your local machine and open it with Excel; the result looks similar to the example below.
+
+If you encounter a "no matched keys" problem, please refer to the instructions at the end of this article for a solution.
+
+
+##### Example of results
+
+```bash
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
+| Index | Status | Module (Bench/Target)                                     | Cosine |      RMSE | IsClose (%) | Max Err (t) |  GtNum |
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
+|     0 |        | patch_embed.proj.Conv3d.0.forward_params.weight           |      1 |         0 |         100 |           0 |      0 |
+|     1 |        | patch_embed.proj.Conv3d.0.forward_params.bias             |      1 |         0 |         100 |           0 |      0 |
+|     2 |        | patch_embed.proj.Conv3d.0.forward_input.0                 |      1 |         0 |         100 |           0 |      0 |
+|     3 |        | patch_embed.proj.Conv3d.0.forward_output.0                |      1 |  9.90E-06 |         100 |    0.001953 |    267 |
+|     4 |        | pos_embed.Embedding.0.forward_params.weight               |      1 |         0 |         100 |           0 |      0 |
+|     5 |        | pos_embed.Embedding.0.forward_input.0                     |      1 |         0 |         100 |           0 |      0 |
+|     6 |        | pos_embed.Embedding.0.forward_output.0                    |      1 |         0 |         100 |           0 |      0 |
+|     7 |        | rotary_pos_emb.Qwen2_5_VisionRotaryEmbedding.0.forward... |      1 |         0 |         100 |           0 |      0 |
+|     8 |        | blocks.0.norm1.LayerNorm.0.forward_params.weight          |      1 |         0 |         100 |           0 |      0 |
+|     9 |        | blocks.0.norm1.LayerNorm.0.forward_params.bias            |      1 |         0 |         100 |           0 |      0 |
+|    10 |        | blocks.0.norm1.LayerNorm.0.forward_input.0                |      1 |  1.14E-05 |         100 |  0.00390625 |    216 |
+|    11 |        | blocks.0.norm1.LayerNorm.0.forward_output.0               |      1 |  1.84E-05 |       99.98 |   0.0078125 |   1585 |
+|    12 |        | blocks.0.attn.qkv.QKVParallelLinear.0.forward_params.bias |      1 |         0 |         100 |           0 |      0 |
+|    13 |        | blocks.0.attn.qkv.QKVParallelLinear.0.forward_input.0     |      1 |  1.84E-05 |       99.98 |   0.0078125 |   1585 |
+|    14 |        | blocks.0.attn.qkv.QKVParallelLinear.0.forward_output.0    |      1 | 0.0002776 |       99.53 |  0.00390625 | 119074 |
+|    15 |        | blocks.0.attn.proj.RowParallelLinear.0.forward_params...  |      1 |         0 |         100 |           0 |      0 |
+|    16 |        | blocks.0.attn.proj.RowParallelLinear.0.forward_input.0    |      1 |  3.40E-05 |       99.07 |   0.0012207 |  52482 |
+|    17 |        | blocks.0.attn.proj.RowParallelLinear.0.forward_output.0   |      1 | 0.0001283 |       99.07 |  0.00390625 |  50591 |
+|    18 |        | blocks.0.norm2.LayerNorm.1.forward_params.weight          |      1 |         0 |         100 |           0 |      0 |
+|    19 |        | blocks.0.norm2.LayerNorm.1.forward_params.bias            |      1 |         0 |         100 |           0 |      0 |
+|    20 |        | blocks.0.norm2.LayerNorm.1.forward_input.0                |      1 | 0.0001437 |       99.01 |   0.0039062 |  31376 |
+|    21 |   Fail | blocks.0.norm2.LayerNorm.1.forward_output.0               |      1 | 0.0002779 |       98.72 |    0.015625 |  40770 |
+|    22 |        | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... |      1 |         0 |         100 |           0 |      0 |
+|    23 |   Fail | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... |      1 | 0.0002779 |       98.72 |    0.015625 |  40770 |
+|    24 |        | blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward... |      1 | 0.000779  |       98.67 |   0.0078125 | 196313 |
+|    25 |        | blocks.0.mlp.act_fn.GELU.0.forward_input.0                |      1 | 0.000779  |       98.67 |   0.0078125 | 196313 |
+|    26 |        | blocks.0.mlp.act_fn.GELU.0.forward_output.0               |      1 | 0.0001012 |       98.08 |   0.0039062 | 153508 |
+-------+--------+-----------------------------------------------------------+--------+-----------+-------------+-------------+--------+
+```
+
+Generally, the main focus is on Min Err/Max Err.
+
+##### Indicator Explanation
+
+To be improved...
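+
+Until this section is filled in, the sketch below shows how metrics with these names are conventionally computed for a benchmark tensor and a target tensor, both flattened to lists of floats. It is an assumption, not torch_xray's documented implementation: the function name `compare_metrics` and the tolerances `rtol` and `atol` are hypothetical, and the `GtNum` column is not covered.
+
+```python
+import math
+
+
+def compare_metrics(bench, target, rtol=1e-3, atol=1e-5):
+    """Return cosine similarity, RMSE, IsClose percentage and max absolute error.
+
+    Conventional definitions only; torch_xray may compute them differently.
+    """
+    if len(bench) != len(target) or not bench:
+        raise ValueError("inputs must be non-empty and of equal length")
+
+    dot = sum(b * t for b, t in zip(bench, target))
+    norm_b = math.sqrt(sum(b * b for b in bench))
+    norm_t = math.sqrt(sum(t * t for t in target))
+    # Treat two all-zero tensors as perfectly aligned.
+    if norm_b == 0.0 and norm_t == 0.0:
+        cosine = 1.0
+    elif norm_b == 0.0 or norm_t == 0.0:
+        cosine = 0.0
+    else:
+        cosine = dot / (norm_b * norm_t)
+
+    abs_err = [abs(b - t) for b, t in zip(bench, target)]
+    rmse = math.sqrt(sum(e * e for e in abs_err) / len(abs_err))
+    # Same closeness rule as torch.isclose: |b - t| <= atol + rtol * |t|.
+    close = sum(1 for e, t in zip(abs_err, target) if e <= atol + rtol * abs(t))
+    return {
+        "cosine": cosine,
+        "rmse": rmse,
+        "isclose_pct": 100.0 * close / len(abs_err),
+        "max_err": max(abs_err),
+    }
+```
+
+Under these definitions, a row whose cosine is 1 while its IsClose percentage falls below 100 has a few elements outside tolerance even though the overall direction of the tensor is preserved.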
+
+#### Dump operators and run unit tests
+
+```bash
+X_DEBUG=0x102 # Trace operator name, argument shapes, dtype, and data_range.
+X_DEDUP=True # Remove duplicates based on shape and dtype.
+X_DUMP_NUM # Defaults to 0, meaning no tensor data is saved. Setting it to n saves the actual data of n randomly selected parameters from each operator.
+```
+
+Below is a sample code snippet that dumps information such as the size and dtype of the forward operator of Qwen3_VisionTransformer. During runtime, an xray_debug directory will be automatically created in the current directory to store the dump results.
+
+```python
+from torch_xray import begin_dump, end_dump
+
+# ... other imports and definitions omitted ...
+
+class Qwen3_VisionTransformer(nn.Module):
+
+    def __init__(
+        self,
+        vision_config: Qwen3VLVisionConfig,
+        norm_eps: float = 1e-6,
+        quant_config: Optional[QuantizationConfig] = None,
+        prefix: str = "",
+        use_data_parallel: bool = False,
+    ) -> None:
+        super().__init__()
+        self.hidden_size = vision_config.hidden_size
+        ...  # remaining initialization omitted
+
+    def forward(
+        self,
+        x: torch.Tensor,
+        grid_thw: list[list[int]],
+    ) -> torch.Tensor:
+        # Start dump.
+        # X_DEBUG=0x102: trace operator name, argument shapes, dtype, data_range.
+        # X_DEDUP=True: remove duplicates based on shape and dtype.
+        # X_DUMP_NUM=5: save the actual data of 5 randomly selected parameters
+        # per operator (the default 0 saves no tensor data).
+        begin_dump(X_DEBUG=0x102, X_DEDUP=True, X_DUMP_NUM=5)
+
+        hidden_states = x.to(device=self.device, dtype=self.dtype)
+        hidden_states = self.patch_embed(hidden_states)
+        ...  # remaining forward computation omitted
+
+        # End dump.
+        end_dump(clear_context=True)
+        return hidden_states
+```
+The dump produces the following directory layout:
+```bash
+├── xray_debug/
+│   ├── proc_xxx/     # Process-based storage results
+│       ├── dump/     # The dumped tensor
+│       ├── dump.json # Information needed to generate unit tests, such as input/output size and dtype.
+```
+
+##### Generate unit test
+
+`jprof --cpu_init --blacklist --factory=load dump.json`
+
+This creates a pytests directory in the current directory to store the unit tests.
+
+##### Run unit test
+
+On the GPU machine, copy the pytests directory generated on the XPU and run it there.
+
+Because the unit tests locate the dumped tensors through relative paths by default, this step must be performed in the xray_debug/ directory.
+
+```bash
+# detail_compare_path stores the unit test results.
+pytest --detail_compare_path=./xxx.csv proc_xxx/pytests/ --seed 42
+```
+
+##### Results Comparison
+
+```bash
+# After obtaining two result CSV files, compare them and generate result.csv.
+summary_diff_check  ./xpu.csv ./gpu.csv ./result.csv
+```
+
+##### Example of results
+
+```bash
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
+| name       | op_name               | dtype       | shape       |   min-val |  max-val | is_pass | xpu_max |  gpu_max |
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
+| 00004-aten | aten.linspace.default | torch.float | [10]        |         0 |       47 | pass    |       0 | 1.91E-06 |
+| 00005-aten | aten.linspace.default | torch.float | [26]        |         0 |       47 | pass    |       0 |        0 |
+| 00027-aten | aten.add.Tensor       | torch.int64 | [10, 26]    |         0 |        0 | pass    |       0 |        0 |
+| 00028-aten | aten.add.Tensor       | torch.int64 | [10, 26]    |         0 |        0 | pass    |       0 |        0 |
+| 00037-aten | aten.add.Tensor       | torch.float | [260, 1152] | -29.09375 |    33.75 | pass    |       0 |        0 |
+| 00038-aten | aten.add.Tensor       | torch.float | [260, 1152] | -27.1875  |   37.625 | pass    |       0 |        0 |
+| 00047-aten | aten.add.Tensor       | torch.float | [260, 1152] | -28.98438 | 42.34375 | pass    |       0 |        0 |
+| 00082-aten | aten.sub.Tensor       | torch.int32 | [1]         |         0 |        0 | pass    |       0 |        0 |
+------------+-----------------------+-------------+-------------+-----------+----------+---------+---------+----------+
+```
+
+Focus mainly on columns such as gpu_1e-1 and xpu_1e-1, which give the number of elements whose error between the GPU/XPU result and the CPU result exceeds 1e-n. These counts are the primary basis for deciding whether an operator has a precision problem.
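+
+A minimal sketch of how such threshold counts could be derived, assuming each result is a flat list of floats and the CPU result is the reference. The helper name `count_exceeding` and the set of exponents are hypothetical; the generated unit tests may bucket errors differently.
+
+```python
+def count_exceeding(result, reference, exponents=(1, 2, 3, 4)):
+    """Count elements whose absolute error against the reference exceeds 1e-n."""
+    if len(result) != len(reference):
+        raise ValueError("result and reference must have the same length")
+    errors = [abs(r - c) for r, c in zip(result, reference)]
+    return {
+        f"1e-{n}": sum(1 for e in errors if e > 10.0 ** -n)
+        for n in exponents
+    }
+
+
+# Illustrative values only, not taken from a real dump.
+cpu = [0.0, 1.0, 2.0, 3.0]
+xpu = [0.0, 1.0005, 2.02, 3.5]
+print(count_exceeding(xpu, cpu))
+```
+
+Comparing the counts for the XPU and GPU runs at the same threshold shows whether the XPU deviates from the CPU reference more than the GPU does.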
+
+#### Additional notes
+
+##### Working around module-name mismatches between Kunlun XPU and GPU dumps that prevent diff calculation
+
+```bash
+#
+blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward_params.bias
+#
+blocks.0.mlp.linear_fc1.ColumnParalleLinear.forward_params.bias
+```
+
+As shown above, the module names dumped on the GPU and on the XPU often differ for various reasons, so the compare command cannot match them directly.
+
+```python
+for step in steps:  # (['/'] for group creation order h5py >= 3.10.0)
+    # Original name-based matching, commented out:
+    # for bench_key, target_key in get_matched_names(
+    #     list(dump_ben[str(step)].keys()),
+    #     list(dump_tar[str(step)].keys()),
+    #     fuzzy_match,
+    # ):
+    # Replacement: pair entries by position instead of by name.
+    for bench_key, target_key in zip(
+        list(dump_ben[str(step)].keys()),
+        list(dump_tar[str(step)].keys()),
+    ):
+        ...  # loop body unchanged
+```
+
+Modify torch_xray/compare/compare.py to skip the `get_matched_names` step, as shown above. The entries are then compared pairwise in dump order even if module names differ, and a compare result is still produced. However, this is only valid if the GPU and XPU dumps contain the same number of entries in the same order.
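+
+Because `zip` stops at the shorter input without raising an error, a pre-check along these lines may help before relying on positional pairing. It is a sketch that works on plain lists of key names, such as those returned by `list(dump_ben[str(step)].keys())`; the helper name `check_positional_alignment` is hypothetical and not part of torch_xray.
+
+```python
+def check_positional_alignment(bench_keys, target_keys):
+    """Validate two key lists before pairing them by position.
+
+    Raises ValueError when the lengths differ; otherwise returns the
+    (index, bench_key, target_key) triples whose names are not identical,
+    so they can be reviewed by eye.
+    """
+    if len(bench_keys) != len(target_keys):
+        raise ValueError(
+            f"dump sizes differ: {len(bench_keys)} bench keys vs "
+            f"{len(target_keys)} target keys"
+        )
+    return [
+        (i, b, t)
+        for i, (b, t) in enumerate(zip(bench_keys, target_keys))
+        if b != t
+    ]
+
+
+bench = [
+    "blocks.0.norm1.LayerNorm.0.forward_input.0",
+    "blocks.0.mlp.linear_fc1.ColumnParallelLinear.0.forward_params.bias",
+]
+target = [
+    "blocks.0.norm1.LayerNorm.0.forward_input.0",
+    "blocks.0.mlp.linear_fc1.ColumnParalleLinear.forward_params.bias",
+]
+for index, b, t in check_positional_alignment(bench, target):
+    print(f"row {index}: {b} <-> {t}")
+```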
--- a/docs/source/developer_guide/evaluation/accuracy/accuracy_server.md
+++ b/docs/source/developer_guide/evaluation/accuracy/accuracy_server.md
@@ -0,0 +1,240 @@
+## Overall accuracy test
+
+### EvalScope
+
+#### 1. Download and install
+
+EvalScope supports use in Python environments. Users can install EvalScope via pip or from source code. Here are examples of both installation methods:
+
+```bash
+# Install from PyPI
+pip install 'evalscope[perf]' -U
+# Or install from source
+git clone https://github.com/modelscope/evalscope.git
+cd evalscope
+pip install -e '.[perf]'
+```
+
+#### 2. Dataset preparation script
+
+```python
+from evalscope.collections import CollectionSchema, DatasetInfo, WeightedSampler
+from evalscope.utils.io_utils import dump_jsonl_data
+import os  # Step 1: Import the os module
+
+schema = CollectionSchema(
+    name="VL-Test",
+    datasets=[
+        CollectionSchema(
+            name="PureText",
+            weight=1,
+            datasets=[
+                DatasetInfo(
+                    name="mmlu_pro",
+                    weight=1,
+                    task_type="exam",
+                    tags=["en"],
+                    args={"few_shot_num": 0},
+                ),
+                DatasetInfo(
+                    name="ifeval",
+                    weight=1,
+                    task_type="instruction",
+                    tags=["en"],
+                    args={"few_shot_num": 0},
+                ),
+                DatasetInfo(
+                    name="gsm8k",
+                    weight=1,
+                    task_type="math",
+                    tags=["en"],
+                    args={"few_shot_num": 0},
+                ),
+            ],
+        ),
+        CollectionSchema(
+            name="Vision",
+            weight=2,
+            datasets=[
+                DatasetInfo(
+                    name="math_vista",
+                    weight=1,
+                    task_type="math",
+                    tags=["en"],
+                    args={"few_shot_num": 0},
+                ),
+                DatasetInfo(
+                    name="mmmu_pro",
+                    weight=1,
+                    task_type="exam",
+                    tags=["en"],
+                    args={"few_shot_num": 0},
+                ),
+            ],
+        ),
+    ],
+)
+
+
+# Sample 1000 items according to the schema weights
+mixed_data = WeightedSampler(schema).sample(1000)
+
+output_path = "outputs/vl_test.jsonl"  # Step 2: Define the output file path
+output_dir = os.path.dirname(output_path)  # Step 3: Obtain the directory name
+os.makedirs(output_dir, exist_ok=True)  # Step 4: Create the directory if it does not exist
+
+# Dump the mixed data to a JSONL file
+dump_jsonl_data(mixed_data, output_path)  # Step 5: Write the file
+```
+
+Dataset composition visualization:
+
+```
+┌───────────────────────────────────────┐
+│       VL-Test (1000 samples)          │
+├─────────────────┬─────────────────────┤
+│   PureText      │      Vision         │
+│   (333 samples) │    (667 samples)    │
+├─────────────────┼─────────────────────┤
+│ • mmlu_pro      │ • math_vista        │
+│ • ifeval        │ • mmmu_pro          │
+│ • gsm8k         │                     │
+└─────────────────┴─────────────────────┘
+```
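+
+The 333/667 split follows from the group weights 1 and 2. The sketch below reproduces the arithmetic, assuming samples are allocated in proportion to the normalized weights at each level of the schema. The nested-dict layout and the `expected_counts` helper are illustrative and not part of the EvalScope API.
+
+```python
+def expected_counts(tree, total):
+    """Split `total` across a weighted tree; return {leaf name: expected count}.
+
+    `tree` maps a name to (weight, children), where children is another
+    such dict for a group or None for a leaf dataset.
+    """
+    counts = {}
+    weight_sum = sum(weight for weight, _ in tree.values())
+    for name, (weight, children) in tree.items():
+        share = total * weight / weight_sum
+        if children is None:
+            counts[name] = share
+        else:
+            counts.update(expected_counts(children, share))
+    return counts
+
+
+schema = {
+    "PureText": (1, {"mmlu_pro": (1, None), "ifeval": (1, None), "gsm8k": (1, None)}),
+    "Vision": (2, {"math_vista": (1, None), "mmmu_pro": (1, None)}),
+}
+for name, count in expected_counts(schema, 1000).items():
+    print(f"{name}: {count:.1f}")
+```
+
+Under this assumption each PureText dataset receives about 111 samples and each Vision dataset about 333, which is consistent with the per-dataset counts reported in the results table in section 5.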
+
+#### 3. Test
+
+```python
+from evalscope import TaskConfig, run_task
+from evalscope.constants import EvalType
+
+task_cfg = TaskConfig(
+    model="Qwen2.5-VL-7B-Instruct",
+    api_url="http://localhost:8804/v1",
+    api_key="EMPTY",
+    eval_type=EvalType.SERVICE,
+    datasets=[
+        "data_collection",
+    ],
+    dataset_args={
+        "data_collection": {
+            "local_path": "../outputs/vl_test.jsonl",
+        }
+    },
+    eval_batch_size=5,
+    generation_config={
+        "max_tokens": 30000,  # The maximum number of tokens that can be generated should be set to a large value to avoid output truncation.
+        "temperature": 0.6,  # Sampling temperature (recommended value from qwen report)
+        "top_p": 0.95,  # top-p sampling (recommended value from qwen report)
+        "top_k": 20,  # Top-k sampling (recommended value from qwen report)
+        "n": 1,  # Number of responses generated per request
+        "repetition_penalty": 1.0,  # 1.0 = no penalty, >1.0 = penalize repeated tokens.
+    },
+)
+
+run_task(task_cfg=task_cfg)
+```
+
+Parameter Tuning Guide:
+
+| Parameter         | Current value | Effect                                   | Adjustment suggestions                                            |
+| ----------------- | ------------- | ---------------------------------------- | ----------------------------------------------------------------- |
+| `temperature`     | 0.6           | Controls output diversity                | Lower to 0.3 for math problems, raise to 0.9 for creative writing |
+| `top_p`           | 0.95          | Filters out low-probability tokens       | Lower it to reduce incoherent output                              |
+| `eval_batch_size` | 5             | Number of requests processed in parallel | Can be raised to 10 if device memory allows                       |
+
+Save the Python code above as `accuracy.py` and the script below as `run_accuracy_test.sh`, then run the script:
+
+```bash
+#!/bin/bash
+# ========================================
+# Step 1: Set the log file path
+# ========================================
+LOG_FILE="accuracy_$(date +%Y%m%d_%H%M).log"
+
+# ========================================
+# Step 2: Execute the Python script and capture all output
+# Meaning of 2>&1:
+# - 2 represents standard error output (stderr)
+# - >& represents redirection and merging
+# - 1 represents standard output (stdout)
+# Effect: stderr is merged into stdout, so tee captures both.
+# ========================================
+python accuracy.py 2>&1 | tee "$LOG_FILE"
+
+# ========================================
+# Step 3: Check Execution Status
+# ${PIPESTATUS[0]} Get the exit code of the first command (Python) in the pipeline
+# ========================================
+EXIT_CODE=${PIPESTATUS[0]}
+if [ $EXIT_CODE -eq 0 ]; then
+    echo "✅ Evaluation completed! Log saved to: $LOG_FILE"
+else
+    echo "❌ Evaluation failed! Exit code: $EXIT_CODE Please check the log: $LOG_FILE"
+fi
+```
+
+#### 4. Common problem fixes
+
+##### 4.1 NLTK resource missing fix
+
+```bash
+Resource punkt_tab not found.
+```
+
+Solution (save the following as `fix_nltk.py`):
+
+```python
+import nltk
+import os
+
+# Step 1: Set the download path (select a writable directory)
+download_dir = "/workspace/myenv/nltk_data"
+os.makedirs(download_dir, exist_ok=True)
+
+# Step 2: Configure NLTK data path
+nltk.data.path.append(download_dir)
+
+# Step 3: Download necessary resources
+print("🔽 Start downloading punkt_tab resource...")
+try:
+    nltk.download("punkt_tab", download_dir=download_dir)
+    print("✅ Download successful!")
+except Exception as e:
+    print(f"❌ Download failed: {e}")
+    print("💡 Alternative: Download manually from GitHub")
+    print(
+        "   URL: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip"
+    )
+```
+
+Apply the fix:
+
+```bash
+# Activate environment
+source /workspace/myenv/bin/activate
+
+# Run the repair script
+python fix_nltk.py
+
+# Rerun the test
+bash run_accuracy_test.sh
+```
+
+#### 5. Results Display
+
+```bash
+-------------+---------------------+--------------+---------------+-------+
+|  task_type  |       metric        | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
+|    exam     |         acc         |   mmmu_pro   |     0.521     |  334  |
+|    math     |         acc         |  math_vista  |    0.6066     |  333  |
+|    exam     |         acc         |   mmlu_pro   |    0.5405     |  111  |
+| instruction | prompt_level_strict |    ifeval    |    0.6937     |  111  |
+|    math     |         acc         |    gsm8k     |    0.8288     |  111  |
+-------------+---------------------+--------------+---------------+-------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy/index.md
+++ b/docs/source/developer_guide/evaluation/accuracy/index.md
@@ -0,0 +1,10 @@
+# Accuracy
+
+This document details the accuracy testing methods for vllm-kunlun and the analysis of the results.
+
+:::{toctree}
+:caption: Accuracy
+:maxdepth: 1
+accuracy_server
+accuracy_kernel
+:::
--- a/docs/source/developer_guide/evaluation/accuracy_report/GLM-4.5-Air.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/GLM-4.5-Air.md
@@ -0,0 +1,18 @@
+# GLM-4.5-Air
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP8
+
+```bash
+-------------+----------+---------------+---------+-----+--------+---------+
+| Model       | Dataset  | Metric        | Subset  | Num | Score  | Cat.0   |
+-------------+----------+---------------+---------+-----+--------+---------+
+| GLM-4.5-Air | math_500 | AveragePass@1 | Level 1 | 43  | 0.9302 | default |
+| GLM-4.5-Air | math_500 | AveragePass@1 | Level 2 | 90  | 0.9222 | default |
+| GLM-4.5-Air | math_500 | AveragePass@1 | Level 3 | 105 | 0.8762 | default |
+| GLM-4.5-Air | math_500 | AveragePass@1 | Level 4 | 128 | 0.8984 | default |
+| GLM-4.5-Air | math_500 | AveragePass@1 | Level 5 | 134 | 0.8955 | default |
+-------------+----------+---------------+---------+-----+--------+---------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/GLM-4.5.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/GLM-4.5.md
@@ -0,0 +1,18 @@
+# GLM-4.5
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP8
+
+```bash
+---------+----------+---------------+---------+-----+--------+---------+
+| Model   | Dataset  | Metric        | Subset  | Num | Score  | Cat.0   |
+---------+----------+---------------+---------+-----+--------+---------+
+| GLM-4.5 | math_500 | AveragePass@1 | Level 1 |  43 | 0.9302 | default |
+| GLM-4.5 | math_500 | AveragePass@1 | Level 2 |  90 | 0.8111 | default |
+| GLM-4.5 | math_500 | AveragePass@1 | Level 3 | 105 | 0.7143 | default |
+| GLM-4.5 | math_500 | AveragePass@1 | Level 4 | 128 | 0.6172 | default |
+| GLM-4.5 | math_500 | AveragePass@1 | Level 5 | 134 | 0.5149 | default |
+---------+----------+---------------+---------+-----+--------+---------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/InternVL3_5-30B-A3B.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/InternVL3_5-30B-A3B.md
@@ -0,0 +1,18 @@
+# InternVL3_5-30B-A3B
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP8
+
+```
+-------------+---------------------+--------------+---------------+-------+
+|  task_type  |       metric        | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
+|    exam     |         acc         |   mmmu_pro   |    0.5449     |  334  |
+|    math     |         acc         |  math_vista  |    0.6847     |  333  |
+|    exam     |         acc         |   mmlu_pro   |    0.6126     |  111  |
+| instruction | prompt_level_strict |    ifeval    |    0.7658     |  111  |
+|    math     |         acc         |    gsm8k     |    0.9369     |  111  |
+-------------+---------------------+--------------+---------------+-------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
@@ -0,0 +1,18 @@
+# Qwen2.5-VL-7B-Instruct
+
+* vLLM Version: 0.10.1.1, vLLM-KunLun Version: v0.10.1.1
+* Software Environment: OS: Ubuntu 22.04, PyTorch ≥ 2.5.1
+* Hardware Environment: KunLun P800
+* Parallel mode: TP1
+
+```
+-------------+---------------------+--------------+---------------+-------+
+|  task_type  |       metric        | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
+|    exam     |         acc         |   mmmu_pro   |     0.521     |  334  |
+|    math     |         acc         |  math_vista  |    0.6066     |  333  |
+|    exam     |         acc         |   mmlu_pro   |    0.5405     |  111  |
+| instruction | prompt_level_strict |    ifeval    |    0.6937     |  111  |
+|    math     |         acc         |    gsm8k     |    0.8288     |  111  |
+-------------+---------------------+--------------+---------------+-------+
+```
--- a/docs/source/developer_guide/evaluation/accuracy_report/index.md
+++ b/docs/source/developer_guide/evaluation/accuracy_report/index.md
@@ -0,0 +1,10 @@
+# Accuracy Report
+
+:::{toctree}
+:caption: Accuracy Report
+:maxdepth: 1
+Qwen2.5-VL-7B-Instruct
+InternVL3_5-30B-A3B
+GLM-4.5
+GLM-4.5-Air
+:::
--- a/docs/source/developer_guide/evaluation/index.md
+++ b/docs/source/developer_guide/evaluation/index.md
@@ -0,0 +1,8 @@
+# Accuracy
+
+:::{toctree}
+:caption: Accuracy
+:maxdepth: 1
+accuracy/index
+accuracy_report/index
+:::