diff --git a/docs/source/developer_guide/performance/index.md b/docs/source/developer_guide/performance_and_debug/index.md similarity index 64% rename from docs/source/developer_guide/performance/index.md rename to docs/source/developer_guide/performance_and_debug/index.md index faa51b37..e0eec55d 100644 --- a/docs/source/developer_guide/performance/index.md +++ b/docs/source/developer_guide/performance_and_debug/index.md @@ -1,10 +1,11 @@ -# Performance +# Performance and Debug ::::{toctree} -:caption: Performance +:caption: Performance and Debug :maxdepth: 1 performance_benchmark profile_execute_duration optimization_and_tuning service_profiling_guide +msprobe_guide :::: diff --git a/docs/source/developer_guide/performance_and_debug/msprobe_guide.md b/docs/source/developer_guide/performance_and_debug/msprobe_guide.md new file mode 100644 index 00000000..456809ba --- /dev/null +++ b/docs/source/developer_guide/performance_and_debug/msprobe_guide.md @@ -0,0 +1,516 @@ +# MSProbe Debugging Guide + +During inference or training runs we often encounter accuracy anomalies such as outputs drifting away from the expectation, unstable numerical behavior (NaN/Inf), or predictions that no longer match the labels. To pinpoint the root cause we have to monitor and capture intermediate data produced while the model executes—feature maps, weights, activations, and layer outputs. By capturing key tensors at specific stages, logging I/O pairs for the core layers, and retaining contextual metadata (prompts, tensor dtypes, hardware configuration, etc.), we can systematically trace where the accuracy degradation or numerical error started. This guide describes the end-to-end workflow for diagnosing accuracy issues for AI models (with a focus on vllm-ascend services): preparation, data capture, and analysis & verification. + +## 0. 
Background Concepts + +`msprobe` supports three accuracy levels: + +- **L0**: dumps tensors at the module level and generates `construct.json` so that visualization tools can rebuild the network structure. A model or submodule handle must be passed in. +- **L1**: collects operator-level statistics only, which is suitable for lightweight troubleshooting. +- **mix**: captures both structural information and operator statistics, which is useful when you need both graph reconstruction and numerical comparisons. + +## 1. Prerequisites + +### 1.1 Install `msprobe` + +Install msprobe with pip: + +```bash +pip install mindstudio-probe==8.3.0 +``` + +### 1.2 Visualization dependencies (optional) + +Install additional dependencies if you need to visualize the captured data. + +1. Install `tb_graph_ascend`: + + ```bash + pip install tb_graph_ascend + ``` + +## 2. Collecting Data with `msprobe` + +We generally follow a coarse-to-fine strategy when capturing data. First identify the token where the issue shows up, and then decide which range needs to be sampled around that token. The typical workflow is described below. + +### 2.1 Prepare the dump configuration file + +Create a `config.json` that can be parsed by `PrecisionDebugger` and place it in an accessible path. Common fields are: + +| Field | Description | Required | +|:---:|:----|:---:| +| `task` | Type of dump task. Common PyTorch values include `"statistics"` and `"tensor"`. A statistics task collects tensor statistics (mean, variance, max, min, etc.) while a tensor task captures arbitrary tensors. | Yes | +| `dump_path` | Directory where dump results are stored. When omitted, `msprobe` uses its default path. | No | +| `rank` | Ranks to sample. An empty list collects every rank. For single-card tasks you must set this field to `[]`. | No | +| `step` | Token iteration(s) to sample. An empty list means every iteration. | No | +| `level` | Dump level string (`"L0"`, `"L1"`, or `"mix"`). 
`L0` targets `nn.Module`, `L1` targets `torch.api`, and `mix` collects both. | Yes | +| `async_dump` | Whether to enable asynchronous dump (supported for PyTorch `statistics`/`tensor` tasks). Defaults to `false`. | No | +| `scope` | Module range to sample. An empty list collects every module. | No | +| `list` | Operator range to sample. An empty list collects every operator. | No | + +To restrict the operators that are captured, configure the `list` block: + +- `scope` (list[str]): In PyTorch pynative scenarios this field restricts the dump range. Provide two module or API names that follow the tool's naming convention to lock a range; only data between the two names will be dumped. Examples: + + ``` + "scope": ["Module.conv1.Conv2d.forward.0", "Module.fc2.Linear.forward.0"] + "scope": ["Cell.conv1.Conv2d.forward.0", "Cell.fc2.Dense.backward.0"] + "scope": ["Tensor.add.0.forward", "Functional.square.2.forward"] + ``` + + The `level` setting determines what can be provided—modules when `level=L0`, APIs when `level=L1`, and either modules or APIs when `level=mix`. + +- `list` (list[str]): Custom operator list. Options include: + - Supply the full names of specific APIs in PyTorch pynative scenarios to only dump those APIs. Example: `"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.backward"]`. + - When `level=mix`, you can provide module names so that the dump expands to everything produced while the module is running. Example: `"list": ["Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0"]`. + - Provide a substring such as `"list": ["relu"]` to dump every API whose name contains the substring. When `level=mix`, modules whose names contain the substring are also expanded. 
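The name-matching behavior described above can be sketched in a few lines of Python. This is an illustrative approximation of the `list` filter rules (the helper `matches_list_filter` is ours, not msprobe's actual implementation), using API names that follow the tool's naming convention:

```python
# Illustrative approximation of the `list` filter semantics described above.
# NOT msprobe's real implementation -- just a sketch of the rules: an empty
# list collects everything; otherwise an entry matches any API whose full
# name contains it (so a full API name matches exactly that API).

def matches_list_filter(api_name: str, list_filter: list[str]) -> bool:
    """Return True if `api_name` would be dumped under `list_filter`."""
    if not list_filter:
        return True  # empty list -> dump every operator
    return any(entry in api_name for entry in list_filter)

api_names = [
    "Tensor.permute.1.forward",
    "Functional.relu.0.forward",
    "Torch.matmul.3.forward",
]
captured = [name for name in api_names if matches_list_filter(name, ["relu"])]
print(captured)
```

With `"list": ["relu"]` only `Functional.relu.0.forward` survives the filter; with `"list": []` all three names would be captured.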
Example configuration:

```bash
cat <<'JSON' > /data/msprobe_config.json
{
    "task": "statistics",
    "dump_path": "/home/data_dump",
    "rank": [],
    "step": [],
    "level": "L1",
    "async_dump": false,

    "statistics": {
        "scope": [],
        "list": [],
        "tensor_list": [],
        "data_mode": ["all"],
        "summary_mode": "statistics"
    }
}
JSON
```

### 2.2 Enable `msprobe` in vllm-ascend

1. Start vLLM in eager mode by adding `--enforce-eager` (static-graph scenarios are not supported yet) and pass the config path through `--additional-config`:

    ```bash
    vllm serve Qwen/Qwen2.5-0.5B-Instruct \
      --dtype float16 \
      --enforce-eager \
      --host 0.0.0.0 \
      --port 8000 \
      --additional-config '{"dump_config": "/data/msprobe_config.json"}' &
    ```

## 3. Send requests and collect dumps

1. Send inference requests as usual, for example:

    ```bash
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "Explain gravity in one sentence.",
        "max_tokens": 32,
        "temperature": 0
      }' | python -m json.tool
    ```

2. Each request drives the sequence `msprobe: start -> forward/backward -> stop -> step`. The runner invokes `step()` on every code path, so you always get a complete dataset even if inference returns early.

3. Dump files are written into `dump_path`. They usually contain:
    - Tensor files grouped by operator/module.
    - `dump.json`, which records metadata such as dtype, shape, min/max, and `requires_grad`.
    - `construct.json`, which is generated when `level` is `L0` or `mix` (required for visualization).

    Example directory layout:

    ```text
    ├── dump_path
    │   ├── step0
    │   │   ├── rank0
    │   │   │   ├── dump_tensor_data
    │   │   │   │   ├── Tensor.permute.1.forward.pt
    │   │   │   │   ├── Functional.linear.5.backward.output.pt    # Format: {api_type}.{api_name}.{call_count}.{forward/backward}.{input/output}.{arg_index}.
+ │ │ │ │ │ # arg_index is the nth input or output of the API. If an input is a list, keep numbering with decimals (e.g., 1.1 is the first element of the first argument). + │ │ │ │ ├── Module.conv1.Conv2d.forward.0.input.0.pt # Format: {Module}.{module_name}.{class_name}.{forward/backward}.{call_count}.{input/output}.{arg_index}. + │ │ │ │ ├── Module.conv1.Conv2d.forward.0.parameters.bias.pt # Module parameter data: {Module}.{module_name}.{class_name}.forward.{call_count}.parameters.{parameter_name}. + │ │ │ │ └── Module.conv1.Conv2d.parameters_grad.weight.pt # Module parameter gradients: {Module}.{module_name}.{class_name}.parameters_grad.{parameter_name}. Gradients do not include call_count because the same gradient updates all invocations. + │ │ │ │ # When the `model` argument passed to dump is a List[torch.nn.Module] or Tuple[torch.nn.Module], module-level data names also include the index inside the list ({Module}.{index}.*), e.g., Module.0.conv1.Conv2d.forward.0.input.0.pt. + │ │ │ ├── dump.json + │ │ │ ├── stack.json + │ │ │ ├── dump_error_info.log + │ │ │ └── construct.json + │ │ ├── rank1 + │ │ │ ├── dump_tensor_data + │ │ │ │ └── ... + │ │ │ ├── dump.json + │ │ │ ├── stack.json + │ │ │ ├── dump_error_info.log + │ │ │ └── construct.json + │ │ ├── ... + │ │ │ + │ │ └── rank7 + │ ├── step1 + │ │ ├── ... + │ ├── step2 + ``` + + - `rank`: Device ID. Each card writes its data to the corresponding `rank{ID}` directory. In non-distributed scenarios the directory is simply named `rank`. + - `dump_tensor_data`: Tensor payloads that were collected. + - `dump.json`: Statistics for the forward/backward data of each API or module, including names, dtype, shape, max, min, mean, L2 norm (square root of the L2 variance), and CRC-32 when `summary_mode="md5"`. See [dump.json file description](#dumpjson-file-description) for details. + - `dump_error_info.log`: Present only when the dump tool encountered an error and records the failure log. 
+ - `stack.json`: Call stacks for APIs/modules. + - `construct.json`: Hierarchical structure description. Empty when `level=L1`. + +## 4. Analyze the results + +### 4.1 Prerequisites + +You typically need two dump datasets: one from the "problem side" (the run that exposes the accuracy or numerical error) and another from the "benchmark side" (a good baseline). These datasets do not have to be identical—they can come from different branches, framework versions, or even alternative implementations (operator substitutions, different graph-optimization switches, etc.). As long as they use the same or similar inputs, hardware topology, and sampling points (step/token), `msprobe` can compare them and locate the divergent nodes. If you cannot find a perfectly clean benchmark, start by capturing the problem-side data, craft the smallest reproducible case by hand, and perform a self-comparison. Below we assume the problem dump is `problem_dump` and the benchmark dump is `bench_dump`. + +### 4.2 Visualization + +Use `msprobe graph_visualize` to generate results that can be opened inside `tb_graph_ascend`. + +1. Ensure the dump contains `construct.json` (i.e., `level = L0` or `level = mix`). +2. Prepare a comparison file such as `compare.json`. Its format and generation flow are described in section 3.1.3 of `msprobe_visualization.md`. Example (minimal runnable snippet): + + ```json + { + "npu_path": "./problem_dump", + "bench_path": "./bench_dump", + "is_print_compare_log": true + } + ``` + + Replace the paths with your dump directories before invoking `msprobe graph_visualize`. **If you only need to build a single graph**, omit `bench_path` to visualize one dump. + Multi-rank scenarios (single rank, multi-rank, or multi-step multi-rank) are also supported. `npu_path` or `bench_path` must contain folders named `rank+number`, and every rank folder must contain a non-empty `construct.json` together with `dump.json` and `stack.json`. 
If any `construct.json` is empty, verify that the dump level includes `L0` or `mix`. When comparing graphs, both `npu_path` and `bench_path` must contain the same set of rank folders so they can be paired one-to-one.

    ```
    ├── npu_path or bench_path
    |   ├── rank0
    |   |   ├── dump_tensor_data        (only when the `tensor` option is enabled)
    |   |   |   ├── Tensor.permute.1.forward.pt
    |   |   |   ├── MyModule.0.forward.input.pt
    |   |   |   ...
    |   |   |   └── Functional.linear.5.backward.output.pt
    |   |   ├── dump.json        # Tensor metadata
    |   |   ├── stack.json       # Operator call stack information
    |   |   └── construct.json   # Hierarchical structure; empty when `level=L1`
    |   ├── rank1
    |   |   ├── dump_tensor_data
    |   |   |   └── ...
    |   |   ├── dump.json
    |   |   ├── stack.json
    |   |   └── construct.json
    |   ├── ...
    |   |
    |   └── rankn
    ```

3. Run:

    ```bash
    msprobe graph_visualize \
      --input_path ./compare.json \
      --output_path ./graph_output
    ```

    After the comparison finishes, a `*.vis.db` file is created under `graph_output`.

    - Graph build: `build_{timestamp}.vis.db`
    - Graph comparison: `compare_{timestamp}.vis.db`

4. Launch `tensorboard` and load the output directory to inspect structural differences, numerical comparisons, overflow detection results, cross-device communication nodes, and filters/search. Pass the directory containing the `.vis.db` files to `--logdir`:

    ```bash
    tensorboard --logdir ./graph_output --bind_all --port [optional_port]
    ```

5. Inspect the visualization. The UI usually displays the overall model structure with operators, parameters, and tensor I/O. Click any node to expand its children.
+ - **Helper features**: + - Switch rank/step: Quickly check difference nodes on different ranks and steps. + - Search/filter: Use the search box to filter nodes by operator name, etc. + - Manual mapping: Automatic mapping cannot cover every case, so the tool lets you manually map nodes between the problem and benchmark graphs before generating comparison results. + +## 5. Troubleshooting + +- `RuntimeError: Please enforce eager mode`: Restart vLLM and add the `--enforce-eager` flag. +- No dump files: Confirm that the JSON path is correct and every node has write permission. In distributed scenarios set `keep_all_ranks` so that every rank writes its own dump. +- Dumps are too large: Start with a `statistics` task to locate abnormal tensors, then narrow the scope with `scope`/`list`/`tensor_list`, `filters`, `token_range`, etc. + +--- + +## Appendix + +### dump.json file description + +#### L0 level + +An L0 `dump.json` contains forward/backward I/O for modules together with parameters and parameter gradients. Using PyTorch's `Conv2d` as an example, the network code looks like: + +`output = self.conv2(input) # self.conv2 = torch.nn.Conv2d(64, 128, 5, padding=2, bias=True)` + +`dump.json` contains the following entries: + +- `Module.conv2.Conv2d.forward.0`: Forward data of the module. `input_args` represents positional inputs, `input_kwargs` represents keyword inputs, `output` stores forward outputs, and `parameters` stores weights/biases. +- `Module.conv2.Conv2d.parameters_grad`: Parameter gradients (weight and bias). +- `Module.conv2.Conv2d.backward.0`: Backward data of the module. `input` represents gradients that flow into the module (gradients of the forward outputs) and `output` represents gradients that flow out (gradients of the module inputs). + +**Note**: When the `model` parameter passed to the dump API is `List[torch.nn.Module]` or `Tuple[torch.nn.Module]`, module-level names include the index inside the list (`{Module}.{index}.*`). 
Example: `Module.0.conv1.Conv2d.forward.0`. + +```json +{ + "task": "tensor", + "level": "L0", + "framework": "pytorch", + "dump_data_dir": "/dump/path", + "data": { + "Module.conv2.Conv2d.forward.0": { + "input_args": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 8, + 16, + 14, + 14 + ], + "Max": 1.638758659362793, + "Min": 0.0, + "Mean": 0.2544615864753723, + "Norm": 70.50277709960938, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.input.0.pt" + } + ], + "input_kwargs": {}, + "output": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 8, + 32, + 10, + 10 + ], + "Max": 1.6815717220306396, + "Min": -1.5120246410369873, + "Mean": -0.025344856083393097, + "Norm": 149.65576171875, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.output.0.pt" + } + ], + "parameters": { + "weight": { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 5, + 5 + ], + "Max": 0.05992485210299492, + "Min": -0.05999220535159111, + "Mean": -0.0006165213999338448, + "Norm": 3.421217441558838, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.parameters.weight.pt" + }, + "bias": { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32 + ], + "Max": 0.05744686722755432, + "Min": -0.04894155263900757, + "Mean": 0.006410328671336174, + "Norm": 0.17263513803482056, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.parameters.bias.pt" + } + } + }, + "Module.conv2.Conv2d.parameters_grad": { + "weight": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 5, + 5 + ], + "Max": 0.018550323322415352, + "Min": -0.008627401664853096, + "Mean": 0.0006675920449197292, + "Norm": 0.26084786653518677, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.parameters_grad.weight.pt" + } + ], + "bias": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32 + ], + 
"Max": 0.014914230443537235, + "Min": -0.006656786892563105, + "Mean": 0.002657240955159068, + "Norm": 0.029451673850417137, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.parameters_grad.bias.pt" + } + ] + }, + "Module.conv2.Conv2d.backward.0": { + "input": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 8, + 32, + 10, + 10 + ], + "Max": 0.0015069986693561077, + "Min": -0.001139344065450132, + "Mean": 3.3215508210560074e-06, + "Norm": 0.020567523315548897, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.backward.0.input.0.pt" + } + ], + "output": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 8, + 16, + 14, + 14 + ], + "Max": 0.0007466732058674097, + "Min": -0.00044813455315306783, + "Mean": 6.814070275140693e-06, + "Norm": 0.01474067009985447, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.backward.0.output.0.pt" + } + ] + } + } +} +``` + +#### L1 level + +An L1 `dump.json` records forward/backward I/O for APIs. Using PyTorch's `relu` function as an example (`output = torch.nn.functional.relu(input)`), the file contains: + +- `Functional.relu.0.forward`: Forward data of the API. `input_args` are positional inputs, `input_kwargs` are keyword inputs, and `output` stores the forward outputs. +- `Functional.relu.0.backward`: Backward data of the API. `input` represents the gradients of the forward outputs, and `output` represents the gradients that flow back to the forward inputs. 
+ +```json +{ + "task": "tensor", + "level": "L1", + "framework": "pytorch", + "dump_data_dir":"/dump/path", + "data": { + "Functional.relu.0.forward": { + "input_args": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 1.3864083290100098, + "Min": -1.3364859819412231, + "Mean": 0.03711778670549393, + "Norm": 236.20692443847656, + "requires_grad": true, + "data_name": "Functional.relu.0.forward.input.0.pt" + } + ], + "input_kwargs": {}, + "output": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 1.3864083290100098, + "Min": 0.0, + "Mean": 0.16849493980407715, + "Norm": 175.23345947265625, + "requires_grad": true, + "data_name": "Functional.relu.0.forward.output.0.pt" + } + ] + }, + "Functional.relu.0.backward": { + "input": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 0.0001815402356442064, + "Min": -0.00013352684618439525, + "Mean": 0.00011915402356442064, + "Norm": 0.007598237134516239, + "requires_grad": false, + "data_name": "Functional.relu.0.backward.input.0.pt" + } + ], + "output": [ + { + "type": "torch.Tensor", + "dtype": "torch.float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 0.0001815402356442064, + "Min": -0.00012117840378778055, + "Mean": 2.0098118724831693e-08, + "Norm": 0.006532244384288788, + "requires_grad": false, + "data_name": "Functional.relu.0.backward.output.0.pt" + } + ] + } + } +} +``` + +#### mix level + +A `mix` dump.json contains both L0 and L1 level data; the file format is the same as the examples above. 
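For scripted, first-pass analysis, the statistics recorded in `dump.json` are often enough to bisect a divergence without loading any tensor payloads. Below is a minimal sketch that compares the `Max`/`Min`/`Mean`/`Norm` fields of a problem-side and a benchmark-side `dump.json` payload; the helper name, the relative tolerance, the report format, and the sample values are arbitrary illustrations, not part of msprobe:

```python
# Sketch: flag entries whose summary statistics diverge between two
# dump.json payloads (problem side vs. benchmark side). The tolerance
# and output format are illustrative choices, not msprobe behavior.

def compare_dump_stats(problem: dict, bench: dict, rtol: float = 1e-3) -> list[str]:
    """Return human-readable lines for every diverging Max/Min/Mean/Norm stat."""
    flagged = []
    for op_name, op_data in problem.get("data", {}).items():
        bench_op = bench.get("data", {}).get(op_name)
        if bench_op is None:
            flagged.append(f"{op_name}: missing on the benchmark side")
            continue
        for io_key in ("input_args", "input", "output"):
            pairs = zip(op_data.get(io_key, []), bench_op.get(io_key, []))
            for i, (p, b) in enumerate(pairs):
                for stat in ("Max", "Min", "Mean", "Norm"):
                    pv, bv = p.get(stat), b.get(stat)
                    if pv is None or bv is None:
                        continue
                    if abs(pv - bv) > rtol * max(abs(pv), abs(bv), 1e-12):
                        flagged.append(f"{op_name}.{io_key}[{i}].{stat}: {pv} vs {bv}")
    return flagged

# Tiny hand-made payloads in the dump.json shape shown above (hypothetical values):
problem = {"data": {"Functional.relu.0.forward": {
    "input_args": [{"Max": 1.0, "Min": -1.0, "Mean": 0.1, "Norm": 5.0}],
    "output": [{"Max": 9.7, "Min": 0.0, "Mean": 0.4, "Norm": 12.0}]}}}
bench = {"data": {"Functional.relu.0.forward": {
    "input_args": [{"Max": 1.0, "Min": -1.0, "Mean": 0.1, "Norm": 5.0}],
    "output": [{"Max": 1.0, "Min": 0.0, "Mean": 0.2, "Norm": 4.0}]}}}
for line in compare_dump_stats(problem, bench):
    print(line)
```

In practice you would `json.load` the `dump.json` from `problem_dump/step0/rank0/` and `bench_dump/step0/rank0/`, feed both dicts to the helper, and scan the flagged names starting from the earliest layer, since a single upstream divergence usually cascades downstream.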
diff --git a/docs/source/developer_guide/performance/optimization_and_tuning.md b/docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md similarity index 100% rename from docs/source/developer_guide/performance/optimization_and_tuning.md rename to docs/source/developer_guide/performance_and_debug/optimization_and_tuning.md diff --git a/docs/source/developer_guide/performance/performance_benchmark.md b/docs/source/developer_guide/performance_and_debug/performance_benchmark.md similarity index 100% rename from docs/source/developer_guide/performance/performance_benchmark.md rename to docs/source/developer_guide/performance_and_debug/performance_benchmark.md diff --git a/docs/source/developer_guide/performance/profile_execute_duration.md b/docs/source/developer_guide/performance_and_debug/profile_execute_duration.md similarity index 100% rename from docs/source/developer_guide/performance/profile_execute_duration.md rename to docs/source/developer_guide/performance_and_debug/profile_execute_duration.md diff --git a/docs/source/developer_guide/performance/service_profiling_guide.md b/docs/source/developer_guide/performance_and_debug/service_profiling_guide.md similarity index 100% rename from docs/source/developer_guide/performance/service_profiling_guide.md rename to docs/source/developer_guide/performance_and_debug/service_profiling_guide.md diff --git a/docs/source/index.md b/docs/source/index.md index 8c087447..cf1c64df 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -56,7 +56,7 @@ user_guide/release_notes developer_guide/contribution/index developer_guide/feature_guide/index developer_guide/evaluation/index -developer_guide/performance/index +developer_guide/performance_and_debug/index ::: % How to involve vLLM Ascend diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/index.po b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/index.po similarity index 78% rename from 
docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/index.po rename to docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/index.po index c2b2e6fd..83eaab9e 100644 --- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/index.po +++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/index.po @@ -20,7 +20,7 @@ msgstr "" "Plural-Forms: nplurals=1; plural=0;\n" "Generated-By: Babel 2.17.0\n" -#: ../../developer_guide/performance/index.md:1 -#: ../../developer_guide/performance/index.md:3 -msgid "Performance" -msgstr "性能" +#: ../../developer_guide/performance_and_debug/index.md:1 +#: ../../developer_guide/performance_and_debug/index.md:3 +msgid "Performance and Debug" +msgstr "性能和调试" diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/msprobe_guide.po b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/msprobe_guide.po new file mode 100644 index 00000000..cd0f32d6 --- /dev/null +++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/msprobe_guide.po @@ -0,0 +1,646 @@ +# SOME DESCRIPTIVE TITLE. +# Copyright (C) 2025, vllm-ascend team +# This file is distributed under the same license as the vllm-ascend +# package. +# FIRST AUTHOR , 2025. 
+# +msgid "" +msgstr "" +"Project-Id-Version: vllm-ascend\n" +"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n" +"POT-Creation-Date: 2025-11-21 10:19+0800\n" +"PO-Revision-Date: 2025-11-21 10:31\n" +"Last-Translator: Codex \n" +"Language-Team: Chinese (Simplified) \n" +"Language: zh_CN\n" +"MIME-Version: 1.0\n" +"Content-Type: text/plain; charset=utf-8\n" +"Content-Transfer-Encoding: 8bit\n" +"Plural-Forms: nplurals=1; plural=0;\n" +"Generated-By: Babel 2.17.0\n" + +#: ../../developer_guide/performance_and_performance_and_debug/msprobe_guide.md +msgid "MSProbe Debugging Guide" +msgstr "MSProbe 调试指南" + +#: ../../developer_guide/performance_and_performance_and_debug/msprobe_guide.md +msgid "" +"During inference or training runs we often encounter accuracy anomalies such" +" as outputs drifting away from the expectation, unstable numerical behavior " +"(NaN/Inf), or predictions that no longer match the labels. To pinpoint the " +"root cause we have to monitor and capture intermediate data produced while " +"the model executes—feature maps, weights, activations, and layer outputs. By" +" capturing key tensors at specific stages, logging I/O pairs for the core " +"layers, and retaining contextual metadata (prompts, tensor dtypes, hardware " +"configuration, etc.), we can systematically trace where the accuracy " +"degradation or numerical error started. This guide describes the end-to-end " +"workflow for diagnosing accuracy issues for AI models (with a focus on vllm-" +"ascend services): preparation, data capture, and analysis & verification." +msgstr "" +"在推理或训练过程中,我们经常会遇到输出偏离预期、出现 NaN/Inf " +"等数值不稳定现象,或者模型预测与标签不一致等精度异常。要定位根因,就必须监控并采集模型执行过程中的中间数据——例如特征图、权重、激活值及各层输出。通过在关键阶段捕获核心张量、记录核心层的输入输出对,并保留提示词、张量" +" dtype、硬件配置等上下文元数据,我们可以系统追踪精度退化或数值错误的源头。本指南聚焦 vllm-ascend 服务,介绍 AI " +"模型精度问题排查的完整流程:准备、数据采集以及分析与验证。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "0. Background Concepts" +msgstr "0. 
前置概念" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`msprobe` supports three accuracy levels:" +msgstr "`msprobe` 支持三种精度级别:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"**L0**: dumps tensors at the module level and generates `construct.json` so " +"that visualization tools can rebuild the network structure. A model or " +"submodule handle must be passed in." +msgstr "**L0**:在`nn.Module`级别保存`tensor`,并生成 `construct.json` 以便可视化工具还原网络结构,需要传入模型或子模块句柄。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"**L1**: collects operator-level statistics only, which is suitable for " +"lightweight troubleshooting." +msgstr "**L1**:仅采集算子级统计信息,适合轻量排查。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"**mix**: captures both structural information and operator statistics, which" +" is useful when you need both graph reconstruction and numerical " +"comparisons." +msgstr "**mix**:同时获取结构信息与算子统计,适用于既要构图又要进行数值对比的场景。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "1. Prerequisites" +msgstr "1. 前提条件" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "1.1 Install `msprobe`" +msgstr "1.1 安装 `msprobe`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Install msprobe with pip:" +msgstr "使用 pip 安装 msprobe:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "1.2 Visualization dependencies (optional)" +msgstr "1.2 可视化依赖(可选)" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Install additional dependencies if you need to visualize the captured data." +msgstr "如需对采集的数据进行可视化,请安装以下依赖。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Install `tb_graph_ascend`:" +msgstr "安装 `tb_graph_ascend`:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "2. Collecting Data with `msprobe`" +msgstr "2. 
使用 `msprobe` 采集数据" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"We generally follow a coarse-to-fine strategy when capturing data. First " +"identify the token where the issue shows up, and then decide which range " +"needs to be sampled around that token. The typical workflow is described " +"below." +msgstr "采集通常遵循由粗到细的策略:先确定问题出现的 token,再围绕该 token 决定采样范围,常规流程如下。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "2.1 Prepare the dump configuration file" +msgstr "2.1 准备 dump 配置文件" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Create a `config.json` that can be parsed by `PrecisionDebugger` and place " +"it in an accessible path. Common fields are:" +msgstr "创建可被 `PrecisionDebugger` 解析的 `config.json` 并放置在可访问路径,常见字段如下:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Field" +msgstr "字段" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Description" +msgstr "说明" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Required" +msgstr "必填" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`task`" +msgstr "`task`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Type of dump task. Common PyTorch values include `\"statistics\"` and " +"`\"tensor\"`. A statistics task collects tensor statistics (mean, variance, " +"max, min, etc.) while a tensor task captures arbitrary tensors." +msgstr "" +"dump 任务类型。PyTorch 常见取值包括 `\"statistics\"` 和 `\"tensor\"`:statistics " +"任务采集张量统计量(均值、方差、最大值、最小值等),tensor 任务可采集任意张量。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Yes" +msgstr "是" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`dump_path`" +msgstr "`dump_path`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Directory where dump results are stored. 
When omitted, `msprobe` uses its " +"default path." +msgstr "dump 结果保存目录,未配置时使用 `msprobe` 默认路径。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "No" +msgstr "否" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`rank`" +msgstr "`rank`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Ranks to sample. An empty list collects every rank. For single-card tasks " +"you must set this field to `[]`." +msgstr "指定需要采集的设备 rank,空列表表示全部 rank;单卡任务必须配置为 `[]`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`step`" +msgstr "`step`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Token iteration(s) to sample. An empty list means every iteration." +msgstr "指定采集的 token 轮次,空列表表示全部迭代。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`level`" +msgstr "`level`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Dump level string (`\"L0\"`, `\"L1\"`, or `\"mix\"`). `L0` targets " +"`nn.Module`, `L1` targets `torch.api`, and `mix` collects both." +msgstr "" +"dump 级别字符串(`\"L0\"`、`\"L1\"`、`\"mix\"`),L0 面向 `nn.Module`,L1 面向 " +"`torch.api`,mix 同时采集两者。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`async_dump`" +msgstr "`async_dump`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Whether to enable asynchronous dump (supported for PyTorch " +"`statistics`/`tensor` tasks). Defaults to `false`." +msgstr "是否启用异步 dump(PyTorch `statistics`/`tensor` 任务可用),默认 `false`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`scope`" +msgstr "`scope`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Module range to sample. An empty list collects every module." 
+msgstr "指定需要采集的模块范围,空列表表示全部模块。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`list`" +msgstr "`list`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Operator range to sample. An empty list collects every operator." +msgstr "指定需要采集的算子范围,空列表表示全部算子。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"To restrict the operators that are captured, configure the `list` block:" +msgstr "如需进一步限定算子范围,请配置 `list`:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`scope` (list[str]): In PyTorch pynative scenarios this field restricts the " +"dump range. Provide two module or API names that follow the tool's naming " +"convention to lock a range; only data between the two names will be dumped. " +"Examples:" +msgstr "" +"`scope`(list[str]):在 PyTorch 动态图场景下用于限定 dump 区间。按照工具命名格式提供两个模块或 API 名称,只会 " +"dump 这一区间内的数据。示例:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"The `level` setting determines what can be provided—modules when `level=L0`," +" APIs when `level=L1`, and either modules or APIs when `level=mix`." +msgstr "" +"`level` 的取值决定可配置内容:`level=L0` 填模块名,`level=L1` 填 API 名,`level=mix` 则二者皆可。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`list` (list[str]): Custom operator list. Options include:" +msgstr "`list`(list[str]):用于自定义采集的算子范围,常见方式包括:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Supply the full names of specific APIs in PyTorch pynative scenarios to only" +" dump those APIs. Example: `\"list\": [\"Tensor.permute.1.forward\", " +"\"Tensor.transpose.2.forward\", \"Torch.relu.3.backward\"]`." 
+msgstr "" +"在 PyTorch 动态图场景中配置 API 全称,仅 dump 这些 API,例如 `\"list\": " +"[\"Tensor.permute.1.forward\", \"Tensor.transpose.2.forward\", " +"\"Torch.relu.3.backward\"]`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"When `level=mix`, you can provide module names so that the dump expands to " +"everything produced while the module is running. Example: `\"list\": " +"[\"Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0\"]`." +msgstr "" +"当 `level=mix` 时可以填写模块名称,工具会在该模块执行期间展开并 dump 所有数据,例如 `\"list\": " +"[\"Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0\"]`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Provide a substring such as `\"list\": [\"relu\"]` to dump every API whose " +"name contains the substring. When `level=mix`, modules whose names contain " +"the substring are also expanded." +msgstr "" +"也可以仅提供子串(如 `\"list\": [\"relu\"]`),会 dump 名称包含该字符串的 API,且 `level=mix` " +"时会展开名称包含该字符串的模块。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Example configuration:" +msgstr "示例配置:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "2. Enable `msprobe` in vllm-ascend" +msgstr "2. 在 vllm-ascend 中启用 `msprobe`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Start vLLM in eager mode by adding `--enforce-eager` (static-graph scenarios" +" are not supported yet) and pass the config path through `--additional-" +"config`:" +msgstr "" +"通过添加 `--enforce-eager` 以 eager 模式启动 vLLM(静态图暂不支持),并通过 `--additional-config` " +"传入配置路径:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "3. Send requests and collect dumps" +msgstr "3. 
发送请求并采集 dump" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Send inference requests as usual, for example:" +msgstr "按常规方式发送推理请求,例如:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Each request drives the sequence `msprobe: start -> forward/backward -> stop" +" -> step`. The runner invokes `step()` on every code path, so you always get" +" a complete dataset even if inference returns early." +msgstr "" +"每个请求都会执行 `msprobe: start -> forward/backward -> stop -> step`,Runner " +"在所有路径都会调用 `step()`,即使推理提前结束也能拿到完整数据。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Dump files are written into `dump_path`. They usually contain:" +msgstr "dump 文件写入 `dump_path`,通常包含:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Tensor files grouped by operator/module." +msgstr "按算子或模块划分的张量文件。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`dump.json`, which records metadata such as dtype, shape, min/max, and " +"`requires_grad`." +msgstr "描述 dtype、shape、最小/最大值以及 `requires_grad` 等信息的 `dump.json`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`construct.json`, which is generated when `level` is `L0` or `mix` (required" +" for visualization)." +msgstr "当级别为 `L0` 或 `mix` 时生成的 `construct.json`(可视化必需)。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Example directory layout:" +msgstr "目录结构示例:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +#, python-brace-format +msgid "" +"`rank`: Device ID. Each card writes its data to the corresponding `rank{ID}`" +" directory. In non-distributed scenarios the directory is simply named " +"`rank`." +msgstr "`rank`:设备 ID。每张卡写入对应的 `rank{ID}` 目录,非分布式场景目录名称为 `rank`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`dump_tensor_data`: Tensor payloads that were collected." 
+msgstr "`dump_tensor_data`:采集到的张量数据。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`dump.json`: Statistics for the forward/backward data of each API or module," +" including names, dtype, shape, max, min, mean, L2 norm (square root of the " +"L2 variance), and CRC-32 when `summary_mode=\"md5\"`. See [dump.json file " +"description](#dumpjson-file-description) for details." +msgstr "" +"`dump.json`:保存各 API 或模块前/反向数据统计,包含名称、dtype、shape、max、min、mean、L2 " +"norm(平方根)以及在 `summary_mode=\"md5\"` 下的 CRC-32。详见 [dump.json file " +"description](#dumpjson-file-description)。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`dump_error_info.log`: Present only when the dump tool encountered an error " +"and records the failure log." +msgstr "`dump_error_info.log`:仅在 dump 工具报错时生成,记录错误日志。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`stack.json`: Call stacks for APIs/modules." +msgstr "`stack.json`:API/Module 的调用栈信息。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`construct.json`: Hierarchical structure description. Empty when `level=L1`." +msgstr "`construct.json`:分层结构描述,`level=L1` 时为空。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "4. Analyze the results" +msgstr "4. 分析结果" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "4.1 Prerequisites" +msgstr "4.1 前置条件" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"You typically need two dump datasets: one from the \"problem side\" (the run" +" that exposes the accuracy or numerical error) and another from the " +"\"benchmark side\" (a good baseline). These datasets do not have to be " +"identical—they can come from different branches, framework versions, or even" +" alternative implementations (operator substitutions, different graph-" +"optimization switches, etc.). 
As long as they use the same or similar " +"inputs, hardware topology, and sampling points (step/token), `msprobe` can " +"compare them and locate the divergent nodes. If you cannot find a perfectly " +"clean benchmark, start by capturing the problem-side data, craft the " +"smallest reproducible case by hand, and perform a self-comparison. Below we " +"assume the problem dump is `problem_dump` and the benchmark dump is " +"`bench_dump`." +msgstr "" +"通常需要准备两份 dump " +"数据:一份来自出现精度或数值异常的“问题侧”,另一份来自表现正常的“标杆侧”。两份数据无需完全一致,可以来自不同分支、不同框架版本,甚至不同实现(算子替换、图优化开关差异等)。只要输入、硬件拓扑和采样点(step/token)保持一致或相近,msprobe" +" 就能对比并定位差异节点。若无法找到足够干净的标杆,可先采集问题侧数据,手动构造最小复现用例并进行自对比。下文默认问题侧目录为 " +"`problem_dump`,标杆侧为 `bench_dump`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "4.2 Visualization" +msgstr "4.2 可视化" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Use `msprobe graph_visualize` to generate results that can be opened inside " +"`tb_graph_ascend`." +msgstr "使用 `msprobe graph_visualize` 生成结果,并在 `tb_graph_ascend` 中查看。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Ensure the dump contains `construct.json` (i.e., `level = L0` or `level = " +"mix`)." +msgstr "确保 dump 中包含 `construct.json`(即 `level=L0` 或 `level=mix`)。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Prepare a comparison file such as `compare.json`. Its format and generation " +"flow are described in section 3.1.3 of `msprobe_visualization.md`. Example " +"(minimal runnable snippet):" +msgstr "" +"准备 `compare.json` 等对比文件,其格式与生成方式见 `msprobe_visualization.md` 3.1.3 节。示例:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Replace the paths with your dump directories before invoking `msprobe " +"graph_visualize`. **If you only need to build a single graph**, omit " +"`bench_path` to visualize one dump. 
Multi-rank scenarios (single rank, " +"multi-rank, or multi-step multi-rank) are also supported. `npu_path` or " +"`bench_path` must contain folders named `rank+number`, and every rank folder" +" must contain a non-empty `construct.json` together with `dump.json` and " +"`stack.json`. If any `construct.json` is empty, verify that the dump level " +"includes `L0` or `mix`. When comparing graphs, both `npu_path` and " +"`bench_path` must contain the same set of rank folders so they can be paired" +" one-to-one." +msgstr "" +"在执行 `msprobe graph_visualize` 前,将路径替换为实际 dump 目录。**若只需构建单图**,可省略 " +"`bench_path`。单 rank、多 rank 以及多 step 多 rank 场景均受支持:`npu_path` 或 `bench_path` " +"下必须只有名为 `rank+数字` 的文件夹,并且每个 rank 目录都包含非空的 `construct.json`、`dump.json` 与 " +"`stack.json`。若某个 `construct.json` 为空,请确认 dump 级别包含 L0 或 mix。做图比较时,两侧的 rank " +"目录数量和名称必须一一对应。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Run:" +msgstr "执行:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"After the comparison finishes, a `*.vis.db` file is created under " +"`graph_output`." +msgstr "对比完成后会在 `graph_output` 下生成 `*.vis.db` 文件。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +#, python-brace-format +msgid "Graph build: `build_{timestamp}.vis.db`" +msgstr "图构建:`build_{timestamp}.vis.db`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +#, python-brace-format +msgid "Graph comparison: `compare_{timestamp}.vis.db`" +msgstr "图对比:`compare_{timestamp}.vis.db`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Launch `tensorboard` and load the output directory to inspect structural " +"differences, numerical comparisons, overflow detection results, cross-device" +" communication nodes, and filters/search. 
Pass the directory containing the " +"`.vis.db` files to `--logdir`:" +msgstr "" +"启动 `tensorboard` 并加载输出目录,可查看结构差异、精度对比、溢出检测、跨卡通信节点以及多级目录搜索/筛选。将包含 `.vis.db` " +"的目录传给 `--logdir`:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Inspect the visualization. The UI usually displays the overall model " +"structure with operators, parameters, and tensor I/O. Click any node to " +"expand its children." +msgstr "在可视化界面中可查看模型整体结构(算子、参数、张量 I/O),点击节点可展开其子结构。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"**Difference visualization**: Comparison results highlight divergent nodes " +"with different colors (the larger the difference, the redder the node). " +"Click a node to view its detailed information including tensor " +"inputs/outputs, parameters, and operator type. Analyze the data difference " +"and the surrounding connections to pinpoint the exact divergence." +msgstr "" +"**差异可视化**:对比结果会使用不同颜色突出显示差异节点(差异越大颜色越红)。点击节点可查看输入输出张量、参数以及算子类型,据此结合上下游关系定位具体差异点。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "**Helper features**:" +msgstr "**辅助功能**:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Switch rank/step: Quickly check difference nodes on different ranks and " +"steps." +msgstr "切换 rank/step:快速查看不同 rank 和 step 下的差异节点。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Search/filter: Use the search box to filter nodes by operator name, etc." +msgstr "搜索/筛选:可根据算子名称等快速过滤节点。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Manual mapping: Automatic mapping cannot cover every case, so the tool lets " +"you manually map nodes between the problem and benchmark graphs before " +"generating comparison results." +msgstr "手动映射:当自动映射无法覆盖所有情况时,可手动匹配问题侧与标杆侧节点后再生成对比结果。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "5. Troubleshooting" +msgstr "5. 
故障排查" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`RuntimeError: Please enforce eager mode`: Restart vLLM and add the " +"`--enforce-eager` flag." +msgstr "" +"`RuntimeError: Please enforce eager mode`:重启 vLLM 并加上 `--enforce-eager` 参数。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"No dump files: Confirm that the JSON path is correct and every node has " +"write permission. In distributed scenarios set `keep_all_ranks` so that " +"every rank writes its own dump." +msgstr "" +"缺少 dump 文件:检查 JSON 路径是否正确、各节点是否具有写权限;分布式场景可启用 `keep_all_ranks` 让每个 rank " +"单独写入。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"Dumps are too large: Start with a `statistics` task to locate abnormal " +"tensors, then narrow the scope with `scope`/`list`/`tensor_list`, `filters`," +" `token_range`, etc." +msgstr "" +"dump 体积过大:建议先运行 `statistics` 任务定位异常张量,再通过 " +"`scope`/`list`/`tensor_list`、`filters`、`token_range` 等方式缩小范围。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "Appendix" +msgstr "附录" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "dump.json file description" +msgstr "dump.json 文件说明" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "L0 level" +msgstr "L0 级别" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"An L0 `dump.json` contains forward/backward I/O for modules together with " +"parameters and parameter gradients. 
Using PyTorch's `Conv2d` as an example, " +"the network code looks like:" +msgstr "" +"L0 级别的 `dump.json` 包含模块的前/反向输入输出以及参数与参数梯度。以下以 PyTorch 的 `Conv2d` 为例,网络代码如下:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`output = self.conv2(input) # self.conv2 = torch.nn.Conv2d(64, 128, 5, " +"padding=2, bias=True)`" +msgstr "" +"`output = self.conv2(input) # self.conv2 = torch.nn.Conv2d(64, 128, 5, " +"padding=2, bias=True)`" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "`dump.json` contains the following entries:" +msgstr "`dump.json` 包含以下条目:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`Module.conv2.Conv2d.forward.0`: Forward data of the module. `input_args` " +"represents positional inputs, `input_kwargs` represents keyword inputs, " +"`output` stores forward outputs, and `parameters` stores weights/biases." +msgstr "" +"`Module.conv2.Conv2d.forward.0`:模块的前向数据,`input_args` 为位置参数,`input_kwargs` " +"为关键字参数,`output` 存放前向输出,`parameters` 存放权重和偏置。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`Module.conv2.Conv2d.parameters_grad`: Parameter gradients (weight and " +"bias)." +msgstr "`Module.conv2.Conv2d.parameters_grad`:模块参数的梯度(weight 与 bias)。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`Module.conv2.Conv2d.backward.0`: Backward data of the module. `input` " +"represents gradients that flow into the module (gradients of the forward " +"outputs) and `output` represents gradients that flow out (gradients of the " +"module inputs)." 
+msgstr "" +"`Module.conv2.Conv2d.backward.0`:模块的反向数据,`input` 表示流入模块的梯度(对应前向输出),`output` " +"表示流出的梯度(对应模块输入)。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +#, python-brace-format +msgid "" +"**Note**: When the `model` parameter passed to the dump API is " +"`List[torch.nn.Module]` or `Tuple[torch.nn.Module]`, module-level names " +"include the index inside the list (`{Module}.{index}.*`). Example: " +"`Module.0.conv1.Conv2d.forward.0`." +msgstr "" +"**说明**:当 dump API 的 `model` 参数为 `List[torch.nn.Module]` 或 " +"`Tuple[torch.nn.Module]` 时,模块级名称会包含其在列表中的索引(`{Module}.{index}.*`),例如 " +"`Module.0.conv1.Conv2d.forward.0`。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "L1 level" +msgstr "L1 级别" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"An L1 `dump.json` records forward/backward I/O for APIs. Using PyTorch's " +"`relu` function as an example (`output = torch.nn.functional.relu(input)`), " +"the file contains:" +msgstr "" +"L1 级别的 `dump.json` 记录 API 的前/反向输入输出。以下以 PyTorch 的 `relu` 函数(`output = " +"torch.nn.functional.relu(input)`)为例:" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`Functional.relu.0.forward`: Forward data of the API. `input_args` are " +"positional inputs, `input_kwargs` are keyword inputs, and `output` stores " +"the forward outputs." +msgstr "" +"`Functional.relu.0.forward`:API 的前向数据,`input_args` 为位置输入,`input_kwargs` " +"为关键字输入,`output` 存放前向输出。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"`Functional.relu.0.backward`: Backward data of the API. `input` represents " +"the gradients of the forward outputs, and `output` represents the gradients " +"that flow back to the forward inputs." 
+msgstr "" +"`Functional.relu.0.backward`:API 的反向数据,`input` 表示前向输出的梯度,`output` " +"表示回传到前向输入的梯度。" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "mix level" +msgstr "mix 级别" + +#: ../../developer_guide/performance_and_debug/msprobe_guide.md +msgid "" +"A `mix` dump.json contains both L0 and L1 level data; the file format is the" +" same as the examples above." +msgstr "`mix` 级别的 dump.json 同时包含 L0 与 L1 数据,文件格式与上述示例相同。" diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/performance_benchmark.po b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/performance_benchmark.po similarity index 77% rename from docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/performance_benchmark.po rename to docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/performance_benchmark.po index 484edac3..3c119556 100644 --- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/performance_benchmark.po +++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/performance_benchmark.po @@ -20,11 +20,11 @@ msgstr "" "Plural-Forms: nplurals=1; plural=0;\n" "Generated-By: Babel 2.17.0\n" -#: ../../developer_guide/performance/performance_benchmark.md:1 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:1 msgid "Performance Benchmark" msgstr "性能基准" -#: ../../developer_guide/performance/performance_benchmark.md:2 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:2 msgid "" "This document details the benchmark methodology for vllm-ascend, aimed at " "evaluating the performance under a variety of workloads. 
To maintain " @@ -34,7 +34,7 @@ msgstr "" "本文档详细说明了 vllm-ascend 的基准测试方法,旨在评估其在多种工作负载下的性能。为了与 vLLM 保持一致,我们使用 vllm 项目提供的 " "[benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks) 脚本。" -#: ../../developer_guide/performance/performance_benchmark.md:4 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:4 msgid "" "**Benchmark Coverage**: We measure offline e2e latency and throughput, and " "fixed-QPS online serving benchmarks, for more details see [vllm-ascend " @@ -44,24 +44,24 @@ msgstr "" "**基准测试覆盖范围**:我们测量离线端到端延迟和吞吐量,以及固定 QPS 的在线服务基准测试。更多详情请参见 [vllm-ascend " "基准测试脚本](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks)。" -#: ../../developer_guide/performance/performance_benchmark.md:6 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:6 msgid "1. Run docker container" msgstr "1. 运行 docker 容器" -#: ../../developer_guide/performance/performance_benchmark.md:31 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:31 msgid "2. Install dependencies" msgstr "2. 安装依赖项" -#: ../../developer_guide/performance/performance_benchmark.md:38 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:38 msgid "3. 
(Optional)Prepare model weights" msgstr "3.(可选)准备模型权重" -#: ../../developer_guide/performance/performance_benchmark.md:39 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:39 msgid "" "For faster running speed, we recommend downloading the model in advance:" msgstr "为了更快的运行速度,建议提前下载模型:" -#: ../../developer_guide/performance/performance_benchmark.md:44 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:44 msgid "" "You can also replace all model paths in the [json](https://github.com/vllm-" "project/vllm-ascend/tree/main/benchmarks/tests) files with your local paths:" @@ -69,19 +69,19 @@ msgstr "" "你也可以将 [json](https://github.com/vllm-project/vllm-" "ascend/tree/main/benchmarks/tests) 文件中的所有模型路径替换为你的本地路径:" -#: ../../developer_guide/performance/performance_benchmark.md:60 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:60 msgid "4. Run benchmark script" msgstr "4. 运行基准测试脚本" -#: ../../developer_guide/performance/performance_benchmark.md:61 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:61 msgid "Run benchmark script:" msgstr "运行基准测试脚本:" -#: ../../developer_guide/performance/performance_benchmark.md:66 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:66 msgid "After about 10 mins, the output is as shown below:" msgstr "大约 10 分钟后,输出如下所示:" -#: ../../developer_guide/performance/performance_benchmark.md:176 +#: ../../developer_guide/performance_and_debug/performance_benchmark.md:176 msgid "" "The result json files are generated into the path `benchmark/results` These " "files contain detailed benchmarking results for further analysis." 
diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/profile_execute_duration.po b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/profile_execute_duration.po similarity index 80% rename from docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/profile_execute_duration.po rename to docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/profile_execute_duration.po index 7c83ca9b..db4b1bc8 100644 --- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/profile_execute_duration.po +++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/profile_execute_duration.po @@ -20,11 +20,11 @@ msgstr "" "Plural-Forms: nplurals=1; plural=0;\n" "Generated-By: Babel 2.17.0\n" -#: ../../developer_guide/performance/profile_execute_duration.md:1 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:1 msgid "Profile Execute Duration" msgstr "配置执行持续时间" -#: ../../developer_guide/performance/profile_execute_duration.md:3 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:3 msgid "" "The execution duration of each stage (including pre/post-processing, model " "forward, etc.) 
usually needs to be captured during a complete inference " @@ -35,24 +35,24 @@ msgstr "" "在完整的推理过程中,通常需要记录每个阶段(包括前/后处理、模型前向等)的执行时长。一般通过使用 `torch.npu.synchronize()` " "并获取 CPU 时间戳来实现,这会增加主机/设备同步的性能开销。" -#: ../../developer_guide/performance/profile_execute_duration.md:5 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:5 msgid "" "**To reduce the performance overhead, we add this feature, using the NPU " "event timestamp mechanism to observe the device execution time " "asynchronously.**" msgstr "**为了减少性能开销,我们添加了此功能,使用 NPU 事件时间戳机制异步观测设备的执行时间。**" -#: ../../developer_guide/performance/profile_execute_duration.md:7 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:7 msgid "Usage" msgstr "用法" -#: ../../developer_guide/performance/profile_execute_duration.md:8 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:8 msgid "" "Use the environment variable `VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` to " "enable this feature." msgstr "使用环境变量 `VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` 来启用此功能。" -#: ../../developer_guide/performance/profile_execute_duration.md:9 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:9 msgid "" "Use the non-blocking API `ProfileExecuteDuration().capture_async` to set " "observation points asynchronously when you need to observe the execution " @@ -60,7 +60,7 @@ msgid "" msgstr "" "当你需要观察执行时长时,可以使用非阻塞 API `ProfileExecuteDuration().capture_async` 异步设置观察点。" -#: ../../developer_guide/performance/profile_execute_duration.md:10 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:10 msgid "" "Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an " "appropriate time to get and print the execution durations of all observed " @@ -69,13 +69,13 @@ msgstr "" "在适当的时机使用阻塞式 API `ProfileExecuteDuration().pop_captured_sync` " "获取并打印所有已观察到阶段的执行时长。" -#: ../../developer_guide/performance/profile_execute_duration.md:12 +#: 
../../developer_guide/performance_and_debug/profile_execute_duration.md:12 msgid "" "**We have instrumented the key inference stages (including pre-processing, " "model forward pass, etc.) for execute duration profiling. Execute the script" " as follows:**" msgstr "**我们已经对关键的推理阶段(包括预处理、模型前向传递等)进行了执行时长分析的检测。请按如下方式执行脚本:**" -#: ../../developer_guide/performance/profile_execute_duration.md:17 +#: ../../developer_guide/performance_and_debug/profile_execute_duration.md:17 msgid "Example Output" msgstr "示例输出" diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/service_profiling_guide.po b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/service_profiling_guide.po similarity index 68% rename from docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/service_profiling_guide.po rename to docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/service_profiling_guide.po index a34d187f..ca57e0a6 100644 --- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/service_profiling_guide.po +++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance_and_debug/service_profiling_guide.po @@ -20,15 +20,15 @@ msgstr "" "Plural-Forms: nplurals=1; plural=0;\n" "Generated-By: Babel 2.17.0\n" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Service Profiling Guide" msgstr "服务化性能采集指南" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "In inference service processes, we sometimes need to monitor the internal execution flow of the inference service framework to identify performance issues. 
By collecting start and end timestamps of key processes, identifying critical functions or iterations, recording key events, and capturing diverse types of information, we can quickly pinpoint performance bottlenecks." msgstr "在推理服务过程中,我们有时需要监控推理服务框架的内部执行流程以定位性能问题。通过采集关键流程的起止时间、识别关键函数或迭代、记录关键事件并捕获多种类型的信息,可以快速定位性能瓶颈。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "This guide walks you through collecting performance data for the vllm-ascend " "service framework and operators. It covers the full workflow from preparation " @@ -37,23 +37,23 @@ msgid "" msgstr "" "本部分将指导你如何采集 vllm-ascend 的服务化框架性能数据以及算子性能数据,覆盖从准备、采集、解析到结果展示的完整流程,帮助你快速上手性能采集工具。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Quick Start" msgstr "快速开始" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "0 Installation" msgstr "0 安装" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Install the `msserviceprofiler` package using pip:" msgstr "使用 pip 安装 `msserviceprofiler` 包:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "1 Preparation" msgstr "1 准备采集" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "Before starting the service, set the environment variable " "`SERVICE_PROF_CONFIG_PATH` to point to the profiling configuration file, " @@ -63,19 +63,19 @@ msgid "" msgstr "" "在启动服务之前,请设置环境变量`SERVICE_PROF_CONFIG_PATH`指定需要加载的性能分析配置文件,并设置环境变量`PROFILING_SYMBOLS_PATH`来指定需要导入的符号的 YAML 配置文件。之后,根据您的部署方式启动 vLLM 服务。" -#: 
../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "cd ${path_to_store_profiling_files}" msgstr "cd ${profiling 文件存放路径}" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Set environment variable" msgstr "设置环境变量" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Start vLLM service" msgstr "启动 vLLM 服务" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "The file `ms_service_profiler_config.json` is the profiling configuration. " "If it does not exist at the specified path, a default configuration will be " @@ -84,7 +84,7 @@ msgid "" msgstr "" "其中 `ms_service_profiler_config.json` 为采集配置文件。若指定路径下不存在该文件,将自动生成一份默认配置。若有需要,可参照 `采集配置文件说明` 章节提前进行自定义配置。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "`service_profiling_symbols.yaml` is the configuration file containing " "the profiling points to be imported. You can choose **not** to set the " @@ -95,480 +95,480 @@ msgid "" "to the instructions in the `Symbols Configuration File` section below." 
msgstr "`service_profiling_symbols.yaml` 为需要导入的埋点配置文件。你也可以选择不设置环境变量 `PROFILING_SYMBOLS_PATH`,此时将使用默认的配置文件;若你指定的路径下不存在该文件,系统同样会在你指定的路径生成一份配置文件以便后续修改。可参考 `点位配置文件说明` 一节进行自定义。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2 Enable Profiling" msgstr "2 开启采集" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "To enable the performance data collection switch, change the `enable` field from " "`0` to `1` in the configuration file `ms_service_profiler_config.json`. This can " "be accomplished by executing the following sed command:" msgstr "将配置文件`ms_service_profiler_config.json`中的 `enable` 字段由 `0` 修改为 `1`,即可开启性能数据采集的开关,可以通过执行下面sed指令完成采集服务的开启:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "3 Send Requests" msgstr "3 发送请求" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "Choose a request-sending method that suits your actual profiling needs:" msgstr "根据实际采集需求选择请求发送方式:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "4 Analyze Data" msgstr "4 解析数据" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "xxxx-xxxx is the directory automatically created based on vLLM startup time" msgstr "xxxx-xxxx 为采集工具根据 vLLM 启动时间自动创建的存放目录" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Analyze data" msgstr "解析数据" -#: ../../developer_guide/performance/service_profiling_guide.md +#: 
../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "5 View Results" msgstr "5 查看结果" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "After analysis, the `output` directory will contain:" msgstr "解析完成后,`output` 目录下会生成:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "`chrome_tracing.json`: Chrome tracing format data, which can be opened in " "[MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html)." msgstr "" "`chrome_tracing.json`:Chrome 追踪格式数据,可在 [MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html) 中打开。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`profiler.db`: Performance data in database format." msgstr "`profiler.db`:数据库格式的性能数据。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`request.csv`: Request-related data." msgstr "`request.csv`:请求相关数据。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`request_summary.csv`: Overall request metrics." msgstr "`request_summary.csv`:请求总体统计指标。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`kvcache.csv`: KV Cache-related data." 
msgstr "`kvcache.csv`:KV Cache 相关数据。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`batch.csv`: Batch scheduling-related data." msgstr "`batch.csv`:批次调度相关数据。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`batch_summary.csv`: Overall batch scheduling metrics." msgstr "`batch_summary.csv`:批次调度总体统计指标。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`service_summary.csv`: Overall service-level metrics." msgstr "`service_summary.csv`:服务化维度总体统计指标。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Appendix" msgstr "附录" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "1 Profiling Configuration File" msgstr "1 采集配置文件说明" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "The profiling configuration file controls profiling parameters and behavior." msgstr "采集配置文件用于控制性能数据采集的参数与行为。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "File Format" msgstr "配置文件格式" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "The configuration is in JSON format. 
Main parameters:" msgstr "配置文件为 JSON 格式,主要参数如下:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Parameter" msgstr "参数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Description" msgstr "说明" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Required" msgstr "是否必选" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "enable" msgstr "enable" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Switch for profiling:
0: disable
1: enable
Default: 0" msgstr "是否开启性能数据采集的开关:
0:关闭
1:开启
默认值:0" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Yes" msgstr "是" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "prof_dir" msgstr "prof_dir" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Directory to store collected performance data.
Default: $HOME/.ms_service_profiler" msgstr "采集到性能数据的存放路径,支持用户自定义。
默认值:$HOME/.ms_service_profiler" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "No" msgstr "否" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "profiler_level" msgstr "profiler_level" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Data collection level. Default is \"INFO\" (normal level)." msgstr "数据采集等级。默认值为\"INFO\",指普通级别的性能数据。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "host_system_usage_freq" msgstr "host_system_usage_freq" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Sampling frequency of host CPU and memory metrics. Disabled by default. Range: integer 1–50, unit: Hz (times per second). Set to -1 to disable.
Note: Enabling this may consume significant memory." msgstr "CPU和内存系统指标采集频率,默认关闭不采集。范围整数1~50,单位hz,表示每秒采集的次数。设置为-1时关闭采集该指标。
说明:开启该功能可能占用较大内存" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "npu_memory_usage_freq" msgstr "npu_memory_usage_freq" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Sampling frequency of NPU memory utilization. Disabled by default. Range: integer 1–50, unit: Hz (times per second). Set to -1 to disable.
Note: Enabling this may consume significant memory." msgstr "NPU Memory使用率指标的采集频率,默认关闭不采集。范围整数1~50,单位hz,表示每秒采集的次数。设置为-1时关闭采集该指标。
说明:开启该功能可能占用较大内存" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "acl_task_time" msgstr "acl_task_time" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Switch to collect operator dispatch latency and execution latency:
0: disable (default; 0 or invalid values mean disabled).
1: enable; calls `aclprofCreateConfig` with `ACL_PROF_TASK_TIME_L0`.
2: enable MSPTI-based data dumping; uses MSPTI for profiling and requires: `export LD_PRELOAD=$ASCEND_TOOLKIT_HOME/lib64/libmspti.so`" msgstr "开启采集算子下发耗时、算子执行耗时数据的开关,取值为:
0:关闭。默认值,配置为0或其他非法值均表示关闭。
1:开启。该功能开启时调用aclprofCreateConfig接口的ACL_PROF_TASK_TIME_L0参数。
2:开启基于MSPTI接口的数据落盘。该功能开启时调用MSPTI接口进行性能数据采集,需要配置如下环境变量:export LD_PRELOAD=$ASCEND_TOOLKIT_HOME/lib64/libmspti.so" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "acl_prof_task_time_level" msgstr "acl_prof_task_time_level" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Level and duration for profiling:
L0: collect operator dispatch and execution latency only; lower overhead (no operator basic info).
L1: collect AscendCL interface performance (host–device and inter-device sync/async memory copy latencies), plus operator dispatch, execution, and basic info for comprehensive analysis.
time: profiling duration, integer 1–999, in seconds.
If unset, defaults to L0 until program exit; invalid values fall back to defaults.
Level and duration can be combined, e.g., `\"acl_prof_task_time_level\": \"L1,10\"`." msgstr "设置性能数据采集的Level等级和时长,取值为:
L0:Level0等级,表示采集算子下发耗时、算子执行耗时数据。与L1相比,由于不采集算子基本信息数据,采集时性能开销较小,可更精准统计相关耗时数据。
L1:Level1等级,采集AscendCL接口的性能数据,包括Host与Device之间、Device间的同步异步内存复制时延;采集算子下发耗时、算子执行耗时数据以及算子基本信息数据,提供更全面的性能分析数据。
time:采集时长,取值范围为1~999的正整数,单位s。
默认未配置本参数,表示采集L0数据,且采集到程序执行结束。配置其他非法值时取默认值。
采集的Level等级和时长可同时配置,例如\"acl_prof_task_time_level\": \"L1,10\"。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "api_filter" msgstr "api_filter" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Filter to select API performance data to dump. For example, specifying \"matmul\" dumps all API data whose `name` contains \"matmul\". String, case-sensitive; use \";\" to separate multiple targets. Empty means dump all.
Effective only when `acl_task_time` is 2." msgstr "对性能数据进行过滤,配置该参数可自定义采集配置的API性能数据,例如传入\"matmul\"会落盘所有API数据中name字段包含matmul的性能数据。str类型,区分大小写,多个不同的筛选目标用\";\"隔开,默认为空,表示落盘所有数据。
仅当acl_task_time参数值为2时生效。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "kernel_filter" msgstr "kernel_filter" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Filter to select kernel performance data to dump. For example, specifying \"matmul\" dumps all kernel data whose `name` contains \"matmul\". String, case-sensitive; use \";\" to separate multiple targets. Empty means dump all.
Effective only when `acl_task_time` is 2." msgstr "对性能数据进行过滤,配置该参数可自定义采集配置的kernel性能数据,例如传入\"matmul\"会落盘所有kernel数据中name字段包含matmul的性能数据。str类型,区分大小写,多个不同的筛选目标用\";\"隔开,默认为空,表示落盘所有数据。
仅当acl_task_time参数值为2时生效。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "timelimit" msgstr "timelimit" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Profiling duration for the service. The process stops automatically after this time. Range: integer 0–7200, unit: seconds. Default 0 means unlimited." msgstr "设置服务化性能数据采集的时长,配置该参数后,采集进程将在运行指定的时间后自动停止,取值范围为0~7200的整数,单位s,默认值0(表示不限制采集时间)" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "domain" msgstr "domain" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Limit profiling to the specified domains to reduce data volume. String, separated by semicolons, case-sensitive, e.g., \"Request; KVCache\".
Empty means all available domains.
Available domains: Request, KVCache, ModelExecute, BatchSchedule, Communication.
Note: If the selected domains are incomplete, analysis output may show warnings due to missing data. See [Reference Table 1](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/devaids/Profiling/mindieprofiling_0009.html#ZH-CN_TOPIC_0000002370256365__table1985410131831)." msgstr "设置采集指定domain域下的性能数据,减少采集数据量。输入参数为字符串格式,英文分号作为分隔符,区分大小写,例如:\"Request; KVCache\"。
默认为空,表示采集当前所有domain域内性能数据。
当前已有domain域为:Request、KVCache、ModelExecute、BatchSchedule、Communication。
说明:
若指定domain域不全,采集数据不满足解析输出件生成时,会有告警提示。[查看表1](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/devaids/Profiling/mindieprofiling_0009.html#ZH-CN_TOPIC_0000002370256365__table1985410131831)" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example Configuration" msgstr "配置示例" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2 Symbols Configuration File" msgstr "2 点位配置文件说明" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "The symbols configuration file defines which functions/methods to profile and supports flexible configuration with custom attribute collection." msgstr "点位配置文件用于定义需要采集的函数/方法,支持灵活配置与自定义属性采集。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2.1 File Name and Loading" msgstr "2.1 文件命名与加载" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Default load path:`~/.config/vllm_ascend/service_profiling_symbols.MAJOR.MINOR.PATCH.yaml`( According to the installed version of vllm )" msgstr "默认加载路径:`~/.config/vllm_ascend/service_profiling_symbols.MAJOR.MINOR.PATCH.yaml`(随已安装的 vllm 版本变化)" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "If you need to customize the profiling points, it is highly recommended to copy a profiling configuration file to your working directory using the `PROFILING_SYMBOLS_PATH` environment variable." 
msgstr "如需自定义采集点,推荐通过设置环境变量`PROFILING_SYMBOLS_PATH`,将一份点位配置文件复制到工作目录进行修改使用。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2.2 Field Descriptions" msgstr "2.2 配置字段说明" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Field" msgstr "字段" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example" msgstr "示例" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "symbol" msgstr "symbol" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Python import path + attribute chain" msgstr "Python 导入路径 + 属性链" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"vllm.v1.core.kv_cache_manager:KVCacheManager.free\"`" msgstr "`\"vllm.v1.core.kv_cache_manager:KVCacheManager.free\"`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "handler" msgstr "handler" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Handler type" msgstr "处理函数类型" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"timer\"` (default) or `\"pkg.mod:func\"` (custom)" msgstr "`\"timer\"`(默认)或 `\"pkg.mod:func\"`(自定义)" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md 
msgid "domain" msgstr "domain" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Domain tag" msgstr "埋点域标识" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"KVCache\"`, `\"ModelExecute\"`" msgstr "`\"KVCache\"`, `\"ModelExecute\"`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "name" msgstr "name" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Event name" msgstr "埋点名称" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"EngineCoreExecute\"`" msgstr "`\"EngineCoreExecute\"`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "min_version" msgstr "min_version" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "max_version" msgstr "max_version" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Upper version constraint" msgstr "最高版本约束" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Lower version constraint" msgstr "最低版本约束" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"0.9.1\"`" msgstr "`\"0.9.1\"`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: 
../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`\"0.11.0\"`" msgstr "`\"0.11.0\"`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "attributes" msgstr "attributes" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Custom attribute collection" msgstr "自定义属性采集" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Only support for `"timer"` handler. See the section below" msgstr "只支持 `"timer"` handler。详见下方自定义属性采集机制" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2.3 Examples" msgstr "2.3 配置示例" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example 1: Custom handler" msgstr "示例 1:自定义处理函数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example 2: Default timer" msgstr "示例 2:默认计时器" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example 3: Version constraint" msgstr "示例 3:版本约束" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "No handler specified -> default timer" msgstr "未指定 handler -> 默认 timer" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2.4 Custom Attribute Collection" msgstr "2.4 自定义属性采集机制" -#: ../../developer_guide/performance/service_profiling_guide.md +#: 
../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "The `attributes` field supports flexible custom attribute collection and allows operations and transformations on function arguments and return values." msgstr "`attributes` 字段支持灵活的自定义属性采集,可对函数参数与返回值进行多种操作与转换。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Basic Syntax" msgstr "基本语法" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Argument access: use the parameter name directly, e.g., `input_ids`" msgstr "参数访问:直接使用参数名,如 `input_ids`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Return value access: use the `return` keyword" msgstr "返回值访问:使用 `return` 关键字" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Pipeline operations: use `|` to chain multiple operations" msgstr "管道操作:使用 `|` 分隔多个操作" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Attribute access: use `attr` to access object attributes" msgstr "属性访问:使用 `attr` 获取对象属性" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Example" msgstr "配置示例" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Expression Notes" msgstr "表达式说明" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "`len(input_ids)`: get the length of parameter `input_ids`." 
msgstr "`len(input_ids)`:获取 `input_ids` 参数的长度。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "`len(return) | str`: get the length of the return value and convert to " "string (equivalent to `str(len(return))`)." msgstr "`len(return) | str`:获取返回值长度并转换为字符串(等价于 `str(len(return))`)。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "`return[0] | attr input_ids | len`: get the length of the `input_ids` " "attribute of the first element in the return value." msgstr "`return[0] | attr input_ids | len`:获取返回值第一个元素的 `input_ids` 属性长度。" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Supported Expression Types" msgstr "支持的表达式类型" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Basic operations: `len()`, `str()`, `int()`, `float()`" msgstr "基础操作:`len()`, `str()`, `int()`, `float()`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Index access: `return[0]`, `return['key']`" msgstr "索引访问:`return[0]`, `return['key']`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Attribute access: `return | attr attr_name`" msgstr "属性访问:`return | attr attr_name`" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Pipeline composition: chain operations with `|`" msgstr "管道组合:多个操作通过 `|` 连接" -#: ../../developer_guide/performance/service_profiling_guide.md +#: 
../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Advanced Examples" msgstr "高级示例" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Get tensor shape" msgstr "获取张量形状" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Get specific value from a dict" msgstr "获取字典中的特定值" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Conditional expression (requires custom handler support)" msgstr "条件表达式(需要自定义处理函数支持)" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Complex data processing" msgstr "复杂的数据处理" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "2.5 Custom Handler" msgstr "2.5 自定义处理函数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "When `handler` specifies a custom function, it must match the following " "signature:" msgstr "当 `handler` 字段指定自定义处理函数时,该函数需满足以下签名:" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Custom handler" msgstr "自定义处理函数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "original_func: the original function object" msgstr "original_func: 原始函数对象" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "this: the bound object (for methods)" msgstr "this: 调用对象(对于方法调用)" 
-#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "*args: positional arguments" msgstr "*args: 位置参数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "**kwargs: keyword arguments" msgstr "**kwargs: 关键字参数" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "processing result" msgstr "处理结果" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "Custom logic" msgstr "自定义处理逻辑" -#: ../../developer_guide/performance/service_profiling_guide.md +#: ../../developer_guide/performance_and_debug/service_profiling_guide.md msgid "" "If the custom handler fails to import, the system will automatically fall " "back to the default timer mode." diff --git a/docs/source/user_guide/configuration/additional_config.md b/docs/source/user_guide/configuration/additional_config.md index ec1e1a42..448f2ec4 100644 --- a/docs/source/user_guide/configuration/additional_config.md +++ b/docs/source/user_guide/configuration/additional_config.md @@ -42,6 +42,7 @@ The following table lists additional configuration options available in vLLM Asc | `num_wait_worker_iterations` | int | `30` | The forward iterations when the EPLB worker will finish CPU tasks. In our test default value 30 can cover most cases. | | `expert_map_record_path` | str | `None` | When dynamic EPLB is completed, save the current expert load heatmap to the specified path. | | `init_redundancy_expert` | int | `0` | Specify redundant experts during initialization. | +| `dump_config` | str | `None` | Configuration file path for msprobe dump (eager mode only). 
| The details of each configuration option are as follows: diff --git a/mypy.ini b/mypy.ini index 6fe8e6c2..38881158 100644 --- a/mypy.ini +++ b/mypy.ini @@ -13,4 +13,8 @@ ignore_missing_imports = True ignore_missing_imports = True [mypy-lm_eval.*] -ignore_missing_imports = True \ No newline at end of file +ignore_missing_imports = True + +[mypy-msprobe.*] +ignore_missing_imports = True +allow_untyped_imports = True \ No newline at end of file diff --git a/requirements-dev.txt b/requirements-dev.txt index dca92174..d3db952d 100644 --- a/requirements-dev.txt +++ b/requirements-dev.txt @@ -19,3 +19,4 @@ librosa soundfile pytest_mock msserviceprofiler>=1.2.2 +mindstudio-probe>=8.3.0 \ No newline at end of file diff --git a/vllm_ascend/ascend_config.py b/vllm_ascend/ascend_config.py index 1fd1c67c..16d16a4d 100644 --- a/vllm_ascend/ascend_config.py +++ b/vllm_ascend/ascend_config.py @@ -44,6 +44,10 @@ class AscendConfig: self.ascend_scheduler_config = AscendSchedulerConfig( ascend_scheduler_config) + # Dump / PrecisionDebugger configuration + dump_config_path = additional_config.get("dump_config", None) + self.dump_config = DumpConfig(dump_config_path) + weight_prefetch_config = additional_config.get( "weight_prefetch_config", {}) self.weight_prefetch_config = WeightPrefetchConfig( @@ -230,6 +234,18 @@ class AscendSchedulerConfig: setattr(self, k, v) +class DumpConfig: + """ + Configuration object for dump/PrecisionDebugger settings. + """ + + def __init__(self, dump_config_path: Optional[str] = None): + # enable_dump is True only when dump_config_path is a non-empty string + self.enable_dump: bool = bool(dump_config_path) + # Path to msprobe config json; may be None. 
+ self.config_path: Optional[str] = dump_config_path + + class WeightPrefetchConfig: """ Configuration Object for weight_prefetch_config from additional_config diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py index 8f103ddc..5677550b 100644 --- a/vllm_ascend/worker/model_runner_v1.py +++ b/vllm_ascend/worker/model_runner_v1.py @@ -311,6 +311,7 @@ class NPUModelRunner(LoRAModelRunnerMixin): self.intermediate_tensors: Optional[IntermediateTensors] = None self.runner_only_attn_layers: set[str] = set() + # Ascend-specific configurations self.ascend_config = get_ascend_config() if self.ascend_config.ascend_scheduler_config.enabled: self.chunked_prefill_enabled = self.scheduler_config.chunked_prefill_enabled @@ -318,6 +319,17 @@ self.chunked_prefill_enabled = True self.weight_prefetch_method = WeightPrefetchMethod( self.ascend_config.weight_prefetch_config) + # Dump / PrecisionDebugger configuration now comes from AscendConfig + dump_cfg = self.ascend_config.dump_config + self.dump_enable = dump_cfg.enable_dump + self.debugger = None + if self.dump_enable: + if self.model_config.enforce_eager: + from msprobe.pytorch import PrecisionDebugger + self.debugger = PrecisionDebugger(dump_cfg.config_path) + else: + raise RuntimeError( + "Dumping/debugging only works in eager mode.") if self.cache_config.cache_dtype == "auto": self.kv_cache_dtype = self.dtype @@ -2284,6 +2296,18 @@ class NPUModelRunner(LoRAModelRunnerMixin): self.eplb_updator.take_update_info_from_eplb_process() moe_comm_type = self._select_moe_comm_method(num_input_tokens) + # Skip dump hooks when the debugger was never created (it may be None) + need_dump = self.dump_enable and self.debugger is not None + if need_dump: + assert self.debugger is not None + dbg_cfg = getattr(self.debugger, "config", None) + dump_level = str( + getattr(dbg_cfg, "level", + "L1")).upper() if dbg_cfg is not None else "L1" + if dump_level in ("L0", "MIX"): + 
self.debugger.start(model=self.model) + else: + self.debugger.start() uniform_decode = (max_query_len == self.uniform_decode_query_len) and ( scheduler_output.total_num_scheduled_tokens @@ -2341,6 +2365,10 @@ class NPUModelRunner(LoRAModelRunnerMixin): # For mid-pipeline stages, return the hidden states. if not broadcast_pp_output: hidden_states.kv_connector_output = kv_connector_output + if need_dump: + assert self.debugger is not None + self.debugger.stop() + self.debugger.step() return hidden_states assert isinstance(hidden_states, IntermediateTensors) get_pp_group().send_tensor_dict( @@ -2348,11 +2376,16 @@ class NPUModelRunner(LoRAModelRunnerMixin): logits = None else: if self.input_batch.pooling_params: - return self._pool( + pool_output = self._pool( hidden_states, scheduler_output.total_num_scheduled_tokens, num_scheduled_tokens_np, finished_sending, finished_recving, kv_connector_output) + if need_dump: + assert self.debugger is not None + self.debugger.stop() + self.debugger.step() + return pool_output sample_hidden_states = hidden_states[logits_indices] logits = self.model.compute_logits(sample_hidden_states) if broadcast_pp_output: @@ -2558,8 +2591,16 @@ class NPUModelRunner(LoRAModelRunnerMixin): if self.dynamic_eplb: self.eplb_updator.forward_end() if not self.use_async_scheduling: + if need_dump: + assert self.debugger is not None + self.debugger.stop() + self.debugger.step() return model_runner_output + if need_dump: + assert self.debugger is not None + self.debugger.stop() + self.debugger.step() return AsyncNPUModelRunnerOutput( model_runner_output=model_runner_output, sampled_token_ids=sampled_token_ids,