Fix torch profiler bugs for bench_offline_throughput.py (#6557)
@@ -52,6 +52,17 @@
python -m sglang.bench_offline_throughput --model-path meta-llama/Llama-3.1-8B-Instruct --dataset-name random --num-prompts 10 --profile --mem-frac=0.8
```

- Possible PyTorch Bug

If you encounter the following error (for example, when using Qwen 2.5 VL):
```bash
RuntimeError: !stack.empty() INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/profiler_python.cpp":983, please report a bug to PyTorch. Python replay stack is empty.
```
This is likely a PyTorch bug, reported in [Bug: vLLM Profiler](https://github.com/vllm-project/vllm/issues/18240) and [Bug: torch.profiler.profile](https://github.com/pytorch/pytorch/issues/101632). As a workaround, you can disable `with_stack` via an environment variable, as follows:
```bash
export SGLANG_PROFILE_WITH_STACK=False
python -m sglang.bench_offline_throughput --model-path meta-llama/Llama-3.1-8B-Instruct --dataset-name random --num-prompts 10 --profile --mem-frac=0.8
```
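Note that environment variables are always strings, so the `False` above has to be parsed into a Python bool before it can be passed as the profiler's `with_stack` argument. A minimal sketch of such parsing (the `env_flag` helper is hypothetical; SGLang's actual parsing may differ):

```python
import os

def env_flag(name: str, default: bool = True) -> bool:
    """Interpret an environment variable as a boolean flag.

    Hypothetical helper -- SGLang's real parsing may differ. Environment
    variables are strings, so spellings like "False" or "0" must be
    mapped to Python's False explicitly.
    """
    val = os.environ.get(name)
    if val is None:
        return default
    return val.strip().lower() not in ("false", "0", "no", "off", "")

os.environ["SGLANG_PROFILE_WITH_STACK"] = "False"
with_stack = env_flag("SGLANG_PROFILE_WITH_STACK")
print(with_stack)  # False
```

The resulting bool would then be forwarded to `torch.profiler.profile(with_stack=...)`.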

- View Traces

Trace files can be loaded and visualized from:

@@ -88,6 +88,7 @@ SGLang supports various environment variables that can be used to configure its
| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_TORCH_PROFILER_DIR` | Directory for PyTorch profiler output | `/tmp` |
| `SGLANG_PROFILE_WITH_STACK` | Set `with_stack` option (bool) for PyTorch profiler (capture stack trace) | `true` |
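For example, both variables can be set before a profiled run (a config fragment; the output directory shown is an arbitrary choice, not a default):

```shell
# Write traces somewhere other than the default /tmp (arbitrary path).
export SGLANG_TORCH_PROFILER_DIR=/tmp/sglang_traces
# Disable stack capture to work around the PyTorch bug described above.
export SGLANG_PROFILE_WITH_STACK=False
```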

## Storage & Caching