Add graph runner support with torch compile on CPU (#7843)

This commit is contained in:
Cao E
2025-09-08 12:33:58 +08:00
committed by GitHub
parent 8cda5a622c
commit 7577f0e40f
16 changed files with 820 additions and 48 deletions

@@ -134,7 +134,12 @@ Notes:
export SGLANG_CPU_OMP_THREADS_BIND="0-39|43-82|86-125|128-167|171-210|214-253"
```
3. To optimize decoding with torch.compile, add the flag `--enable-torch-compile`.
To cap the batch size used for compilation, set the flag `--torch-compile-max-bs`.
For example, `--enable-torch-compile --torch-compile-max-bs 4` enables torch.compile with a
maximum batch size of 4.
4. A warmup step is automatically triggered when the service is started.
The server is ready when you see the log `The server is fired up and ready to roll!`.
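As a sketch, a CPU server launch combining the settings above might look like the following. The model path is an illustrative placeholder, and the thread-binding value should match your own topology; the flags `--enable-torch-compile` and `--torch-compile-max-bs` are the ones introduced in this change.

```shell
# Bind OpenMP threads per tensor-parallel rank (example topology; adjust to your machine).
export SGLANG_CPU_OMP_THREADS_BIND="0-39|43-82|86-125|128-167|171-210|214-253"

# Launch the server with torch.compile enabled for decoding,
# compiling for batch sizes up to 4.
# The model path is a placeholder; substitute your own model.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --device cpu \
  --enable-torch-compile \
  --torch-compile-max-bs 4
```

Once the warmup completes, the log line `The server is fired up and ready to roll!` indicates the server is ready to serve requests.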
## Benchmarking with Requests