Docs: add torch compile cache (#4151)

Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-03-06 14:27:09 -08:00
parent 19fd57bcd7
commit ebddb65aed
6 changed files with 48 additions and 32 deletions
--- a/docs/references/torch_compile_cache.md
+++ b/docs/references/torch_compile_cache.md
@@ -1,13 +0,0 @@
-# Enabling cache for torch.compile
-
-SGLang uses the `max-autotune-no-cudagraphs` mode of torch.compile, and its auto-tuning can be slow.
-If you want to deploy a model on many machines, you can ship the torch.compile cache to them and skip the compilation step.
-
-This approach is based on https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html
-
-
-1. Generate the cache by setting `TORCHINDUCTOR_CACHE_DIR` and running the model once.
-```
-TORCHINDUCTOR_CACHE_DIR=/root/inductor_root_cache python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --enable-torch-compile
-```
-2. Copy the cache folder to the other machines and launch the server there with `TORCHINDUCTOR_CACHE_DIR` set to the copied path.
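For readers following the removed doc, step 2 could be sketched roughly as below. This is an illustration, not part of the original file: the variable names `CACHE_DIR`, `ARCHIVE`, and `DEST_PARENT` and their `/tmp` defaults are assumptions, and the transfer between machines is left to whatever tool you already use. The final launch command is only printed, not executed.

```shell
#!/bin/sh
set -eu

# Assumed locations (hypothetical defaults); the doc itself uses /root/inductor_root_cache.
CACHE_DIR="${CACHE_DIR:-/tmp/inductor_root_cache}"
ARCHIVE="${ARCHIVE:-/tmp/inductor_cache.tar.gz}"
DEST_PARENT="${DEST_PARENT:-/tmp/inductor_cache_restored}"

# On the machine that ran step 1: pack the generated cache folder.
mkdir -p "$CACHE_DIR"   # no-op when the cache already exists
tar -czf "$ARCHIVE" -C "$(dirname "$CACHE_DIR")" "$(basename "$CACHE_DIR")"

# Transfer "$ARCHIVE" to each target machine here (scp, rsync, image build, ...).

# On each target machine: unpack the archive and locate the restored cache.
mkdir -p "$DEST_PARENT"
tar -xzf "$ARCHIVE" -C "$DEST_PARENT"
RESTORED_CACHE="$DEST_PARENT/$(basename "$CACHE_DIR")"

# Print the launch command from step 1, pointed at the restored cache.
echo "TORCHINDUCTOR_CACHE_DIR=$RESTORED_CACHE python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --enable-torch-compile"
```

Packing the folder into a single archive keeps the copy to one file; copying the directory directly would work just as well if that suits your deployment better.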