From c3f2fc5a7a152a3679e753dcd023f38ef2458676 Mon Sep 17 00:00:00 2001
From: Byron Hsu
Date: Sun, 13 Oct 2024 00:33:58 -0700
Subject: [PATCH] [doc] Add engine section in backend.md (#1656)

---
 docs/en/backend.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/en/backend.md b/docs/en/backend.md
index 020848ba7..d19eb062a 100644
--- a/docs/en/backend.md
+++ b/docs/en/backend.md
@@ -93,6 +93,15 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
 
+### SRT Engine: Direct Inference Without HTTP
+
+SGLang provides a direct inference engine **without an HTTP server**. This can be used for:
+
+1. **Offline Batch Inference**
+2. **Building Custom Servers**
+
+We provide usage examples [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+
 ### Supported Models
 
 **Generative Models**
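
The new section names offline batch inference as a use case but leaves the code to the linked examples directory. A minimal sketch of that use case might look like the following. It assumes the engine is exposed as `sgl.Engine(model_path=...)`, that `generate(prompts, sampling_params)` returns one dict per prompt with a `"text"` key, and that `shutdown()` releases the engine; these are assumptions based on the linked examples, not something this patch states. `run_offline_batch` and `main` are hypothetical helper names introduced only for illustration.

```python
from typing import Any, Dict, List, Tuple


def run_offline_batch(
    engine: Any,
    prompts: List[str],
    sampling_params: Dict[str, Any],
) -> List[Tuple[str, str]]:
    """Run one batched generate() call and pair each prompt with its completion.

    `engine` is any object with a generate(prompts, sampling_params) method
    that returns a list of dicts containing a "text" key (assumed shape).
    """
    outputs = engine.generate(prompts, sampling_params)
    return [(prompt, output["text"]) for prompt, output in zip(prompts, outputs)]


def main() -> None:
    # Imported lazily so the helper above can be used without sglang installed.
    import sglang as sgl

    # Assumption: sgl.Engine takes model_path, as in the linked examples.
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3-8B-Instruct")
    try:
        prompts = [
            "Hello, my name is",
            "The capital of France is",
        ]
        sampling_params = {"temperature": 0.8, "top_p": 0.95}
        for prompt, text in run_offline_batch(llm, prompts, sampling_params):
            print(f"Prompt: {prompt!r}\nGenerated: {text!r}\n")
    finally:
        # Assumption: shutdown() stops the engine's background processes.
        llm.shutdown()
```

Calling `main()` requires a machine with sglang installed and access to the model weights. Keeping the batching logic in `run_offline_batch` separates it from engine construction, so it can be exercised with a stub object that implements `generate`.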