[doc] improve engine doc and add to readme (#1670)

2024-10-14 19:56:21 -07:00
parent 56503d9bc9
commit cd0be7489f
2 changed files with 62 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -241,6 +241,40 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
 
+### Engine Without HTTP Server
+
+We also provide an inference engine **without a HTTP server**. For example,
+
+```python
+import sglang as sgl
+
+
+def main():
+    prompts = [
+        "Hello, my name is",
+        "The president of the United States is",
+        "The capital of France is",
+        "The future of AI is",
+    ]
+    sampling_params = {"temperature": 0.8, "top_p": 0.95}
+    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
+
+    outputs = llm.generate(prompts, sampling_params)
+    for prompt, output in zip(prompts, outputs):
+        print("===============================")
+        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+
+if __name__ == "__main__":
+    main()
+```
+
+This can be used for:
+
+1. **Offline Batch Inference**
+2. **Building Custom Servers**
+
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+
 ### Supported Models

 **Generative Models**
--- a/docs/en/backend.md
+++ b/docs/en/backend.md
@@ -93,14 +93,39 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```

-### SRT Engine: Direct Inference Without HTTP
+### Engine Without HTTP Server

-SGLang provides a direct inference engine **without an HTTP server**. This can be used for:
+We also provide an inference engine **without a HTTP server**. For example,
+
+```python
+import sglang as sgl
+
+
+def main():
+    prompts = [
+        "Hello, my name is",
+        "The president of the United States is",
+        "The capital of France is",
+        "The future of AI is",
+    ]
+    sampling_params = {"temperature": 0.8, "top_p": 0.95}
+    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
+
+    outputs = llm.generate(prompts, sampling_params)
+    for prompt, output in zip(prompts, outputs):
+        print("===============================")
+        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+
+if __name__ == "__main__":
+    main()
+```
+
+This can be used for:

 1. **Offline Batch Inference**
 2. **Building Custom Servers**

-We provide usage examples [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)

 ### Supported Models