Add example to use sgl engine with fastapi (#5648)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
@@ -6,6 +6,7 @@ SGLang provides a direct inference engine without the need for an HTTP server. T
1. **Offline Batch Inference**
2. **Embedding Generation**
3. **Custom Server on Top of the Engine**
4. **Inference Using FastAPI**
## Examples
@@ -47,3 +48,7 @@ This will send both non-streaming and streaming requests to the server.
### [Token-In-Token-Out for RLHF](../token_in_token_out)
In this example, we launch an SGLang engine, feed token IDs as input, and receive generated token IDs as output, which suits RLHF workflows where tokenization happens outside the engine.
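A minimal sketch of the token-in-token-out flow (the model path is a placeholder, and the engine is assumed to be created with `skip_tokenizer_init=True` so it accepts and returns raw token IDs; see the linked example for the exact script):

```python
import sglang as sgl
from transformers import AutoTokenizer

MODEL = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model path

# Tokenize outside the engine, e.g. inside an RLHF training loop.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
input_ids = [tokenizer.encode("The capital of France is")]

# skip_tokenizer_init lets the engine consume and emit raw token IDs.
llm = sgl.Engine(model_path=MODEL, skip_tokenizer_init=True)
outputs = llm.generate(
    input_ids=input_ids,
    sampling_params={"temperature": 0.8, "top_p": 0.95},
)

for out in outputs:
    # The returned dict carries the generated token IDs rather than decoded
    # text; the exact field name may vary across sglang versions.
    print(out)

llm.shutdown()
```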
### [Inference Using FastAPI](fastapi_engine_inference.py)
This example demonstrates how to create a FastAPI server that uses the SGLang engine for text generation.
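A rough sketch of what such a server might look like (the model path, route name, and request schema below are illustrative assumptions, not taken from `fastapi_engine_inference.py`):

```python
from contextlib import asynccontextmanager

import sglang as sgl
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

engine = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the SGLang engine once at startup and shut it down on exit.
    global engine
    engine = sgl.Engine(model_path="meta-llama/Llama-3.2-1B-Instruct")  # placeholder model
    yield
    engine.shutdown()


app = FastAPI(lifespan=lifespan)


class GenerateRequest(BaseModel):
    prompt: str
    temperature: float = 0.8
    max_new_tokens: int = 128


@app.post("/generate")
async def generate(req: GenerateRequest):
    # async_generate avoids blocking the event loop while the engine runs.
    result = await engine.async_generate(
        req.prompt,
        {"temperature": req.temperature, "max_new_tokens": req.max_new_tokens},
    )
    return {"text": result["text"]}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

With the server running, a client can POST a JSON body such as `{"prompt": "The capital of France is"}` to `/generate` and receive the generated text in the response.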