Add example to use SGLang engine with FastAPI (#5648)

Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
Ravi Theja
2025-04-24 21:27:05 +05:30
committed by GitHub
parent a14654dd68
commit d2b8d0b8d8
2 changed files with 194 additions and 0 deletions


@@ -6,6 +6,7 @@ SGLang provides a direct inference engine without the need for an HTTP server. T
1. **Offline Batch Inference**
2. **Embedding Generation**
3. **Custom Server on Top of the Engine**
4. **Inference Using FastAPI**
## Examples
@@ -47,3 +48,7 @@ This will send both non-streaming and streaming requests to the server.
### [Token-In-Token-Out for RLHF](../token_in_token_out)
In this example, we launch an SGLang engine, feed tokens as input and generate tokens as output.
### [Inference Using FastAPI](fastapi_engine_inference.py)
This example demonstrates how to create a FastAPI server that uses the SGLang engine for text generation.
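A server along these lines can be sketched as follows. This is a minimal illustration, not the contents of `fastapi_engine_inference.py`: the model path, endpoint name, and request fields are assumptions chosen for the example.

```python
# Hypothetical sketch: serving sglang.Engine behind a FastAPI endpoint.
# Requires: pip install "sglang[all]" fastapi uvicorn
from contextlib import asynccontextmanager

import sglang as sgl
from fastapi import FastAPI
from pydantic import BaseModel


class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128  # illustrative defaults
    temperature: float = 0.7


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the engine once at startup; the model path is an assumption.
    app.state.engine = sgl.Engine(model_path="meta-llama/Llama-3.2-1B-Instruct")
    yield
    # Release GPU resources when the server shuts down.
    app.state.engine.shutdown()


app = FastAPI(lifespan=lifespan)


@app.post("/generate")
async def generate(req: GenerateRequest):
    sampling_params = {
        "max_new_tokens": req.max_new_tokens,
        "temperature": req.temperature,
    }
    # async_generate keeps the FastAPI event loop unblocked during decoding.
    result = await app.state.engine.async_generate(req.prompt, sampling_params)
    return {"text": result["text"]}
```

Run it with `uvicorn fastapi_engine_inference:app` and POST a JSON body such as `{"prompt": "Hello"}` to `/generate`. Loading the engine inside the lifespan handler (rather than at import time) keeps startup and shutdown of the GPU-backed engine tied to the server's lifecycle.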