[doc] improve engine doc and add to readme (#1670)
34
README.md
@@ -241,6 +241,40 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
 
+### Engine Without HTTP Server
+
+We also provide an inference engine **without a HTTP server**. For example,
+
+```python
+import sglang as sgl
+
+
+def main():
+    prompts = [
+        "Hello, my name is",
+        "The president of the United States is",
+        "The capital of France is",
+        "The future of AI is",
+    ]
+    sampling_params = {"temperature": 0.8, "top_p": 0.95}
+    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
+
+    outputs = llm.generate(prompts, sampling_params)
+    for prompt, output in zip(prompts, outputs):
+        print("===============================")
+        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+
+if __name__ == "__main__":
+    main()
+```
+
+This can be used for:
+
+1. **Offline Batch Inference**
+2. **Building Custom Servers**
+
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+
 ### Supported Models
 
 **Generative Models**
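To make the "Offline Batch Inference" use case above more concrete, here is a minimal sketch of a batching loop around an engine object. It assumes only what the README example shows: an object with `generate(prompts, sampling_params)` that returns one dict per prompt with a `"text"` key. The helper names `chunked` and `run_offline_batch`, the `chunk_size` parameter, and the `EchoEngine` stand-in are hypothetical and not part of SGLang.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_offline_batch(
    engine: Any,
    prompts: List[str],
    sampling_params: Dict[str, Any],
    chunk_size: int = 2,
) -> List[Dict[str, str]]:
    """Send prompts to the engine chunk by chunk and pair each prompt with its text.

    `engine` is assumed to expose generate(prompts, sampling_params) and to
    return a list of dicts with a "text" key, as in the README example.
    """
    results: List[Dict[str, str]] = []
    for chunk in chunked(prompts, chunk_size):
        outputs = engine.generate(chunk, sampling_params)
        for prompt, output in zip(chunk, outputs):
            results.append({"prompt": prompt, "text": output["text"]})
    return results


class EchoEngine:
    """Hypothetical stand-in so the sketch runs without a GPU or a model download."""

    def generate(self, prompts: List[str], sampling_params: Dict[str, Any]) -> List[Dict[str, str]]:
        return [{"text": prompt.upper()} for prompt in prompts]


if __name__ == "__main__":
    for row in run_offline_batch(EchoEngine(), ["hello", "world", "again"], {"temperature": 0.8}):
        print(row)
```

Under those assumptions, swapping `EchoEngine()` for `sgl.Engine(model_path=...)` from the README example should be the only change needed to run against a real model. Whether chunking is useful at all depends on the engine's own scheduling, so treat `chunk_size` as an illustration, not a tuning recommendation.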
@@ -93,14 +93,39 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
 
-### SRT Engine: Direct Inference Without HTTP
+### Engine Without HTTP Server
 
-SGLang provides a direct inference engine **without an HTTP server**. This can be used for:
+We also provide an inference engine **without a HTTP server**. For example,
+
+```python
+import sglang as sgl
+
+
+def main():
+    prompts = [
+        "Hello, my name is",
+        "The president of the United States is",
+        "The capital of France is",
+        "The future of AI is",
+    ]
+    sampling_params = {"temperature": 0.8, "top_p": 0.95}
+    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
+
+    outputs = llm.generate(prompts, sampling_params)
+    for prompt, output in zip(prompts, outputs):
+        print("===============================")
+        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+
+if __name__ == "__main__":
+    main()
+```
+
+This can be used for:
 
 1. **Offline Batch Inference**
 2. **Building Custom Servers**
 
-We provide usage examples [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
 
 ### Supported Models
 
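For the "Building Custom Servers" use case named in the diff, the following is a rough sketch of wrapping an engine object in a small HTTP endpoint using only the Python standard library. It is not how SGLang's own server is built. The request shape (`{"prompts": [...], "sampling_params": {...}}`), the response shape, and the names `handle_generate_request`, `make_handler`, `serve`, and `EchoEngine` are all assumptions made for illustration; the only thing taken from the README example is the `generate(prompts, sampling_params)` call returning dicts with a `"text"` key.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for an engine; replace with a real engine object."""

    def generate(self, prompts, sampling_params):
        return [{"text": f"echo: {prompt}"} for prompt in prompts]


def handle_generate_request(engine, raw_body):
    """Turn a raw JSON request body into (status_code, response_bytes)."""
    try:
        body = json.loads(raw_body)
    except ValueError:
        return 400, json.dumps({"error": "body must be valid JSON"}).encode("utf-8")
    if not isinstance(body, dict) or not isinstance(body.get("prompts"), list):
        error = {"error": "expected a JSON object with a 'prompts' list"}
        return 400, json.dumps(error).encode("utf-8")
    # Assumed engine interface, mirroring the README example.
    outputs = engine.generate(body["prompts"], body.get("sampling_params", {}))
    texts = [output["text"] for output in outputs]
    return 200, json.dumps({"texts": texts}).encode("utf-8")


def make_handler(engine):
    """Build a request handler class that closes over the given engine."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_generate_request(engine, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Block forever, answering POST requests with generated text."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

Keeping the request handling in a plain function (`handle_generate_request`) separate from the HTTP plumbing makes the logic testable without opening a socket. Calling `serve(EchoEngine())` starts the toy server on the assumed default port.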