Support v1/chat/completions (#50)
42
README.md
@@ -248,6 +248,8 @@ In addition, the server supports an experimental OpenAI-compatible API.
import openai
client = openai.Client(
    base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Text completion
response = client.completions.create(
    model="default",
    prompt="The capital of France is",
@@ -255,6 +257,46 @@ response = client.completions.create(
    max_tokens=32,
)
print(response)

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)
print(response)
```

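The `print(response)` call above dumps the whole response object. Below is a minimal sketch for pulling out just the generated text, assuming the server returns the standard OpenAI chat completion shape (`choices[0].message.content`). The `first_reply_text` helper and the hand-built stub response are hypothetical and exist only so the snippet runs without a server.

```python
from types import SimpleNamespace


def first_reply_text(response):
    """Return the text of the first choice, or "" if there is none."""
    if not response.choices:
        return ""
    return response.choices[0].message.content or ""


# Hypothetical stand-in for the object returned by
# client.chat.completions.create(...); not real server output.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                role="assistant",
                content="Paris is the capital of France.",
            )
        )
    ]
)

print(first_reply_text(fake_response))
```

With a live server, the same helper could be passed the `response` from the chat completion call above.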
In the above example, the server uses the chat template specified in the model's tokenizer.
You can override the chat template if needed when launching the server:

```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
```

If the chat template you are looking for is missing, you are welcome to contribute it.
Meanwhile, you can also temporarily register your chat template as follows:

```json
{
  "name": "my_model",
  "system": "<|im_start|>system",
  "user": "<|im_start|>user",
  "assistant": "<|im_start|>assistant",
  "sep_style": "CHATML",
  "sep": "<|im_end|>",
  "stop_str": ["<|im_end|>", "<|im_start|>"]
}
```

```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json
```

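To give a feel for what such a template file describes, here is a rough sketch of how a ChatML-style template might turn a message list into a prompt string. It assumes each turn is rendered as role marker, newline, content, then `sep`, and that the prompt ends with the assistant marker. The actual formatting inside the server may differ, and `render_chatml` is a hypothetical helper, not part of sglang.

```python
import json

# Same fields as the JSON template file shown above.
TEMPLATE_JSON = """
{
  "name": "my_model",
  "system": "<|im_start|>system",
  "user": "<|im_start|>user",
  "assistant": "<|im_start|>assistant",
  "sep_style": "CHATML",
  "sep": "<|im_end|>",
  "stop_str": ["<|im_end|>", "<|im_start|>"]
}
"""


def render_chatml(template, messages):
    """Hypothetical renderer: an assumed ChatML layout, not sglang's code."""
    parts = []
    for msg in messages:
        role_marker = template[msg["role"]]  # e.g. "<|im_start|>user"
        parts.append(f"{role_marker}\n{msg['content']}{template['sep']}\n")
    # Leave the prompt open so the model writes the assistant turn.
    parts.append(f"{template['assistant']}\n")
    return "".join(parts)


template = json.loads(TEMPLATE_JSON)
prompt = render_chatml(
    template,
    [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
)
print(prompt)
```

Under these assumptions, the `stop_str` entries are the markers at which generation would be cut off, so the reply ends before the next turn begins.
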
### Additional Arguments