Support v1/chat/completions (#50)

Cody Yu
2024-01-18 23:43:09 -08:00
committed by GitHub
parent 61d4c93962
commit 23471f9aa3
6 changed files with 705 additions and 9 deletions


@@ -248,6 +248,8 @@ In addition, the server supports an experimental OpenAI-compatible API.
import openai
client = openai.Client(
base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
# Text completion
response = client.completions.create(
model="default",
prompt="The capital of France is",
@@ -255,6 +257,46 @@ response = client.completions.create(
max_tokens=32,
)
print(response)
# Chat completion
response = client.chat.completions.create(
model="default",
messages=[
{"role": "system", "content": "You are a helpful AI assistant"},
{"role": "user", "content": "List 3 countries and their capitals."},
],
temperature=0,
max_tokens=64,
)
print(response)
```
In the example above, the server uses the chat template specified in the model tokenizer.
You can override the chat template when launching the server if needed:
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
  --chat-template llama-2
```
If the chat template you are looking for is missing, you are welcome to contribute it.
Meanwhile, you can temporarily register your own chat template by defining it in a JSON file and passing the file path to `--chat-template`:
```json
{
"name": "my_model",
"system": "<|im_start|>system",
"user": "<|im_start|>user",
"assistant": "<|im_start|>assistant",
"sep_style": "CHATML",
"sep": "<|im_end|>",
"stop_str": ["<|im_end|>", "<|im_start|>"]
}
```
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
  --chat-template ./my_model_template.json
```
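To get a feel for what the template fields mean, the following is a minimal sketch that writes the JSON template to `my_model_template.json` and renders a message list into a single prompt string. The `render_chatml` helper is hypothetical and not part of the server; it assumes the commonly used ChatML layout (role tag, newline, content, separator, newline), so the exact string the server produces may differ.

```python
import json
from pathlib import Path

# Same fields as the JSON template shown above.
template = {
    "name": "my_model",
    "system": "<|im_start|>system",
    "user": "<|im_start|>user",
    "assistant": "<|im_start|>assistant",
    "sep_style": "CHATML",
    "sep": "<|im_end|>",
    "stop_str": ["<|im_end|>", "<|im_start|>"],
}


def render_chatml(template, messages):
    """Hypothetical renderer: assumes the layout '<role tag>\\n<content><sep>\\n'."""
    parts = []
    for message in messages:
        role_tag = template[message["role"]]  # e.g. "<|im_start|>user"
        parts.append(f"{role_tag}\n{message['content']}{template['sep']}\n")
    # Finish with an open assistant turn so the model continues from there.
    parts.append(f"{template['assistant']}\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "List 3 countries and their capitals."},
]
prompt = render_chatml(template, messages)
print(prompt)

# Save the template so its path can be passed to --chat-template.
template_path = Path("my_model_template.json")
template_path.write_text(json.dumps(template, indent=2))
```

The strings in `stop_str` mark where generation should stop, which is why both the separator and the start-of-turn token are listed.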
### Additional Arguments