Support v1/chat/completions (#50)

2024-01-18 23:43:09 -08:00
parent 61d4c93962
commit 23471f9aa3
6 changed files with 705 additions and 9 deletions
--- a/README.md
+++ b/README.md
@@ -248,6 +248,8 @@ In addition, the server supports an experimental OpenAI-compatible API.
 import openai
 client = openai.Client(
    base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
+
+# Text completion
 response = client.completions.create(
 	model="default",
 	prompt="The capital of France is",
@@ -255,6 +257,46 @@ response = client.completions.create(
 	max_tokens=32,
 )
 print(response)
+
+# Chat completion
+response = client.chat.completions.create(
+    model="default",
+    messages=[
+        {"role": "system", "content": "You are a helpful AI assistant"},
+        {"role": "user", "content": "List 3 countries and their capitals."},
+    ],
+    temperature=0,
+    max_tokens=64,
+)
+print(response)
+```
+
+In the example above, the server uses the chat template specified in the model's tokenizer.
+You can override the chat template if needed when launching the server:
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
+  --chat-template llama-2
+```
+
+If the chat template you are looking for is missing, you are welcome to contribute it.
+Meanwhile, you can temporarily register your own chat template by defining it in a JSON file and passing the file path to `--chat-template`:
+
+```json
+{
+  "name": "my_model",
+  "system": "<|im_start|>system",
+  "user": "<|im_start|>user",
+  "assistant": "<|im_start|>assistant",
+  "sep_style": "CHATML",
+  "sep": "<|im_end|>",
+  "stop_str": ["<|im_end|>", "<|im_start|>"]
+}
+```
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
+  --chat-template ./my_model_template.json
 ```

 ### Additional Arguments