Update readme (#434)

Lianmin Zheng
2024-05-13 00:17:02 -07:00
committed by GitHub
parent 39191c8515
commit 455c9ccc4a
2 changed files with 7 additions and 4 deletions


@@ -326,15 +326,17 @@ response = client.chat.completions.create(
 print(response)
 ```
-In above example, the server uses the chat template specified in the model tokenizer.
-You can override the chat template if needed when launching the server:
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+If needed, you can also override the chat template when launching the server:
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
 ```
 If the chat template you are looking for is missing, you are welcome to contribute it.
-Meanwhile, you can also temporary register your chat template as follows:
+Meanwhile, you can also temporarily register your chat template as follows:
 ```json
 {

@@ -30,7 +30,8 @@ if __name__ == "__main__":
 response = requests.post(
 url + "/generate",
 json={
-"input_ids": [[1,2,3], [1,2,3]],
+"text": f"{a}, ",
+#"input_ids": [[2] * 256] * 196,
 "sampling_params": {
 "temperature": 0,
 "max_new_tokens": max_new_tokens,
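The second file switches its `/generate` request from a pre-tokenized `input_ids` batch to a plain `text` prompt. What follows is a minimal sketch of building and sending either kind of payload using only the standard library. Several details are assumptions made for this illustration:

- `build_generate_payload` and `generate` are hypothetical helper names.
- The exactly-one-of check is this sketch's own choice.
- The server is assumed to be the one launched on port 30000 in the README hunk.

```python
# Minimal sketch of calling the server's /generate endpoint with either a text
# prompt or pre-tokenized input_ids, mirroring the two payloads in the diff.
# Hypothetical helpers; the exactly-one-of check is this sketch's own rule.
import json
import urllib.request
from typing import Any, Dict, List, Optional, Union


def build_generate_payload(
    text: Optional[Union[str, List[str]]] = None,
    input_ids: Optional[Union[List[int], List[List[int]]]] = None,
    temperature: float = 0.0,
    max_new_tokens: int = 16,
) -> Dict[str, Any]:
    """Build a /generate request body; exactly one of text or input_ids must be set."""
    if (text is None) == (input_ids is None):
        raise ValueError("pass exactly one of text or input_ids")
    payload: Dict[str, Any] = {
        "sampling_params": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        }
    }
    if text is not None:
        payload["text"] = text
    else:
        payload["input_ids"] = input_ids
    return payload


def generate(url: str, payload: Dict[str, Any]) -> Any:
    """POST the payload as JSON to url + "/generate" and decode the JSON reply."""
    req = urllib.request.Request(
        url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `generate("http://localhost:30000", build_generate_payload(text="The capital of France is", max_new_tokens=8))` sends a text prompt. This requires a running server. The structure of the returned JSON is not shown in this commit.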