Update readme (#434)
@@ -326,15 +326,17 @@ response = client.chat.completions.create(
 print(response)
 ```
 
-In above example, the server uses the chat template specified in the model tokenizer.
-You can override the chat template if needed when launching the server:
+
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+
+If needed, you can also override the chat template when launching the server:
 
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
 ```
 
 If the chat template you are looking for is missing, you are welcome to contribute it.
-Meanwhile, you can also temporary register your chat template as follows:
+Meanwhile, you can also temporarily register your chat template as follows:
 
 ```json
 {
@@ -30,7 +30,8 @@ if __name__ == "__main__":
     response = requests.post(
         url + "/generate",
         json={
-            "input_ids": [[1,2,3], [1,2,3]],
+            "text": f"{a}, ",
+            #"input_ids": [[2] * 256] * 196,
             "sampling_params": {
                 "temperature": 0,
                 "max_new_tokens": max_new_tokens,
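
For readers who want to try the request shape from the second hunk in isolation, here is a minimal sketch. It assumes the `/generate` endpoint accepts a JSON body with `text` and `sampling_params` as shown in the diff and replies with JSON. The helper names `build_generate_payload` and `post_generate`, the default of 16 new tokens, the prompt string, and the `127.0.0.1` host are all hypothetical. The script in the diff uses `requests`; this sketch swaps in the standard library's `urllib.request` so it has no third-party dependency.

```python
import json
import urllib.request


def build_generate_payload(text, max_new_tokens=16):
    """Build a JSON body shaped like the one in the hunk above (hypothetical helper)."""
    return {
        "text": text,
        "sampling_params": {
            "temperature": 0,  # same value as in the diff
            "max_new_tokens": max_new_tokens,
        },
    }


def post_generate(base_url, payload, timeout=30.0):
    """POST the payload to <base_url>/generate and decode the reply.

    Assumption: the server answers with a JSON document.
    """
    request = urllib.request.Request(
        base_url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Building the payload needs no running server.
payload = build_generate_payload("The capital of France is", max_new_tokens=8)
print(json.dumps(payload, indent=2))

# With a server launched on port 30000 as in the README hunk
# (the host below is an assumption):
# print(post_generate("http://127.0.0.1:30000", payload))
```

Keeping payload construction separate from the network call makes the request body easy to inspect or unit-test without a server.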