Document sampling parameters (#45)

Lianmin Zheng
2024-01-18 11:49:27 -08:00
committed by GitHub
parent dafafe5b11
commit 05b4c398df
2 changed files with 94 additions and 4 deletions


@@ -228,15 +228,19 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
Send a request
```
-curl http://localhost:30000/v1/completions \
+curl http://localhost:30000/generate \
-H "Content-Type: application/json" \
-d '{
-"prompt": "Say this is a test",
-"max_tokens": 16,
-"temperature": 0
+"text": "Once upon a time,",
+"parameters": {
+"max_new_tokens": 16,
+"temperature": 0
+}
}'
```
+Learn more about the argument format [here](docs/sampling_params.md).
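For readers who prefer a script over curl, the new `/generate` request in this hunk can be sketched in Python using only the standard library. This is a minimal sketch, not code from the commit: the helper names `build_generate_request` and `generate` are hypothetical, the host and port are taken from the launch command in the hunk header, the field names (`text`, `parameters`, `max_new_tokens`, `temperature`) are copied from the diff, and the assumption that the server replies with a JSON body is not confirmed by this diff.

```python
import json
import urllib.request

# Assumption: the server launched with `--port 30000` is reachable on localhost.
BASE_URL = "http://localhost:30000"


def build_generate_request(text, max_new_tokens=16, temperature=0, base_url=BASE_URL):
    """Build a POST request whose JSON body mirrors the curl example in the diff."""
    payload = {
        "text": text,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text, **kwargs):
    """Send the request and decode the reply.

    Assumption: the reply is JSON; its exact shape is not shown in this diff,
    so it is returned as-is without picking out any fields.
    """
    with urllib.request.urlopen(build_generate_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running server):
# print(generate("Once upon a time,"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.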
### Additional Arguments
- Add `--tp 2` to enable tensor parallelism.
```