Document sampling parameters (#45)
12
README.md
@@ -228,15 +228,19 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 
 Send a request
 ```
-curl http://localhost:30000/v1/completions \
+curl http://localhost:30000/generate \
   -H "Content-Type: application/json" \
   -d '{
-    "prompt": "Say this is a test",
-    "max_tokens": 16,
-    "temperature": 0
+    "text": "Once upon a time,",
+    "parameters": {
+      "max_new_tokens": 16,
+      "temperature": 0
+    }
   }'
 ```
 
+Learn more about the argument format [here](docs/sampling_params.md).
+
 ### Additional Arguments
 - Add `--tp 2` to enable tensor parallelism.
 ```
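The new `/generate` example can also be exercised from Python. The sketch below builds the same JSON body shown in the README's curl command (prompt in `"text"`, sampling options nested under `"parameters"`); the helper name `build_generate_payload` is made up for illustration, and the commented-out request assumes a server already running on `localhost:30000`.

```python
import json


def build_generate_payload(text, max_new_tokens=16, temperature=0):
    # Mirror the curl example: the prompt goes in "text" and the
    # sampling arguments go in a nested "parameters" object.
    return {
        "text": text,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


if __name__ == "__main__":
    payload = build_generate_payload("Once upon a time,")
    print(json.dumps(payload, indent=2))

    # To actually send it (requires a running sglang server):
    # import urllib.request
    # req = urllib.request.Request(
    #     "http://localhost:30000/generate",
    #     data=json.dumps(payload).encode(),
    #     headers={"Content-Type": "application/json"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
```

Setting `temperature` to 0 makes decoding greedy, which is why the README's example is deterministic.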