Improve docs (#662)

Ying Sheng
2024-07-19 10:58:03 -07:00
committed by GitHub
parent 630479c3a6
commit e87c7fd501
6 changed files with 75 additions and 41 deletions


@@ -8,23 +8,24 @@ The `/generate` endpoint accepts the following arguments in the JSON format.
class GenerateReqInput:
# The input prompt. It can be a single prompt or a batch of prompts.
text: Union[List[str], str]
# The token ids for text; one can either specify text or input_ids
# The token ids for text; one can either specify text or input_ids.
input_ids: Optional[Union[List[List[int]], List[int]]] = None
# The image input. It can be a file name.
# The image input. It can be a file name, a url, or base64 encoded string.
# See also python/sglang/srt/utils.py:load_image.
image_data: Optional[Union[List[str], str]] = None
# The sampling_params
# The sampling_params.
sampling_params: Union[List[Dict], Dict] = None
# The request id
# The request id.
rid: Optional[Union[List[str], str]] = None
# Whether to return logprobs
# Whether to return logprobs.
return_logprob: Optional[Union[List[bool], bool]] = None
# The start location of the prompt for return_logprob
# The start location of the prompt for return_logprob.
logprob_start_len: Optional[Union[List[int], int]] = None
# The number of top logprobs to return
# The number of top logprobs to return.
top_logprobs_num: Optional[Union[List[int], int]] = None
# Whether to detokenize tokens in logprobs
# Whether to detokenize tokens in logprobs.
return_text_in_logprobs: bool = False
# Whether to stream output
# Whether to stream output.
stream: bool = False
```
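As the `Union[List[...], ...]` annotations above indicate, every request field accepts either a single value or a batch. A minimal sketch of a batched payload (the prompts and parameter values are made up for illustration):

```python
import json

# Batched request: `text` is a list of prompts, and `sampling_params`
# can be a matching list to give each prompt its own settings.
payload = {
    "text": ["The capital of France is", "The capital of Japan is"],
    "sampling_params": [
        {"temperature": 0, "max_new_tokens": 8},
        {"temperature": 0.7, "max_new_tokens": 8},
    ],
    "return_logprob": False,
}
print(json.dumps(payload, indent=2))
```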
@@ -48,13 +49,19 @@ class SamplingParams:
) -> None:
```
- `max_new_tokens`, `stop`, `temperature`, `top_p`, `top_k` are common sampling parameters.
- `ignore_eos` ignores the EOS token and continues decoding past it, which is helpful for benchmarking.
- `regex` constrains the output to follow a given regular expression.
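For example, `regex` can force the output to match a fixed pattern. A sketch of such a request (the pattern, prompt, and port are illustrative assumptions; the actual HTTP call is commented out since it needs a running server):

```python
import re

# Illustrative pattern: constrain the answer to exactly "Paris" or "London".
answer_regex = r"(Paris|London)"

payload = {
    "text": "The capital of France is",
    "sampling_params": {
        "temperature": 0,
        "max_new_tokens": 8,
        "regex": answer_regex,
    },
}
# import requests
# response = requests.post("http://localhost:30000/generate", json=payload)

# Whatever the server returns is guaranteed to match the pattern:
assert re.fullmatch(answer_regex, "Paris")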
## Examples
### Normal
Launch a server
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
```
Send a request
```python
import requests
@@ -72,7 +79,7 @@ print(response.json())
```
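For reference, a complete request along the lines of the example above looks like this (the prompt and parameter values are illustrative; the call assumes the server launched above is listening on port 30000, so it is commented out here):

```python
payload = {
    "text": "The capital of France is",
    "sampling_params": {"temperature": 0, "max_new_tokens": 32},
}
# import requests
# response = requests.post("http://localhost:30000/generate", json=payload)
# print(response.json())
```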
### Streaming
Send a request and stream the output
```python
import requests, json
@@ -104,4 +111,32 @@ print("")
### Multi-modal
See [test_httpserver_llava.py](../test/srt/test_httpserver_llava.py).
Launch a server
```
python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-vicuna-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --chat-template vicuna_v1.1 --port 30000
```
Download an image
```
curl -o example_image.png -L https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true
```
```python
import requests
response = requests.post(
"http://localhost:30000/generate",
json={
"text": "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\nDescribe this picture ASSISTANT:",
"image_data": "example_image.png",
"sampling_params": {
"temperature": 0,
"max_new_tokens": 32,
},
},
)
print(response.json())
```
The `image_data` can be a file name, a URL, or a base64 encoded string. See also `python/sglang/srt/utils.py:load_image`.
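To inline the image instead of passing a file name, you can base64-encode it first. A small sketch (the helper name is my own, not part of sglang):

```python
import base64

def encode_image(path: str) -> str:
    # Read the image bytes and return a base64 string usable as `image_data`.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# payload["image_data"] = encode_image("example_image.png")
```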
Streaming is supported in a similar manner as [above](#streaming).
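A hedged sketch of consuming the stream, assuming (as in the streaming example) that each chunk arrives as a server-sent-event style line of the form `data: {json}`, terminated by `data: [DONE]`:

```python
import json

def parse_stream_line(line: bytes):
    # Decode one streamed chunk; returns the JSON payload, or None for
    # the final "[DONE]" marker and for non-data (e.g. empty) lines.
    text = line.decode("utf-8")
    if not text.startswith("data:"):
        return None
    chunk = text[len("data:"):].strip()
    if chunk == "[DONE]":
        return None
    return json.loads(chunk)

# Usage with requests (not executed here; needs a running server):
# for line in requests.post(url, json=payload, stream=True).iter_lines():
#     event = parse_stream_line(line)
#     if event is not None:
#         print(event["text"], end="", flush=True)
```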