[Feat] Expose logprob options to sgl.gen API (#503)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
README.md
````diff
@@ -279,8 +279,8 @@ for out in state.text_iter():
 ```
 
 ### Tips and Implementation Details
-- The `choices` argument in `sgl.gen` is implemented by computing the normalized log probabilities of all choices and selecting the one with the highest probability.
-- The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, according to the constraints set by the regex.
+- The `choices` argument in `sgl.gen` is implemented by computing the [token-length normalized log probabilities](https://blog.eleuther.ai/multiple-choice-normalization/) of all choices and selecting the one with the highest probability.
+- The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, according to the constraints set by the regex. It is compatible with `temperature=0` and `temperature != 0`.
 
 ## Backend: SGLang Runtime (SRT)
 The SGLang Runtime (SRT) is designed to work best with the SGLang frontend.
````
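The `choices` tip describes picking the option with the highest token-length normalized log probability. The sketch below shows only that selection rule. It assumes the model has already produced per-token log probabilities for each choice. `pick_choice`, its `normalize` switch, and the numbers in `example` are hypothetical, and this is not SGLang's code.

```python
import math
from typing import Dict, List, Optional


def pick_choice(
    choice_token_logprobs: Dict[str, List[float]], normalize: bool = True
) -> Optional[str]:
    """Return the choice with the best log probability score.

    Assumption: `choice_token_logprobs` maps each candidate string to the
    log probabilities the model assigned to that candidate's tokens.
    With normalize=True the score is the mean per-token log probability
    (token-length normalization); otherwise it is the raw sum.
    """
    best_choice, best_score = None, -math.inf
    for choice, logprobs in choice_token_logprobs.items():
        if not logprobs:
            raise ValueError(f"choice {choice!r} has no tokens")
        total = sum(logprobs)
        score = total / len(logprobs) if normalize else total
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice


# Hypothetical per-token log probabilities; the longer answer is assumed to be 3 tokens.
example = {
    "Paris": [-1.0],
    "the city of Paris": [-0.3, -0.3, -0.6],
}

print(pick_choice(example))                   # the city of Paris (mean about -0.4 beats -1.0)
print(pick_choice(example, normalize=False))  # Paris (sum -1.2 loses to -1.0)
```

With a raw sum, every extra token adds another negative term, so longer choices are systematically disfavored. Averaging over the token count removes that length bias.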
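The `regex` tip describes constrained decoding by logit bias masking, and says it works both with `temperature=0` and `temperature != 0`. Below is a minimal sketch of that idea under these stated assumptions:

- a character-level toy vocabulary;
- a hand-written DFA for the single pattern `[0-9]+\.[0-9]+`, whereas a real system would compile arbitrary regexes;
- a hypothetical `toy_logits` function standing in for the model.

At each step, tokens that would leave the regex's language get a logit of `-inf`. The masked logits are then either decoded greedily or sampled with a temperature. None of these names are SGLang APIs.

```python
import math
import random
import re
from typing import Callable, Dict, Optional

EOS = "<eos>"
# Toy character-level vocabulary; "a" is never allowed by the pattern below.
VOCAB = list("0123456789.a") + [EOS]
DIGITS = set("0123456789")

# Hand-written DFA for the regex [0-9]+\.[0-9]+
#   state 0: start, 1: integer part, 2: just saw ".", 3: fractional part (accepting)
ACCEPTING = {3}


def next_state(state: int, token: str) -> Optional[int]:
    """Return the DFA state after `token`, or None if the token breaks the regex."""
    if token in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if token == "." and state == 1:
        return 2
    return None


def is_allowed(state: int, token: str) -> bool:
    if token == EOS:
        return state in ACCEPTING  # may only stop on a full match
    return next_state(state, token) is not None


def constrained_decode(
    logits_fn: Callable[[str], Dict[str, float]],
    temperature: float = 0.0,
    max_tokens: int = 16,
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random(0)
    state, out = 0, ""
    for _ in range(max_tokens):
        logits = logits_fn(out)
        # Logit bias mask: disallowed tokens get -inf, so they can never be chosen.
        masked = {t: logits[t] if is_allowed(state, t) else -math.inf for t in VOCAB}
        if temperature == 0:
            token = max(masked, key=masked.get)  # greedy decoding
        else:
            cands = [t for t in VOCAB if masked[t] > -math.inf]
            top = max(masked[t] for t in cands)
            weights = [math.exp((masked[t] - top) / temperature) for t in cands]
            token = rng.choices(cands, weights=weights)[0]  # temperature sampling
        if token == EOS:
            break
        out += token
        state = next_state(state, token)
    return out


def toy_logits(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a model; it strongly prefers the invalid token "a"."""
    logits = {t: 0.0 for t in VOCAB}
    logits["a"] = 10.0
    logits["5"] = 1.0
    logits["."] = 5.0 if len(prefix) == 2 else -1.0
    logits[EOS] = 5.0 if len(prefix) >= 4 else -5.0
    return logits


greedy = constrained_decode(toy_logits, temperature=0.0)
print(greedy, bool(re.fullmatch(r"[0-9]+\.[0-9]+", greedy)))  # 55.5 True
```

The mask only removes invalid tokens; it does not force termination. If `max_tokens` runs out before the DFA reaches an accepting state, the output is guaranteed only to be a prefix of a match.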
````diff
@@ -337,7 +337,6 @@ response = client.chat.completions.create(
 print(response)
 ```
 
-
 By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
 
 If needed, you can also override the chat template when launching the server:
````
````diff
@@ -384,9 +383,8 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 - Llama
 - Mistral
 - Mixtral
-- Qwen / Qwen 2
-- Gemma
-  - Please add a new flag `--attention-reduce-in-fp32` to avoid some precision errors.
+- Qwen / Qwen 2 / Qwen 2 MoE
+- Gemma / Gemma 2
   - `python -m sglang.launch_server --model-path google/gemma-7b-it --port 30000 --attention-reduce-in-fp32`
 - LLaVA
   - `python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.5-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --chat-template vicuna_v1.1 --port 30000`
````
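The Gemma entry uses `--attention-reduce-in-fp32` to avoid precision errors. The flag's name suggests that a reduction inside attention is carried out in fp32 instead of half precision. This sketch does not show SGLang's kernel. It only illustrates the general numerical effect by accumulating the same fp16 values twice: once with an fp16 accumulator and once with an fp32 accumulator. Each precision is emulated with Python's `struct` half-precision (`e`) and single-precision (`f`) formats, and the helper names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_fn):
    """Sum `values`, rounding the running total to the target precision after each add."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


values = [to_fp16(0.1)] * 10_000  # 10,000 copies of fp16(0.1) = 0.0999755859375

fp16_sum = accumulate(values, to_fp16)
fp32_sum = accumulate(values, to_fp32)
print(fp16_sum)  # 256.0
print(fp32_sum)  # 999.755859375
```

The fp16 total gets stuck at 256. Above that value, adjacent fp16 numbers are 0.25 apart, so adding about 0.1 rounds back to the same total. The fp32 accumulator has enough mantissa bits to keep every addition exact here.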
````diff
@@ -399,6 +397,8 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 - StableLM
 - Command-R
 - DBRX
+- Grok
+- ChatGLM
 - AWQ/GPTQ/Marlin quantization
 
 Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).
````