Fix logprob_start_len for multi modal models (#2597)

Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
Lianmin Zheng
2024-12-26 06:27:45 -08:00
committed by GitHub
parent 637de9e8ce
commit 773951548d
4 changed files with 10 additions and 9 deletions

@@ -223,7 +223,7 @@
"## Structured decoding (JSON, Regex)\n",
"You can define a JSON schema or regular expression to constrain the model's output. The model output will be guaranteed to follow the given constraints and this depends on the grammar backend.\n",
"\n",
"SGlang has two backends: outlines (default) and Xgrammar. Xgrammar enhances JSON decoding performance but does not support regular expressions. To use Xgrammar, add the `--grammar-backend xgrammar` when launching the server:\n",
"SGlang has two backends: [Outlines](https://github.com/dottxt-ai/outlines) (default) and [XGrammar](https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar). Xgrammar accelerates JSON decoding performance but does not support regular expressions. To use Xgrammar, add the `--grammar-backend xgrammar` when launching the server:\n",
"\n",
"```bash\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",

@@ -1,8 +1,7 @@
# Sampling Parameters in SGLang Runtime
This doc describes the sampling parameters of the SGLang Runtime.
It is the low-level endpoint of the runtime.
If you want a high-level endpoint that can automatically handle chat templates, consider using the [OpenAI Compatible API
](https://github.com/sgl-project/sglang?tab=readme-ov-file#openai-compatible-api).
If you want a high-level endpoint that can automatically handle chat templates, consider using the [OpenAI Compatible API](../backend/openai_api_completions.ipynb).
The `/generate` endpoint accepts the following arguments in the JSON format.