[Docs] Add EBNF to sampling params docs (#2609)

This commit is contained in:
Adarsh Shirawalmath
2024-12-29 13:35:00 +05:30
committed by GitHub
parent 8ee9a8501a
commit fd34f2da35
2 changed files with 100 additions and 22 deletions

View File

@@ -220,14 +220,21 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Structured decoding (JSON, Regex)\n", "## Structured Outputs (JSON, Regex, EBNF)\n",
"You can define a JSON schema or regular expression to constrain the model's output. The model output will be guaranteed to follow the given constraints and this depends on the grammar backend.\n", "You can specify a JSON schema, Regular Expression or [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) to constrain the model output. The model output will be guaranteed to follow the given constraints. \n",
"\n", "\n",
"SGlang has two backends: [Outlines](https://github.com/dottxt-ai/outlines) (default) and [XGrammar](https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar). Xgrammar accelerates JSON decoding performance but does not support regular expressions. To use Xgrammar, add the `--grammar-backend xgrammar` when launching the server:\n", "SGLang supports two grammar backends:\n",
"\n", "\n",
"- [Outlines](https://github.com/dottxt-ai/outlines) (default): Supports JSON schema and Regular Expression constraints.\n",
"- [XGrammar](https://github.com/mlc-ai/xgrammar): Supports JSON schema and EBNF constraints.\n",
" - XGrammar currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md)\n",
"\n",
"> 🔔 Only one constraint parameter (`json_schema`, `regex`, or `ebnf`) can be specified at a time.\n",
"\n",
"Initialise xgrammar backend using `--grammar-backend xgrammar` flag\n",
"```bash\n", "```bash\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n", "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
"--port 30000 --host 0.0.0.0 --grammar-backend xgrammar\n", "--port 30000 --host 0.0.0.0 --grammar-backend [xgrammar|outlines] # xgrammar or outlines (default: outlines)\n",
"```\n", "```\n",
"\n", "\n",
"### JSON" "### JSON"
@@ -275,7 +282,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Regular expression" "### Regular expression (use default \"outlines\" backend)"
] ]
}, },
{ {
@@ -297,6 +304,46 @@
"print_highlight(response.choices[0].message.content)" "print_highlight(response.choices[0].message.content)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### EBNF (use \"xgrammar\" backend)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# terminate the existing server(that's using default outlines backend) for this demo\n",
"terminate_process(server_process)\n",
"\n",
"# start new server with xgrammar backend\n",
"server_process = execute_shell_command(\n",
" \"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0 --grammar-backend xgrammar\"\n",
")\n",
"wait_for_server(\"http://localhost:30000\")\n",
"\n",
"# EBNF example\n",
"ebnf_grammar = r\"\"\"\n",
" root ::= \"Hello\" | \"Hi\" | \"Hey\"\n",
" \"\"\"\n",
"response = client.chat.completions.create(\n",
" model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful EBNF test bot.\"},\n",
" {\"role\": \"user\", \"content\": \"Say a greeting.\"},\n",
" ],\n",
" temperature=0,\n",
" max_tokens=32,\n",
" extra_body={\"ebnf\": ebnf_grammar},\n",
")\n",
"\n",
"print_highlight(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -58,13 +58,18 @@ ignore_eos: bool = False,
skip_special_tokens: bool = True,
# Whether to add spaces between special tokens during detokenization.
spaces_between_special_tokens: bool = True,
# Do parallel sampling and return `n` outputs.
n: int = 1,
## Structured Outputs
# Only one of the three parameters below can be set at a time:
# Constrains the output to follow a given regular expression.
regex: Optional[str] = None,
# Constrains the output to follow a given JSON schema.
json_schema: Optional[str] = None,
# Constrains the output to follow a given EBNF grammar.
ebnf: Optional[str] = None,
## Penalties. See the [Performance Implications on Penalties] section below for more information.
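The exclusivity rule above can also be checked on the client before a request is sent. A minimal sketch, assuming sampling params are passed as a plain dict; `validate_constraints` is a hypothetical helper, not part of SGLang:

```python
# Hypothetical client-side guard: reject sampling params that set more
# than one of the mutually exclusive structured-output constraints.
def validate_constraints(sampling_params: dict) -> dict:
    set_constraints = [
        key for key in ("regex", "json_schema", "ebnf")
        if sampling_params.get(key) is not None
    ]
    if len(set_constraints) > 1:
        raise ValueError(
            f"Only one of regex/json_schema/ebnf may be set, got: {set_constraints}"
        )
    return sampling_params

# A single constraint passes through unchanged; two would raise ValueError.
params = validate_constraints({"temperature": 0, "ebnf": 'root ::= "Hi"'})
```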
@@ -179,25 +184,37 @@ print(response.json())
The `image_data` can be a file name, a URL, or a base64 encoded string. See also `python/sglang/srt/utils.py:load_image`.
Streaming is supported in a similar manner as [above](#streaming).
### Structured Outputs (JSON, Regex, EBNF)
You can specify a JSON schema, a regular expression, or an [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) grammar to constrain the model output. The model output is guaranteed to follow the given constraints.
SGLang supports two grammar backends:
- [Outlines](https://github.com/dottxt-ai/outlines) (default): supports JSON schema and regular expression constraints.
- [XGrammar](https://github.com/mlc-ai/xgrammar): supports JSON schema and EBNF constraints.
  - XGrammar currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md).
> 🔔 Only one constraint parameter (`json_schema`, `regex`, or `ebnf`) can be specified at a time.
Select the backend with the `--grammar-backend` flag when launching the server:
```bash
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--port 30000 --host 0.0.0.0 --grammar-backend xgrammar  # or outlines (default)
```
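To illustrate what an alternation-only GBNF-style rule permits, the following sketch extracts the literal alternatives from a single-rule grammar with a regular expression. This is a simplification for illustration only, not how the grammar backend actually parses grammars:

```python
import re

# A single-rule, alternation-only grammar in the GBNF style used by XGrammar.
grammar = 'root ::= "Hello" | "Hi" | "Hey"'

# Simplified: pull out the quoted literals, i.e. the only strings this
# constraint would allow the model to generate. Real grammars (nested
# rules, repetition, character classes) need a proper grammar parser.
alternatives = re.findall(r'"([^"]*)"', grammar)
print(alternatives)  # ['Hello', 'Hi', 'Hey']
```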
```python
import json
import requests

json_schema = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string", "pattern": "^[\\w]+$"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
})

# JSON (works with both the Outlines and XGrammar backends)
response = requests.post(
    "http://localhost:30000/generate",
    json={
@@ -211,7 +228,7 @@ response = requests.post(
)
print(response.json())

# Regular expression (Outlines backend only)
response = requests.post(
    "http://localhost:30000/generate",
    json={
@@ -224,4 +241,18 @@ response = requests.post(
    },
)
print(response.json())

# EBNF (XGrammar backend only)
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Write a greeting.",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 64,
            "ebnf": 'root ::= "Hello" | "Hi" | "Hey"',
        },
    },
)
print(response.json())
```
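Because the JSON-constrained output is guaranteed to match the schema, it can be parsed directly with `json.loads`. A sketch, using a hard-coded `generated` string in place of `response.json()["text"]` from the /generate call above:

```python
import json

# Stand-in for response.json()["text"] from the JSON-constrained request.
generated = '{"name": "Paris", "population": 2102650}'

city = json.loads(generated)
# The schema's "required" keys are guaranteed to be present.
assert {"name", "population"} <= set(city)
print(city["name"], city["population"])
```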