Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
@@ -6,7 +6,7 @@
 "source": [
 "# Tool and Function Calling\n",
 "\n",
-"This guide demonstrates how to use SGLang’s [Funcion calling](https://platform.openai.com/docs/guides/function-calling) functionality."
+"This guide demonstrates how to use SGLang’s [Function calling](https://platform.openai.com/docs/guides/function-calling) functionality."
 ]
 },
 {
@@ -399,7 +399,7 @@
 " },\n",
 "}\n",
 "gen_response = requests.post(gen_url, json=gen_data).json()[\"text\"]\n",
-"print_highlight(\"==== Reponse ====\")\n",
+"print_highlight(\"==== Response ====\")\n",
 "print(gen_response)\n",
 "\n",
 "# parse the response\n",
@@ -275,7 +275,7 @@
 "source": [
 "## Structured Outputs (JSON, Regex, EBNF)\n",
 "\n",
-"For OpenAI compatible structed outputs API, refer to [Structured Outputs](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API) for more details.\n"
+"For OpenAI compatible structured outputs API, refer to [Structured Outputs](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API) for more details.\n"
 ]
 },
 {
@@ -40,7 +40,7 @@ The `/generate` endpoint accepts the following parameters in JSON format. For de
 | Argument | Type/Default | Description |
 |--------------------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
 | frequency_penalty | `float = 0.0` | Penalizes tokens based on their frequency in generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of penalization grows linearly with each appearance of a token. |
-| presence_penalty | `float = 0.0` | Penalizes tokens if they appeared in the generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of the penalization is constant if a token occured. |
+| presence_penalty | `float = 0.0` | Penalizes tokens if they appeared in the generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of the penalization is constant if a token occurred. |
 | min_new_tokens | `int = 0` | Forces the model to generate at least `min_new_tokens` until a stop word or EOS token is sampled. Note that this might lead to unintended behavior, for example, if the distribution is highly skewed towards these tokens. |
 
 ### Constrained decoding
@@ -166,7 +166,7 @@
 "source": [
 "## Using Native Generation APIs\n",
 "\n",
-"You can also use the native `/generate` endpoint with requests, which provides more flexiblity. An API reference is available at [Sampling Parameters](sampling_params.md)."
+"You can also use the native `/generate` endpoint with requests, which provides more flexibility. An API reference is available at [Sampling Parameters](sampling_params.md)."
 ]
 },
 {
@@ -378,7 +378,7 @@
 "\n",
 " Args:\n",
 " model_type (str): Type of model to parse reasoning from\n",
-" stream_reasoning (bool): If Flase, accumulates reasoning content until complete.\n",
+" stream_reasoning (bool): If False, accumulates reasoning content until complete.\n",
 " If True, streams reasoning content as it arrives.\n",
 " \"\"\"\n",
 "\n",
@@ -11,7 +11,7 @@
 "\n",
 "### Performance Highlights\n",
 "\n",
-"Please see below for the huge improvements on throughput for LLaMA-Instruct 3.1 8B tested on MT bench that can be archieved via EAGLE3 decoding.\n",
+"Please see below for the huge improvements on throughput for LLaMA-Instruct 3.1 8B tested on MT bench that can be achieved via EAGLE3 decoding.\n",
 "For further details please see the [EAGLE3 paper](https://arxiv.org/pdf/2503.01840).\n",
 "\n",
 "| Method | Throughput (tokens/s) |\n",
@@ -296,7 +296,7 @@
 "- EAGLE-2 additionally uses the draft model to evaluate how probable certain branches in the draft tree are, dynamically stopping the expansion of unlikely branches. After the expansion phase, reranking is employed to select only the top `speculative_num_draft_tokens` final nodes as draft tokens.\n",
 "- EAGLE-3 removes the feature prediction objective, incorporates low and mid-layer features, and is trained in an on-policy manner.\n",
 "\n",
-"This enhances drafting accuracy by operating on the features instead of tokens for more regular inputs and passing the tokens from the next timestep additionaly to minimize randomness effects from sampling. Furthermore the dynamic adjustment of the draft tree and selection of reranked final nodes increases acceptance rate of draft tokens further. For more details see [EAGLE-2](https://arxiv.org/abs/2406.16858) and [EAGLE-3](https://arxiv.org/abs/2503.01840) paper.\n",
+"This enhances drafting accuracy by operating on the features instead of tokens for more regular inputs and passing the tokens from the next timestep additionally to minimize randomness effects from sampling. Furthermore the dynamic adjustment of the draft tree and selection of reranked final nodes increases acceptance rate of draft tokens further. For more details see [EAGLE-2](https://arxiv.org/abs/2406.16858) and [EAGLE-3](https://arxiv.org/abs/2503.01840) paper.\n",
 "\n",
 "\n",
 "For guidance how to train your own EAGLE model please see the [EAGLE repo](https://github.com/SafeAILab/EAGLE/tree/main?tab=readme-ov-file#train)."
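The hunks in this commit are the kind of findings a misspelling checker reports. The diff does not show which checker the commit wires into pre-commit, so what follows is only a minimal Python sketch. It assumes a table-driven approach, in which known misspellings map to their corrections. `MISSPELLINGS` and `find_typos` are hypothetical names, and the table holds only the pairs corrected in this commit:

```python
import re

# Hypothetical misspelling table: only the pairs corrected in this commit's diff.
MISSPELLINGS = {
    "funcion": "function",
    "reponse": "response",
    "structed": "structured",
    "occured": "occurred",
    "flexiblity": "flexibility",
    "flase": "false",
    "archieved": "achieved",
    "additionaly": "additionally",
}

# Runs of ASCII letters; underscores and digits split words (print_highlight -> print, highlight).
WORD_RE = re.compile(r"[A-Za-z]+")


def find_typos(text: str, table: dict[str, str] = MISSPELLINGS) -> list[tuple[int, str, str]]:
    """Return (line_number, found_word, suggestion) for each known misspelling.

    Lookup is case-insensitive on whole words; line numbers are 1-based.
    The table is only read, never modified.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            fix = table.get(word.lower())
            if fix is None:
                continue
            # Keep the leading capital when the misspelled word was capitalized.
            if word[0].isupper():
                fix = fix.capitalize()
            hits.append((lineno, word, fix))
    return hits


if __name__ == "__main__":
    sample = (
        'print_highlight("==== Reponse ====")\n'
        "stream_reasoning (bool): If Flase, accumulates reasoning content until complete.\n"
        "that can be achieved via EAGLE3 decoding.\n"
    )
    for lineno, word, fix in find_typos(sample):
        print(f"line {lineno}: {word!r} -> {fix!r}")
```

A table-driven checker flags only the words it already lists. A misspelling missing from its table passes without a report. For example, `repeatment` in the `frequency_penalty` and `presence_penalty` rows appears unchanged in this diff.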