Add typo checker in pre-commit (#6179)

Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Author: applesaucethebun
Date: 2025-05-11 00:55:00 -04:00
Committed by: GitHub
Parent: de167cf5fa
Commit: 2ce8793519
99 changed files with 154 additions and 144 deletions

---

@@ -6,7 +6,7 @@
"source": [
"# Tool and Function Calling\n",
"\n",
"This guide demonstrates how to use SGLangs [Funcion calling](https://platform.openai.com/docs/guides/function-calling) functionality."
"This guide demonstrates how to use SGLangs [Function calling](https://platform.openai.com/docs/guides/function-calling) functionality."
]
},
{
@@ -399,7 +399,7 @@
" },\n",
"}\n",
"gen_response = requests.post(gen_url, json=gen_data).json()[\"text\"]\n",
"print_highlight(\"==== Reponse ====\")\n",
"print_highlight(\"==== Response ====\")\n",
"print(gen_response)\n",
"\n",
"# parse the response\n",

---

@@ -275,7 +275,7 @@
"source": [
"## Structured Outputs (JSON, Regex, EBNF)\n",
"\n",
"For OpenAI compatible structed outputs API, refer to [Structured Outputs](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API) for more details.\n"
"For OpenAI compatible structured outputs API, refer to [Structured Outputs](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API) for more details.\n"
]
},
{
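
For readers following the link above, a JSON-schema-constrained request through the OpenAI-compatible API might look like this sketch; the server URL, model, and the `city_info` schema are assumptions for illustration:

```python
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Describe Paris as JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "population"],
            },
        },
    },
)
print(response.choices[0].message.content)  # content should conform to the schema
```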

---

@@ -40,7 +40,7 @@ The `/generate` endpoint accepts the following parameters in JSON format. For de
| Argument | Type/Default | Description |
|--------------------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| frequency_penalty | `float = 0.0` | Penalizes tokens based on their frequency in generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of penalization grows linearly with each appearance of a token. |
-| presence_penalty | `float = 0.0` | Penalizes tokens if they appeared in the generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of the penalization is constant if a token occured. |
+| presence_penalty | `float = 0.0` | Penalizes tokens if they appeared in the generation so far. Must be between `-2` and `2` where negative numbers encourage repeatment of tokens and positive number encourages sampling of new tokens. The scaling of the penalization is constant if a token occurred. |
| min_new_tokens | `int = 0` | Forces the model to generate at least `min_new_tokens` until a stop word or EOS token is sampled. Note that this might lead to unintended behavior, for example, if the distribution is highly skewed towards these tokens. |
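
To make the two penalties concrete, a `/generate` request that discourages repetition while forcing a minimum length might look like the sketch below; the URL, prompt, and numeric values are illustrative assumptions:

```python
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "List some fruits:",
        "sampling_params": {
            "max_new_tokens": 64,
            "frequency_penalty": 0.8,  # grows with each repeated appearance of a token
            "presence_penalty": 0.5,   # flat penalty once a token has occurred at all
            "min_new_tokens": 8,       # keep sampling until at least 8 tokens are produced
        },
    },
)
print(response.json()["text"])
```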
### Constrained decoding

---

@@ -166,7 +166,7 @@
"source": [
"## Using Native Generation APIs\n",
"\n",
"You can also use the native `/generate` endpoint with requests, which provides more flexiblity. An API reference is available at [Sampling Parameters](sampling_params.md)."
"You can also use the native `/generate` endpoint with requests, which provides more flexibility. An API reference is available at [Sampling Parameters](sampling_params.md)."
]
},
{
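
One flexibility the native endpoint offers is token-by-token streaming. A hedged sketch, assuming the `data:`-prefixed server-sent-event framing used elsewhere in the SGLang docs:

```python
import json
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32},
        "stream": True,
    },
    stream=True,
)

for chunk in response.iter_lines(decode_unicode=True):
    if chunk and chunk.startswith("data:"):
        if chunk == "data: [DONE]":
            break
        # Each event carries the cumulative text generated so far.
        data = json.loads(chunk[len("data:"):])
        print(data["text"])
```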

---

@@ -378,7 +378,7 @@
"\n",
" Args:\n",
" model_type (str): Type of model to parse reasoning from\n",
" stream_reasoning (bool): If Flase, accumulates reasoning content until complete.\n",
" stream_reasoning (bool): If False, accumulates reasoning content until complete.\n",
" If True, streams reasoning content as it arrives.\n",
" \"\"\"\n",
"\n",

---

@@ -11,7 +11,7 @@
"\n",
"### Performance Highlights\n",
"\n",
"Please see below for the huge improvements on throughput for LLaMA-Instruct 3.1 8B tested on MT bench that can be archieved via EAGLE3 decoding.\n",
"Please see below for the huge improvements on throughput for LLaMA-Instruct 3.1 8B tested on MT bench that can be achieved via EAGLE3 decoding.\n",
"For further details please see the [EAGLE3 paper](https://arxiv.org/pdf/2503.01840).\n",
"\n",
"| Method | Throughput (tokens/s) |\n",
@@ -296,7 +296,7 @@
"- EAGLE-2 additionally uses the draft model to evaluate how probable certain branches in the draft tree are, dynamically stopping the expansion of unlikely branches. After the expansion phase, reranking is employed to select only the top `speculative_num_draft_tokens` final nodes as draft tokens.\n",
"- EAGLE-3 removes the feature prediction objective, incorporates low and mid-layer features, and is trained in an on-policy manner.\n",
"\n",
"This enhances drafting accuracy by operating on the features instead of tokens for more regular inputs and passing the tokens from the next timestep additionaly to minimize randomness effects from sampling. Furthermore the dynamic adjustment of the draft tree and selection of reranked final nodes increases acceptance rate of draft tokens further. For more details see [EAGLE-2](https://arxiv.org/abs/2406.16858) and [EAGLE-3](https://arxiv.org/abs/2503.01840) paper.\n",
"This enhances drafting accuracy by operating on the features instead of tokens for more regular inputs and passing the tokens from the next timestep additionally to minimize randomness effects from sampling. Furthermore the dynamic adjustment of the draft tree and selection of reranked final nodes increases acceptance rate of draft tokens further. For more details see [EAGLE-2](https://arxiv.org/abs/2406.16858) and [EAGLE-3](https://arxiv.org/abs/2503.01840) paper.\n",
"\n",
"\n",
"For guidance how to train your own EAGLE model please see the [EAGLE repo](https://github.com/SafeAILab/EAGLE/tree/main?tab=readme-ov-file#train)."

---

@@ -52,7 +52,7 @@ docker run -itd --shm-size 32g --gpus all -v <volumes-to-mount> --ipc=host --net
docker exec -it sglang_dev /bin/zsh
```
Some useful volumes to mount are:
-1. **Huggingface model cache**: mounting model cache can avoid re-download everytime docker restarts. Default location on Linux is `~/.cache/huggingface/`.
+1. **Huggingface model cache**: mounting model cache can avoid re-download every time docker restarts. Default location on Linux is `~/.cache/huggingface/`.
2. **SGLang repository**: code changes in the SGLang local repository will be automatically synced to the .devcontainer.
Example 1: Monting local cache folder `/opt/dlami/nvme/.cache` but not the SGLang repo. Use this when you prefer to manually transfer local code changes to the devcontainer.
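
A concrete form of Example 1 might look like the sketch below; the image name and the in-container cache path are assumptions, since the hunk above truncates the full command:

```bash
# Mount only the local Hugging Face cache; the SGLang repo itself is not mounted.
docker run -itd --shm-size 32g --gpus all \
  -v /opt/dlami/nvme/.cache:/root/.cache \
  --ipc=host --network=host \
  --name sglang_dev <sglang-dev-image> /bin/zsh
```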

---

@@ -29,7 +29,7 @@ Then follow https://github.com/sgl-project/sglang/settings/actions/runners/new?a
**Notes**
- Do not need to specify the runner group
-- Give it a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `1-gpu-runner`). The labels can be editted later in Github Settings.
+- Give it a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `1-gpu-runner`). The labels can be edited later in Github Settings.
- Do not need to change the work folder.
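
For reference, the registration step these notes qualify might look like the following sketch, using the flags GitHub's standard `config.sh` accepts; the token is a placeholder:

```bash
./config.sh --url https://github.com/sgl-project/sglang \
  --token <registration-token> \
  --name test-sgl-gpu-0 \
  --labels 1-gpu-runner
# Accept the defaults for the runner group and the work folder when prompted.
```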
### Step 3: Run the runner by `run.sh`

---

@@ -32,7 +32,7 @@ python -m sglang_router.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-
After the server is ready, you can directly send requests to the router as the same way as sending requests to each single worker.
-Please adjust the batchsize accordingly to archieve maximum throughput.
+Please adjust the batchsize accordingly to achieve maximum throughput.
```python
import requests
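# The hunk is truncated here; a hedged continuation, assuming the router
# listens on its default port 30000 and forwards the worker /generate API:
url = "http://localhost:30000/generate"
data = {"text": "What is the capital of France?", "sampling_params": {"max_new_tokens": 32}}

response = requests.post(url, json=data)
print(response.json())
```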