diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 21f9a2111..958c8b5ff 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -10,6 +10,6 @@ ## Checklist
 
-- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/en/contributor_guide.md).
-- [ ] Add unit tests as outlined in the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/en/contributor_guide.md).
+- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/contributor_guide.md).
+- [ ] Add unit tests as outlined in the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/contributor_guide.md).
 - [ ] Update documentation as needed, including docstrings or example tutorials.
\ No newline at end of file
diff --git a/.github/workflows/deploy-docs.yml b/.github/workflows/deploy-docs.yml
index e229eae42..5b00ee578 100644
--- a/.github/workflows/deploy-docs.yml
+++ b/.github/workflows/deploy-docs.yml
@@ -27,7 +27,7 @@ jobs:
 
       - name: Execute notebooks
        run: |
-          cd docs/en
+          cd docs
          for nb in *.ipynb; do
            if [ -f "$nb" ]; then
              echo "Executing $nb"
@@ -38,7 +38,7 @@
           done
 
  build-and-deploy:
-    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    if: github.repository == 'sgl-project/sglang'
    runs-on: 1-gpu-runner
    steps:
      - name: Checkout code
@@ -58,21 +58,21 @@
      - name: Build documentation
        run: |
-          cd docs/en
+          cd docs
          make html
 
      - name: Push to sgl-project.github.io
        env:
          GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
        run: |
-          cd docs/en/_build/html
+          cd docs/_build/html
          git clone https://$GITHUB_TOKEN@github.com/sgl-project/sgl-project.github.io.git ../sgl-project.github.io
          cp -r * ../sgl-project.github.io
          cd ../sgl-project.github.io
          git config user.name "zhaochenyang20"
          git config user.email "zhaochenyang20@gmail.com"
          git add .
-          git commit -m "$(date +'%Y-%m-%d %H:%M:%S') - Update documentation"
+          git commit -m "Update $(date +'%Y-%m-%d %H:%M:%S')"
          git push https://$GITHUB_TOKEN@github.com/sgl-project/sgl-project.github.io.git main
          cd ..
          rm -rf sgl-project.github.io
diff --git a/.gitignore b/.gitignore
index dfb0b79a1..537b6918c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -167,7 +167,7 @@ cython_debug/
 *.swp
 
 # Documentation
-docs/en/_build
+docs/_build
 
 # SGL
 benchmark/mmlu/data
@@ -185,7 +185,4 @@ tmp*.txt
 work_dirs/
 *.csv
 
-!logo.png
-
-# docs
-/docs/en/_build
\ No newline at end of file
+!logo.png
\ No newline at end of file
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
deleted file mode 100644
index 94f52e9a0..000000000
--- a/.readthedocs.yaml
+++ /dev/null
@@ -1,17 +0,0 @@
-version: 2
-
-formats: all
-
-build:
-  os: "ubuntu-22.04"
-  tools:
-    python: "3.12"
-
-
-sphinx:
-  configuration: docs/en/conf.py
-
-
-python:
-  install:
-    - requirements: docs/requirements.txt
diff --git a/README.md b/README.md
index 5f6746c69..2cc513ee5 100644
--- a/README.md
+++ b/README.md
@@ -171,7 +171,7 @@ curl http://localhost:30000/generate \
   }'
 ```
 
-Learn more about the argument specification, streaming, and multi-modal support [here](docs/en/sampling_params.md).
+Learn more about the argument specification, streaming, and multi-modal support [here](docs/sampling_params.md).
 
 ### OpenAI Compatible API
 In addition, the server supports OpenAI-compatible APIs.
@@ -225,7 +225,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --mem-fraction-static 0.7
 ```
-- See [hyperparameter_tuning.md](docs/en/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
+- See [hyperparameter_tuning.md](docs/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
 - If you see out-of-memory errors during prefill for long prompts, try to set a smaller chunked prefill size.
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --chunked-prefill-size 4096
 ```
@@ -235,7 +235,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 - To enable torchao quantization, add `--torchao-config int4wo-128`. It supports various quantization strategies.
 - To enable fp8 weight quantization, add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable fp8 kv cache quantization, add `--kv-cache-dtype fp8_e5m2`.
-- If the model does not have a chat template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/en/custom_chat_template.md).
+- If the model does not have a chat template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/custom_chat_template.md).
 - To run tensor parallelism on multiple nodes, add `--nnodes 2`. If you have two nodes with two GPUs on each node and want to run TP=4, let `sgl-dev-0` be the hostname of the first node and `50000` be an available port, you can use the following commands. If you meet deadlock, please try to add `--disable-cuda-graph`
 ```
 # Node 0
@@ -311,7 +311,7 @@ You can view the full example [here](https://github.com/sgl-project/sglang/tree/
 - gte-Qwen2
   - `python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --is-embedding`
 
-Instructions for supporting a new model are [here](docs/en/model_support.md).
+Instructions for supporting a new model are [here](docs/model_support.md).
 
 #### Use Models From ModelScope
diff --git a/docs/conf.py b/docs/conf.py
index 86b467fad..e4be9f8b9 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -55,7 +55,6 @@
 html_copy_source = True
 html_last_updated_fmt = ""
 
 html_theme_options = {
-    "path_to_docs": "docs/en",
     "repository_url": "https://github.com/sgl-project/sglang",
     "repository_branch": "main",
     "show_navbar_depth": 3,
diff --git a/python/sglang/api.py b/python/sglang/api.py
index 68524363e..28c6783a3 100644
--- a/python/sglang/api.py
+++ b/python/sglang/api.py
@@ -99,7 +99,7 @@ def gen(
     regex: Optional[str] = None,
     json_schema: Optional[str] = None,
 ):
-    """Call the model to generate. See the meaning of the arguments in docs/en/sampling_params.md"""
+    """Call the model to generate. See the meaning of the arguments in docs/sampling_params.md"""
 
     if choices:
         return SglSelect(
diff --git a/python/sglang/lang/ir.py b/python/sglang/lang/ir.py
index 5c03db068..8164478ed 100644
--- a/python/sglang/lang/ir.py
+++ b/python/sglang/lang/ir.py
@@ -445,7 +445,7 @@ class SglGen(SglExpr):
         regex: Optional[str] = None,
         json_schema: Optional[str] = None,
     ):
-        """Call the model to generate. See the meaning of the arguments in docs/en/sampling_params.md"""
+        """Call the model to generate. See the meaning of the arguments in docs/sampling_params.md"""
         super().__init__()
         self.name = name
         self.sampling_params = SglSamplingParams(