Docs: Fix layout with sub-section (#3710)
@@ -39,4 +39,6 @@ compile:
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 
 clean:
-	rm -rf $(BUILDDIR)/* logs/timing.log
+	find . -name "*.ipynb" -exec nbstripout {} \;
+	rm -rf $(BUILDDIR)
+	rm -rf logs
@@ -20,19 +20,16 @@ Update your Jupyter notebooks in the appropriate subdirectories under `docs/`. I
 # 1) Compile all Jupyter notebooks
 make compile
 
-# 2) Generate static HTML
-make html
-
-# 3) Preview documentation locally
+# 2) Compile and Preview documentation locally
 # Open your browser at the displayed port to view the docs
 bash serve.sh
 
-# 4) Clean notebook outputs
+# 3) Clean notebook outputs
 # nbstripout removes notebook outputs so your PR stays clean
 pip install nbstripout
 find . -name '*.ipynb' -exec nbstripout {} \;
 
-# 5) Pre-commit checks and create a PR
+# 4) Pre-commit checks and create a PR
 # After these checks pass, push your changes and open a PR on your branch
 pre-commit run --all-files
 ```
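The nbstripout step above clears executed outputs from notebooks before they are committed, so diffs stay reviewable. As a rough illustration of the transformation it performs (this is not nbstripout's actual implementation, just a sketch on a notebook's JSON structure):

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from code cells,
    mimicking what nbstripout does to keep PR diffs clean."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal notebook with one executed code cell and one markdown cell.
nb = {
    "cells": [
        {"cell_type": "code", "source": ["print('hi')\n"],
         "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
         "execution_count": 3},
        {"cell_type": "markdown", "source": ["# Title\n"]},
    ],
    "nbformat": 4, "nbformat_minor": 5,
}
stripped = strip_outputs(json.loads(json.dumps(nb)))  # deep copy, then strip
print(stripped["cells"][0]["outputs"])  # []
```

Markdown cells are left untouched; only execution artifacts are removed.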
@@ -1,4 +1,4 @@
-# Custom Chat Template in SGLang Runtime
+# Custom Chat Template
 
 **NOTE**: There are two chat template systems in SGLang project. This document is about setting a custom chat template for the OpenAI-compatible API server (defined at [conversation.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/conversation.py)). It is NOT related to the chat template used in the SGLang language frontend (defined at [chat_template.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/chat_template.py)).
 
@@ -1,4 +1,4 @@
-# Guide on Hyperparameter Tuning
+# Hyperparameter Tuning
 
 ## Achieving Peak Throughput
 
 Achieving a large batch size is the most important thing for attaining high throughput.
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Native APIs\n",
+    "# SGLang Native APIs\n",
     "\n",
     "Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce these following APIs:\n",
     "\n",
@@ -1,4 +1,4 @@
-# Sampling Parameters in SGLang Runtime
+# Sampling Parameters
 
 This doc describes the sampling parameters of the SGLang Runtime.
 It is the low-level endpoint of the runtime.
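For context, the sampling parameters described in that doc are passed to the runtime's native `/generate` endpoint in a JSON body. A minimal sketch of such a request payload follows; the host/port are placeholders, no server is contacted here, and only a few of the available parameters are shown:

```python
import json

# Hypothetical server location; substitute wherever your server actually runs.
url = "http://localhost:30000/generate"

payload = {
    "text": "The capital of France is",
    "sampling_params": {
        "temperature": 0.7,    # softmax temperature
        "top_p": 0.9,          # nucleus-sampling cutoff
        "max_new_tokens": 32,  # cap on generated tokens
    },
}

body = json.dumps(payload)
# With a live server: requests.post(url, json=payload).json()
print(body)
```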
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Quick Start: Sending Requests\n",
+    "# Sending Requests\n",
     "This notebook provides a quick-start guide to use SGLang in chat completions after installation.\n",
     "\n",
     "- For Vision Language Models, see [OpenAI APIs - Vision](../backend/openai_api_vision.ipynb).\n",
@@ -16,16 +16,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Launch A Server\n",
-    "\n",
-    "This code block is equivalent to executing \n",
-    "\n",
-    "```bash\n",
-    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
-    "    --host 0.0.0.0\n",
-    "```\n",
-    "\n",
-    "in your terminal and wait for the server to be ready. Once the server is running, you can send test requests using curl or requests. The server implements the [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference/chat)."
+    "## Launch A Server"
   ]
  },
  {
@@ -42,6 +33,9 @@
    "else:\n",
    "    from sglang.utils import launch_server_cmd\n",
    "\n",
+    "# This is equivalent to running the following command in your terminal\n",
+    "\n",
+    "# python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0\n",
    "\n",
    "server_process, port = launch_server_cmd(\n",
    "    \"\"\"\n",
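Once a server launched this way is ready, its OpenAI-compatible endpoint can be exercised with a plain HTTP POST. A hedged sketch of the request body follows; the port is a placeholder for the one returned by `launch_server_cmd`, and the request itself is only constructed here, not sent:

```python
import json

port = 30000  # placeholder; use the port returned by launch_server_cmd

# Body in the OpenAI chat-completions format the server implements.
request = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    "temperature": 0,
    "max_tokens": 64,
}

url = f"http://localhost:{port}/v1/chat/completions"
# With a live server: requests.post(url, json=request).json()
print(url)
print(json.dumps(request, indent=2))
```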
@@ -1,4 +1,4 @@
-# Frontend: Structured Generation Language (SGLang)
+# Structured Generation Language
 The frontend language can be used with local models or API models. It is an alternative to the OpenAI API. You may find it easier to use for complex prompting workflow.
 
 ## Quick Start
 
@@ -12,7 +12,7 @@ The core features include:
 
 .. toctree::
   :maxdepth: 1
-  :caption: Getting Started
+  :caption: Installation
 
   start/install.md
 
@@ -26,10 +26,20 @@ The core features include:
   backend/openai_api_embeddings.ipynb
   backend/native_api.ipynb
   backend/offline_engine_api.ipynb
-  backend/structured_outputs.ipynb
-  backend/speculative_decoding.ipynb
-  backend/function_calling.ipynb
   backend/server_arguments.md
+  backend/sampling_params.md
+  backend/hyperparameter_tuning.md
+
+
+.. toctree::
+  :maxdepth: 1
+  :caption: Advanced Features
+
+  backend/speculative_decoding.ipynb
+  backend/structured_outputs.ipynb
+  backend/function_calling.ipynb
+  backend/custom_chat_template.md
+  backend/quantization.md
 
 .. toctree::
   :maxdepth: 1
@@ -44,48 +54,11 @@ The core features include:
 
   router/router.md
 
-
-References
-==========
-
-General
----------------------
 .. toctree::
-  :maxdepth: 1
+  :maxdepth: 1
+  :caption: References
 
-  references/supported_models.md
-  references/contribution_guide.md
-  references/troubleshooting.md
-  references/faq.md
-  references/learn_more.md
-
-Hardware
---------------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/AMD.md
-  references/amd_configure.md
-  references/nvidia_jetson.md
-
-Advanced Models & Deployment
-------------------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/deepseek.md
-  references/multi_node.md
-  references/multi_node_inference_k8s_lws.md
-  references/modelscope.md
-
-Performance & Tuning
---------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/sampling_params.md
-  references/hyperparameter_tuning.md
-  references/benchmark_and_profiling.md
-  references/accuracy_evaluation.md
-  references/custom_chat_template.md
-  references/quantization.md
+  references/general
+  references/hardware
+  references/advanced_deploy
+  references/performance_tuning
docs/references/advanced_deploy.rst (new file, 8 lines)
@@ -0,0 +1,8 @@
+Multi-Node Deployment
+==========================
+.. toctree::
+  :maxdepth: 1
+
+  deepseek.md
+  multi_node.md
+  k8s.md
docs/references/general.rst (new file, 13 lines)
@@ -0,0 +1,13 @@
+
+General Guidance
+==========
+
+.. toctree::
+  :maxdepth: 1
+
+  supported_models.md
+  contribution_guide.md
+  troubleshooting.md
+  faq.md
+  learn_more.md
+  modelscope.md
docs/references/hardware.rst (new file, 7 lines)
@@ -0,0 +1,7 @@
+Hardware Supports
+==========
+.. toctree::
+  :maxdepth: 1
+
+  amd.md
+  nvidia_jetson.md
@@ -1,4 +1,6 @@
-# Deploying a RoCE Network-Based SGLANG Two-Node Inference Service on a Kubernetes (K8S) Cluster
+# Kubernetes
+
+This docs is for deploying a RoCE Network-Based SGLANG Two-Node Inference Service on a Kubernetes (K8S) Cluster.
 
 LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads. A major use case is for multi-host/multi-node distributed inference.
 
@@ -1,4 +1,4 @@
-# Run Multi-Node Inference
+# Multi-Node Deployment
 
 ## Llama 3.1 405B
 
docs/references/performance_tuning.rst (new file, 7 lines)
@@ -0,0 +1,7 @@
+Performance Tuning
+====================
+.. toctree::
+  :maxdepth: 1
+
+  benchmark_and_profiling.md
+  accuracy_evaluation.md
@@ -13,6 +13,7 @@ sphinx
 sphinx-book-theme
 sphinx-copybutton
 sphinx-tabs
+nbstripout
 sphinxcontrib-mermaid
 urllib3<2.0.0
 gguf>=0.10.0
||||
@@ -1 +1,3 @@
|
||||
make clean
|
||||
make html
|
||||
python3 -m http.server --d _build/html
|
||||
|
||||
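The last line of serve.sh invokes CPython's built-in `http.server` module, whose directory option is spelled `-d`/`--directory`. A programmatic equivalent of that one-liner, sketched with a temporary docroot standing in for the real `_build/html`:

```python
import threading
import urllib.request
from functools import partial
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from tempfile import TemporaryDirectory

# Serve a directory the way `python3 -m http.server -d <dir>` does.
with TemporaryDirectory() as docroot:
    Path(docroot, "index.html").write_text("<h1>docs</h1>")

    # SimpleHTTPRequestHandler accepts a `directory` kwarg (Python 3.7+).
    handler = partial(SimpleHTTPRequestHandler, directory=docroot)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()

    port = server.server_address[1]
    html = urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html").read()
    server.shutdown()

print(html.decode())  # <h1>docs</h1>
```

In practice the module-level CLI is simpler for previewing built docs; the sketch just shows what it does under the hood.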