Docs: Fix layout with sub-section (#3710)
@@ -39,4 +39,6 @@ compile:
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 
 clean:
-	rm -rf $(BUILDDIR)/* logs/timing.log
+	find . -name "*.ipynb" -exec nbstripout {} \;
+	rm -rf $(BUILDDIR)
+	rm -rf logs
@@ -20,19 +20,16 @@ Update your Jupyter notebooks in the appropriate subdirectories under `docs/`. I
 # 1) Compile all Jupyter notebooks
 make compile
 
-# 2) Generate static HTML
-make html
-
-# 3) Preview documentation locally
+# 2) Compile and Preview documentation locally
 # Open your browser at the displayed port to view the docs
 bash serve.sh
 
-# 4) Clean notebook outputs
+# 3) Clean notebook outputs
 # nbstripout removes notebook outputs so your PR stays clean
 pip install nbstripout
 find . -name '*.ipynb' -exec nbstripout {} \;
 
-# 5) Pre-commit checks and create a PR
+# 4) Pre-commit checks and create a PR
 # After these checks pass, push your changes and open a PR on your branch
 pre-commit run --all-files
 ```
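The nbstripout step above clears executed outputs from notebooks before they are committed, so diffs stay reviewable. As a rough illustration of the transformation it performs (this is not nbstripout's actual implementation, just a sketch on a notebook's JSON structure):

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from code cells,
    mimicking what nbstripout does to keep PR diffs clean."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal notebook with one executed code cell and one markdown cell.
nb = {
    "cells": [
        {"cell_type": "code", "source": ["print('hi')\n"],
         "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
         "execution_count": 3},
        {"cell_type": "markdown", "source": ["# Title\n"]},
    ],
    "nbformat": 4, "nbformat_minor": 5,
}
stripped = strip_outputs(json.loads(json.dumps(nb)))  # deep copy, then strip
print(stripped["cells"][0]["outputs"])  # []
```

Markdown cells are left untouched; only execution artifacts are removed.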
@@ -1,4 +1,4 @@
-# Custom Chat Template in SGLang Runtime
+# Custom Chat Template
 
 **NOTE**: There are two chat template systems in SGLang project. This document is about setting a custom chat template for the OpenAI-compatible API server (defined at [conversation.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/conversation.py)). It is NOT related to the chat template used in the SGLang language frontend (defined at [chat_template.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/chat_template.py)).
 
@@ -1,4 +1,4 @@
-# Guide on Hyperparameter Tuning
+# Hyperparameter Tuning
 
 ## Achieving Peak Throughput
 
 Achieving a large batch size is the most important thing for attaining high throughput.
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Native APIs\n",
+    "# SGLang Native APIs\n",
     "\n",
     "Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce these following APIs:\n",
     "\n",
@@ -1,4 +1,4 @@
-# Sampling Parameters in SGLang Runtime
+# Sampling Parameters
 
 This doc describes the sampling parameters of the SGLang Runtime.
 It is the low-level endpoint of the runtime.
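For context, the sampling parameters described in that doc are passed to the runtime's native `/generate` endpoint in a JSON body. A minimal sketch of such a request payload follows; the host/port are placeholders, no server is contacted here, and only a few of the available parameters are shown:

```python
import json

# Hypothetical server location; substitute wherever your server actually runs.
url = "http://localhost:30000/generate"

payload = {
    "text": "The capital of France is",
    "sampling_params": {
        "temperature": 0.7,    # softmax temperature
        "top_p": 0.9,          # nucleus-sampling cutoff
        "max_new_tokens": 32,  # cap on generated tokens
    },
}

body = json.dumps(payload)
# With a live server: requests.post(url, json=payload).json()
print(body)
```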
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Quick Start: Sending Requests\n",
+    "# Sending Requests\n",
     "This notebook provides a quick-start guide to use SGLang in chat completions after installation.\n",
     "\n",
     "- For Vision Language Models, see [OpenAI APIs - Vision](../backend/openai_api_vision.ipynb).\n",
@@ -16,16 +16,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Launch A Server\n",
-    "\n",
-    "This code block is equivalent to executing \n",
-    "\n",
-    "```bash\n",
-    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
-    "    --host 0.0.0.0\n",
-    "```\n",
-    "\n",
-    "in your terminal and wait for the server to be ready. Once the server is running, you can send test requests using curl or requests. The server implements the [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference/chat)."
+    "## Launch A Server"
   ]
  },
  {
@@ -42,6 +33,9 @@
    "else:\n",
    "    from sglang.utils import launch_server_cmd\n",
    "\n",
+    "# This is equivalent to running the following command in your terminal\n",
+    "\n",
+    "# python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0\n",
    "\n",
    "server_process, port = launch_server_cmd(\n",
    "    \"\"\"\n",
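Once a server launched this way is ready, its OpenAI-compatible endpoint can be exercised with a plain HTTP POST. A hedged sketch of the request body follows; the port is a placeholder for the one returned by `launch_server_cmd`, and the request itself is only constructed here, not sent:

```python
import json

port = 30000  # placeholder; use the port returned by launch_server_cmd

# Body in the OpenAI chat-completions format the server implements.
request = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    "temperature": 0,
    "max_tokens": 64,
}

url = f"http://localhost:{port}/v1/chat/completions"
# With a live server: requests.post(url, json=request).json()
print(url)
print(json.dumps(request, indent=2))
```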
@@ -1,4 +1,4 @@
-# Frontend: Structured Generation Language (SGLang)
+# Structured Generation Language
 The frontend language can be used with local models or API models. It is an alternative to the OpenAI API. You may find it easier to use for complex prompting workflow.
 
 ## Quick Start
 
@@ -12,7 +12,7 @@ The core features include:
 
 .. toctree::
   :maxdepth: 1
-  :caption: Getting Started
+  :caption: Installation
 
   start/install.md
 
@@ -26,10 +26,20 @@ The core features include:
   backend/openai_api_embeddings.ipynb
   backend/native_api.ipynb
   backend/offline_engine_api.ipynb
-  backend/structured_outputs.ipynb
-  backend/speculative_decoding.ipynb
-  backend/function_calling.ipynb
   backend/server_arguments.md
+  backend/sampling_params.md
+  backend/hyperparameter_tuning.md
+
+
+.. toctree::
+  :maxdepth: 1
+  :caption: Advanced Features
+
+  backend/speculative_decoding.ipynb
+  backend/structured_outputs.ipynb
+  backend/function_calling.ipynb
+  backend/custom_chat_template.md
+  backend/quantization.md
 
 .. toctree::
   :maxdepth: 1
@@ -44,48 +54,11 @@ The core features include:
 
   router/router.md
 
-
-References
-==========
-
-General
----------------------
 .. toctree::
-  :maxdepth: 1
+  :maxdepth: 1
+  :caption: References
 
-  references/supported_models.md
-  references/contribution_guide.md
-  references/troubleshooting.md
-  references/faq.md
-  references/learn_more.md
-
-Hardware
---------------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/AMD.md
-  references/amd_configure.md
-  references/nvidia_jetson.md
-
-Advanced Models & Deployment
-------------------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/deepseek.md
-  references/multi_node.md
-  references/multi_node_inference_k8s_lws.md
-  references/modelscope.md
-
-Performance & Tuning
---------------------
-.. toctree::
-  :maxdepth: 1
-
-  references/sampling_params.md
-  references/hyperparameter_tuning.md
-  references/benchmark_and_profiling.md
-  references/accuracy_evaluation.md
-  references/custom_chat_template.md
-  references/quantization.md
+  references/general
+  references/hardware
+  references/advanced_deploy
+  references/performance_tuning
docs/references/advanced_deploy.rst (new file, 8 lines)
@@ -0,0 +1,8 @@
+Multi-Node Deployment
+==========================
+.. toctree::
+  :maxdepth: 1
+
+  deepseek.md
+  multi_node.md
+  k8s.md
docs/references/general.rst (new file, 13 lines)
@@ -0,0 +1,13 @@
+
+General Guidance
+==========
+
+.. toctree::
+  :maxdepth: 1
+
+  supported_models.md
+  contribution_guide.md
+  troubleshooting.md
+  faq.md
+  learn_more.md
+  modelscope.md
docs/references/hardware.rst (new file, 7 lines)
@@ -0,0 +1,7 @@
+Hardware Supports
+==========
+.. toctree::
+  :maxdepth: 1
+
+  amd.md
+  nvidia_jetson.md
@@ -1,4 +1,6 @@
-# Deploying a RoCE Network-Based SGLANG Two-Node Inference Service on a Kubernetes (K8S) Cluster
+# Kubernetes
+
+This docs is for deploying a RoCE Network-Based SGLANG Two-Node Inference Service on a Kubernetes (K8S) Cluster.
 
 LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads. A major use case is for multi-host/multi-node distributed inference.
 
@@ -1,4 +1,4 @@
-# Run Multi-Node Inference
+# Multi-Node Deployment
 
 ## Llama 3.1 405B
 
docs/references/performance_tuning.rst (new file, 7 lines)
@@ -0,0 +1,7 @@
+Performance Tuning
+====================
+.. toctree::
+  :maxdepth: 1
+
+  benchmark_and_profiling.md
+  accuracy_evaluation.md
@@ -13,6 +13,7 @@ sphinx
 sphinx-book-theme
 sphinx-copybutton
 sphinx-tabs
+nbstripout
 sphinxcontrib-mermaid
 urllib3<2.0.0
 gguf>=0.10.0
||||
@@ -1 +1,3 @@
|
||||
make clean
|
||||
make html
|
||||
python3 -m http.server --d _build/html
|
||||
|
||||
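The last line of serve.sh invokes CPython's built-in `http.server` module, whose directory option is spelled `-d`/`--directory`. A programmatic equivalent of that one-liner, sketched with a temporary docroot standing in for the real `_build/html`:

```python
import threading
import urllib.request
from functools import partial
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from tempfile import TemporaryDirectory

# Serve a directory the way `python3 -m http.server -d <dir>` does.
with TemporaryDirectory() as docroot:
    Path(docroot, "index.html").write_text("<h1>docs</h1>")

    # SimpleHTTPRequestHandler accepts a `directory` kwarg (Python 3.7+).
    handler = partial(SimpleHTTPRequestHandler, directory=docroot)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()

    port = server.server_address[1]
    html = urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html").read()
    server.shutdown()

print(html.decode())  # <h1>docs</h1>
```

In practice the module-level CLI is simpler for previewing built docs; the sketch just shows what it does under the hood.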