[Docs] Improve documentations (#1368)

2024-09-09 20:48:28 -07:00
parent 743007e1ce
commit 8d1095dbf0
6 changed files with 475 additions and 125 deletions
--- a/docs/en/install.md
+++ b/docs/en/install.md
@@ -1,73 +1,56 @@
-# SGLang Installation Guide
+## Install SGLang

-SGLang consists of a frontend language (Structured Generation Language, SGLang) and a backend runtime (SGLang Runtime, SRT). The frontend can be used separately from the backend, allowing for a detached frontend-backend setup.
+You can install SGLang using any of the methods below.

-## Quick Installation Options
-
-### 1. Frontend Installation (Client-side, any platform)
-
-```bash
-pip install --upgrade pip
-pip install sglang
+### Method 1: With pip
 ```
-
-**Note: You can check [these examples](https://github.com/sgl-project/sglang/tree/main/examples/frontend_language/usage) for how to use frontend and backend separately.**
-
-### 2. Backend Installation (Server-side, Linux only)
-
-```bash
 pip install --upgrade pip
 pip install "sglang[all]"
+
+# Install FlashInfer CUDA kernels
 pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
 ```

-**Note: The backend (SRT) is only needed on the server side and is only available for Linux right now.**
-
-**Important: Please check the [flashinfer installation guidance](https://docs.flashinfer.ai/installation.html) to install the proper version according to your PyTorch and CUDA versions.**
-
-### 3. From Source (Latest version, Linux only for full installation)
-
-```bash
-# Use the latest release branch
-# As of this documentation, it's v0.2.15, but newer versions may be available
-# Do not clone the main branch directly; always use a specific release version
-# The main branch may contain unresolved bugs before a new release
-git clone -b v0.2.15 https://github.com/sgl-project/sglang.git
+### Method 2: From source
+```
+# Use the last release branch
+git clone -b v0.3.0 https://github.com/sgl-project/sglang.git
 cd sglang
+
+pip install --upgrade pip
 pip install -e "python[all]"
+
+# Install FlashInfer CUDA kernels
 pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
 ```

-### 4. OpenAI Backend Only (Client-side, any platform)
-
-If you only need to use the OpenAI backend, you can avoid installing other dependencies by using:
+### Method 3: Using docker
+The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](https://github.com/sgl-project/sglang/tree/main/docker).
+Replace `<secret>` below with your huggingface hub [token](https://huggingface.co/docs/hub/en/security-tokens).

 ```bash
-pip install "sglang[openai]"
-```
-
-## Advanced Installation Options
-
-### 1. Using Docker (Server-side, Linux only)
-
-The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](https://github.com/sgl-project/sglang/blob/main/docker). Replace `<secret>` below with your huggingface hub [token](https://huggingface.co/docs/hub/en/security-tokens).
-
-```bash
-docker run --gpus all -p 30000:30000 \
+docker run --gpus all \
+    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
-    --env "HF_TOKEN=<secret>" --ipc=host \
+    --env "HF_TOKEN=<secret>" \
+    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
 ```

-### 2.Using docker compose
+### Method 4: Using docker compose

-This method is recommended if you plan to serve it as a service. A better approach is to use the [k8s-sglang-service.yaml](https://github.com/sgl-project/sglang/blob/main/docker/k8s-sglang-service.yaml).
+<details>
+<summary>More</summary>

-1. Copy the [compose.yml](https://github.com/sgl-project/sglang/blob/main/docker/compose.yaml) to your local machine
+> This method is recommended if you plan to serve it as a service.
+> A better approach is to use the [k8s-sglang-service.yaml](./docker/k8s-sglang-service.yaml).
+
+1. Copy the [compose.yml](./docker/compose.yaml) to your local machine
 2. Execute the command `docker compose up -d` in your terminal.
+</details>

-### 3.Run on Kubernetes or Clouds with SkyPilot
+### Method 5: Run on Kubernetes or Clouds with SkyPilot

 <details>
 <summary>More</summary>
@@ -108,9 +91,6 @@ sky status --endpoint 30000 sglang
 3. To further scale up your deployment with autoscaling and failure recovery, check out the [SkyServe + SGLang guide](https://github.com/skypilot-org/skypilot/tree/master/llm/sglang#serving-llama-2-with-sglang-for-more-traffic-using-skyserve).
 </details>

-## Troubleshooting
-
- For FlashInfer issues on newer GPUs, use `--disable-flashinfer --disable-flashinfer-sampling` when launching the server.
- For out-of-memory errors, try `--mem-fraction-static 0.7` when launching the server.
-
-For more details and advanced usage, visit the [SGLang GitHub repository](https://github.com/sgl-project/sglang).
+### Common Notes
+- [FlashInfer](https://github.com/flashinfer-ai/flashinfer) is currently one of the dependencies that must be installed for SGLang. It only supports sm75 and above. If you encounter any FlashInfer-related issues on sm75+ devices (e.g., T4, A10, A100, L4, L40S, H100), consider using Triton's kernel by `--disable-flashinfer --disable-flashinfer-sampling` and raise an issue.
+- If you only need to use the OpenAI backend, you can avoid installing other dependencies by using `pip install "sglang[openai]"`.