Adding Documentation for installation (#1300)
Co-authored-by: zhaochen20 <zhaochenyang20@gmail.com>
@@ -1,4 +1,4 @@
-Welcome to SGLang's tutorials!
+Welcome to SGLang!
 ====================================
 
 .. figure:: ./_static/image/logo.png
@@ -27,9 +27,22 @@ SGLang has the following core features:
 
 * **Flexible Frontend Language**: Enables easy programming of LLM applications with chained generation calls, advanced prompting, control flow, multiple modalities, parallelism, and external interactions.
 
+* **Extensive Model Support**: SGLang supports a wide range of generative models including the Llama series (up to Llama 3.1), Mistral, Gemma, Qwen, DeepSeek, LLaVA, Yi-VL, StableLM, Command-R, DBRX, Grok, ChatGLM, InternLM 2 and Exaone 3. It also supports embedding models such as e5-mistral and gte-Qwen2. Easily extensible to support new models.
+
+* **Open Source Community**: SGLang is an open source project with a vibrant community of contributors. We welcome contributions from anyone interested in advancing the state of the art in LLM and VLM serving.
+
 Documentation
 -------------
 
+.. In this documentation, we'll dive into these following areas to help you get the most out of SGLang.
+
+.. _installation:
+.. toctree::
+   :maxdepth: 1
+   :caption: Installation
+
+   install.md
+
 .. _hyperparameter_tuning:
 .. toctree::
    :maxdepth: 1
@@ -58,7 +71,10 @@ Documentation
 
    sampling_params.md
 
-Search Bar
-==================
 
-* :ref:`search`
+.. _benchmark_and_profilling:
+.. toctree::
+   :maxdepth: 1
+   :caption: Benchmark and Profiling
+
+   benchmark_and_profiling.md
116
docs/en/install.md
Normal file
@@ -0,0 +1,116 @@
# SGLang Installation Guide

SGLang consists of a frontend language (Structured Generation Language, SGLang) and a backend runtime (SGLang Runtime, SRT). The frontend can be used separately from the backend, allowing for a detached frontend-backend setup.

## Quick Installation Options

### 1. Frontend Installation (Client-side, any platform)

```bash
pip install --upgrade pip
pip install sglang
```

**Note: You can check [these examples](https://github.com/sgl-project/sglang/tree/main/examples/frontend_language/usage) for how to use frontend and backend separately.**
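
To confirm that the frontend package from the step above is visible to your Python environment, a small check like the following may help. It is only a sketch using the standard library; the helper name `installed_version` is hypothetical and not part of SGLang.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sglang" is the distribution name used by `pip install sglang` above.
    version = installed_version("sglang")
    if version is None:
        print("sglang is not installed; run `pip install sglang` first")
    else:
        print(f"sglang {version} is installed")
```

Running this inside the same virtual environment you installed into avoids false negatives caused by a different interpreter being on your `PATH`.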

### 2. Backend Installation (Server-side, Linux only)

```bash
pip install --upgrade pip
pip install "sglang[all]"
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
```

**Note: The backend (SRT) is only needed on the server side and is only available for Linux right now.**

**Important: Please check the [flashinfer installation guidance](https://docs.flashinfer.ai/installation.html) to install the proper version according to your PyTorch and CUDA versions.**
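
The index URL in the command above encodes a CUDA version (`cu121`) and a PyTorch version (`torch2.4`). The sketch below shows how such a URL could be derived from version strings. The pattern is inferred from that single URL, so treat it as an assumption: the helper `flashinfer_index_url` is hypothetical, and not every CUDA/PyTorch combination necessarily has published wheels. The FlashInfer guide linked above remains the authority.

```python
def flashinfer_index_url(cuda_version: str, torch_version: str) -> str:
    """Build a wheel index URL following the pattern seen in this guide.

    Assumption: the URL layout is cu<major><minor>/torch<major>.<minor>/,
    as in https://flashinfer.ai/whl/cu121/torch2.4/.
    """
    cuda_major, cuda_minor = cuda_version.split(".")[:2]
    # Drop a local build suffix such as "+cu121" before splitting.
    torch_major, torch_minor = torch_version.split("+")[0].split(".")[:2]
    return (
        f"https://flashinfer.ai/whl/cu{cuda_major}{cuda_minor}"
        f"/torch{torch_major}.{torch_minor}/"
    )


if __name__ == "__main__":
    # With PyTorch installed, these values can be read from
    # torch.version.cuda and torch.__version__.
    print(flashinfer_index_url("12.1", "2.4.0"))
```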

### 3. From Source (Latest version, Linux only for full installation)

```bash
# Use the latest release branch
# As of this documentation, it's v0.2.15, but newer versions may be available
# Do not clone the main branch directly; always use a specific release version
# The main branch may contain unresolved bugs before a new release
git clone -b v0.2.15 https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[all]"
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
```

### 4. OpenAI Backend Only (Client-side, any platform)

If you only need to use the OpenAI backend, you can avoid installing other dependencies by using:

```bash
pip install "sglang[openai]"
```

## Advanced Installation Options

### 1. Using Docker (Server-side, Linux only)

The Docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](https://github.com/sgl-project/sglang/blob/main/docker). Replace `<secret>` below with your Hugging Face Hub [token](https://huggingface.co/docs/hub/en/security-tokens).

```bash
docker run --gpus all -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```
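
Once the container above is running, the server listens on port 30000. The sketch below sends one prompt to it using only the standard library. It assumes the server exposes a native `/generate` endpoint that accepts a JSON body with `text` and `sampling_params` fields; this guide does not describe that endpoint, so check the SGLang repository if the request is rejected. The helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt, host="localhost", port=30000, max_new_tokens=16):
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to a running server and return the decoded JSON reply."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container up, `generate("Once upon a time,")` would send the request; building the request separately keeps the payload easy to inspect without a live server.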

### 2. Using Docker Compose

This method is recommended if you plan to run SGLang as a long-lived service. For Kubernetes deployments, a better approach is to use [k8s-sglang-service.yaml](https://github.com/sgl-project/sglang/blob/main/docker/k8s-sglang-service.yaml).

1. Copy [compose.yaml](https://github.com/sgl-project/sglang/blob/main/docker/compose.yaml) to your local machine.
2. Run `docker compose up -d` in your terminal.
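
Because `docker compose up -d` returns before the model has finished loading, it can be useful to wait until the server port accepts connections. The following is a minimal sketch of such a readiness check; the helper `wait_for_port` is hypothetical, and port 30000 is assumed from the `docker run` example in this guide (the compose file may map a different port).

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_port("localhost", 30000, timeout=5.0)
    print("server port is open" if ready else "server port is not reachable yet")
```

An open port only shows that the process is listening; it does not guarantee that the model weights have finished loading.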

### 3. Run on Kubernetes or Clouds with SkyPilot

<details>
<summary>More</summary>

To deploy on Kubernetes or 12+ clouds, you can use [SkyPilot](https://github.com/skypilot-org/skypilot).

1. Install SkyPilot and set up Kubernetes cluster or cloud access: see [SkyPilot's documentation](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html).
2. Deploy on your own infra with a single command and get the HTTP API endpoint:
<details>
<summary>SkyPilot YAML: <code>sglang.yaml</code></summary>

```yaml
# sglang.yaml
envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  accelerators: A100
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000
```
</details>

```bash
# Deploy on any cloud or Kubernetes cluster. Use --cloud <cloud> to select a specific cloud provider.
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang
```
3. To further scale up your deployment with autoscaling and failure recovery, check out the [SkyServe + SGLang guide](https://github.com/skypilot-org/skypilot/tree/master/llm/sglang#serving-llama-2-with-sglang-for-more-traffic-using-skyserve).
</details>

## Troubleshooting

- For FlashInfer issues on newer GPUs, use `--disable-flashinfer --disable-flashinfer-sampling` when launching the server.
- For out-of-memory errors, try `--mem-fraction-static 0.7` when launching the server.
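
The two workarounds above are extra flags on the `sglang.launch_server` command shown earlier in this guide. As an illustration, the sketch below assembles that command with the flags toggled by arguments. The helper `build_launch_command` is hypothetical; only the flags themselves come from this guide.

```python
import shlex


def build_launch_command(model_path, port=30000, mem_fraction_static=None,
                         disable_flashinfer=False):
    """Assemble the server launch command as an argument list."""
    cmd = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
    if mem_fraction_static is not None:
        # Lower values leave more GPU memory free; the guide suggests 0.7 on OOM.
        cmd += ["--mem-fraction-static", str(mem_fraction_static)]
    if disable_flashinfer:
        # Workaround for FlashInfer issues on newer GPUs.
        cmd += ["--disable-flashinfer", "--disable-flashinfer-sampling"]
    return cmd


if __name__ == "__main__":
    command = build_launch_command(
        "meta-llama/Meta-Llama-3.1-8B-Instruct",
        mem_fraction_static=0.7,
        disable_flashinfer=True,
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns.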

For more details and advanced usage, visit the [SGLang GitHub repository](https://github.com/sgl-project/sglang).
@@ -7,6 +7,4 @@ sphinx-tabs
 sphinxcontrib-mermaid
 pillow
 pydantic
-torch
-transformers
 urllib3<2.0.0