
# Install SGLang

You can install SGLang using any of the methods below.

To run DeepSeek V3/R1, refer to [DeepSeek V3 Support](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3). We recommend using the latest version and deploying it with [Docker](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#using-docker-recommended) to avoid environment-related issues.

We recommend installing the dependencies with uv, which is faster than plain pip.

## Method 1: With pip or uv

```bash
pip install --upgrade pip
pip install uv
uv pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```
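
To confirm that the install picked up the expected packages, a quick check along these lines may help. This is a minimal sketch, not part of SGLang: `pkg_version` is a hypothetical helper that only assumes `python3` is on your `PATH` and reads the installed distribution metadata.

```bash
# Hypothetical helper: print the installed version of a Python distribution,
# or return a non-zero status if it is not installed.
pkg_version() {
    python3 -c 'import sys, importlib.metadata as md; print(md.version(sys.argv[1]))' "$1" 2>/dev/null
}

# Report the packages this guide installs; adjust the list as needed.
for pkg in sglang torch flashinfer-python; do
    if ver="$(pkg_version "$pkg")"; then
        echo "$pkg $ver"
    else
        echo "$pkg is not installed" >&2
    fi
done
```

If `torch` reports a version other than 2.5.x, the FlashInfer wheel index used above probably does not match your environment.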

**Quick Fixes to Common Problems**

- SGLang currently uses torch 2.5, so you need the FlashInfer build for torch 2.5. To install FlashInfer separately, refer to the [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html). Note that the FlashInfer PyPI package is called `flashinfer-python`, not `flashinfer`.

- If you encounter `OSError: CUDA_HOME environment variable is not set`, point `CUDA_HOME` at your CUDA install root using either of the following solutions:
  1. Run `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
  2. Install FlashInfer first by following the [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.

- If you encounter `ImportError: cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, install the version of `transformers` specified in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, that means running `pip install transformers==4.48.3`.
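
The first `CUDA_HOME` fix can be scripted. The sketch below assumes the toolkit lives under `/usr/local/cuda` or `/usr/local/cuda-<version>` (adjust the candidate paths for your system); `find_cuda_home` is a hypothetical helper, not an SGLang or CUDA command.

```bash
# Hypothetical helper: print the first candidate directory that contains
# an executable bin/nvcc. Candidate directories are passed as arguments.
find_cuda_home() {
    local dir
    for dir in "$@"; do
        if [ -x "$dir/bin/nvcc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Only set CUDA_HOME when it is not already defined.
if [ -z "${CUDA_HOME:-}" ]; then
    if cuda_root="$(find_cuda_home /usr/local/cuda /usr/local/cuda-*)"; then
        export CUDA_HOME="$cuda_root"
    else
        echo "No CUDA toolkit found; set CUDA_HOME manually." >&2
    fi
fi
echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
```

Add the resulting `export` line to your shell profile if you want the setting to persist across sessions.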

## Method 2: From source

```bash
# Use the latest release branch
git clone -b v0.4.5 https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
pip install -e "python[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```

Note: SGLang currently uses torch 2.5, so you need to install flashinfer for torch 2.5. If you want to install flashinfer separately, please refer to [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html).

If you want to develop SGLang, we recommend using Docker. Refer to [setup docker container](https://github.com/sgl-project/sglang/blob/main/docs/developer/development_guide_using_docker.md#setup-docker-container) for guidance. The Docker image is `lmsysorg/sglang:dev`.

Note: On AMD ROCm systems with Instinct/MI GPUs, use the following steps instead:

```bash
# Use the latest release branch
git clone -b v0.4.5 https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
cd sgl-kernel
python setup_rocm.py install
cd ..
pip install -e "python[all_hip]"
```

## Method 3: Using docker

The Docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from the [Dockerfile](https://github.com/sgl-project/sglang/tree/main/docker).
Replace `<secret>` below with your Hugging Face Hub [token](https://huggingface.co/docs/hub/en/security-tokens).

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```
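
Model loading can take a while, so the server may not answer immediately after `docker run` starts. Below is a minimal sketch for waiting until it is reachable. It assumes `curl` is installed and that the server exposes a `/health` route on the published port (treat the route as an assumption and check it against your SGLang version); `wait_for_server` is a hypothetical helper.

```bash
# Hypothetical helper: poll a URL until it answers with a success status,
# or give up after a number of attempts.
# Usage: wait_for_server URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_server() {
    local url="$1" attempts="${2:-60}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            echo "server is ready: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "server did not respond: $url" >&2
    return 1
}
```

For example, `wait_for_server http://localhost:30000/health` retries 60 times with a 2-second pause between attempts by default, and returns a non-zero status if the server never answers.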

Note: On AMD ROCm systems with Instinct/MI GPUs, we recommend building images from `docker/Dockerfile.rocm`. An example build and its usage are shown below:

```bash
docker build --build-arg SGL_BRANCH=v0.4.5 -t v0.4.5-rocm630 -f Dockerfile.rocm .

alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --ipc=host \
    --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v $HOME/dockerx:/dockerx -v /data:/data'

drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    v0.4.5-rocm630 \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000

# Until the flashinfer backend is available, --attention-backend triton and --sampling-backend pytorch are set by default
drun v0.4.5-rocm630 python3 -m sglang.bench_one_batch --batch-size 32 --input 1024 --output 128 --model amd/Meta-Llama-3.1-8B-Instruct-FP8-KV --tp 8 --quantization fp8
```

## Method 4: Using docker compose

<details>
<summary>More</summary>

> This method is recommended if you plan to serve it as a service.
> A better approach is to use the [k8s-sglang-service.yaml](https://github.com/sgl-project/sglang/blob/main/docker/k8s-sglang-service.yaml).

1. Copy [compose.yaml](https://github.com/sgl-project/sglang/blob/main/docker/compose.yaml) to your local machine.
2. Run `docker compose up -d` in your terminal.
</details>

## Method 5: Using Kubernetes

<details>
<summary>More</summary>

1. Option 1: Single-node serving (typically when the model fits into the GPUs on one node)
   Run `kubectl apply -f docker/k8s-sglang-service.yaml` to create a k8s deployment and service, using llama-31-8b as the example.

2. Option 2: Multi-node serving (usually when a large model, such as `DeepSeek-R1`, requires more than one GPU node)
   Modify the LLM model path and arguments as necessary, then run `kubectl apply -f docker/k8s-sglang-distributed-sts.yaml` to create a two-node k8s StatefulSet and a serving service.
</details>


## Method 6: Run on Kubernetes or Clouds with SkyPilot

<details>
<summary>More</summary>

To deploy on Kubernetes or 12+ clouds, you can use [SkyPilot](https://github.com/skypilot-org/skypilot).

1. Install SkyPilot and set up a Kubernetes cluster or cloud access: see [SkyPilot's documentation](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html).
2. Deploy on your own infrastructure with a single command and get the HTTP API endpoint:
<details>
<summary>SkyPilot YAML: <code>sglang.yaml</code></summary>

```yaml
# sglang.yaml
envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  accelerators: A100
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000
```
</details>

```bash
# Deploy on any cloud or Kubernetes cluster. Use --cloud <cloud> to select a specific cloud provider.
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang
```
3. To further scale up your deployment with autoscaling and failure recovery, check out the [SkyServe + SGLang guide](https://github.com/skypilot-org/skypilot/tree/master/llm/sglang#serving-llama-2-with-sglang-for-more-traffic-using-skyserve).
</details>

## Common Notes
- [FlashInfer](https://github.com/flashinfer-ai/flashinfer) is the default attention kernel backend. It only supports sm75 and above. If you encounter any FlashInfer-related issues on sm75+ devices (e.g., T4, A10, A100, L4, L40S, H100), please switch to other kernels by adding `--attention-backend triton --sampling-backend pytorch` and open an issue on GitHub.
- If you only need to use OpenAI models with the frontend language, you can avoid installing other dependencies by using `pip install "sglang[openai]"`.
- The language frontend operates independently of the backend runtime. You can install the frontend locally without a GPU, while the backend can be set up on a GPU-enabled machine. To install the frontend, run `pip install sglang`; for the backend, run `pip install "sglang[srt]"`. `srt` stands for SGLang runtime.
- To reinstall flashinfer locally, use the following command: `pip install "flashinfer-python>=0.2.3" -i https://flashinfer.ai/whl/cu124/torch2.5 --force-reinstall --no-deps` and then delete the cache with `rm -rf ~/.cache/flashinfer`.