Update readme
@@ -4,7 +4,7 @@
 
 --------------------------------------------------------------------------------
 
-| [**Blog**](https://lmsys.org/blog/2024-01-17-sglang/) | [**Paper**](https://arxiv.org/abs/2312.07104) |
+| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/) | [**Paper**](https://arxiv.org/abs/2312.07104) |
 
 SGLang is a fast serving framework for large language models and vision language models.
 It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
@@ -57,7 +57,7 @@ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
 ```
 
 ### Method 3: Using docker
-The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags).
+The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](docker).
 
 ```bash
 docker run --gpus all \
@@ -66,7 +66,7 @@ docker run --gpus all \
 --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
 --ipc=host \
 lmsysorg/sglang:latest \
-python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B --host 0.0.0.0 --port 30000
+python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 30000
 ```
 
 ### Common Notes
@@ -1,8 +1,7 @@
-ARG CUDA_VERSION=12.4.1
+ARG CUDA_VERSION=12.1.1
 
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
 
-ARG CUDA_VERSION=12.4.1
 ARG PYTHON_VERSION=3
 
 ENV DEBIAN_FRONTEND=noninteractive
||||