diff --git a/README.md b/README.md
index 3b7d1ed6e..6ffa5e473 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 --------------------------------------------------------------------------------
 
-| [**Blog**](https://lmsys.org/blog/2024-01-17-sglang/) | [**Paper**](https://arxiv.org/abs/2312.07104) |
+| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/) | [**Paper**](https://arxiv.org/abs/2312.07104) |
 
 SGLang is a fast serving framework for large language models and vision language models.
 It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
@@ -57,7 +57,7 @@ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
 ```
 
 ### Method 3: Using docker
-The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags).
+The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](docker).
 
 ```bash
 docker run --gpus all \
@@ -66,7 +66,7 @@ docker run --gpus all \
     --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
     --ipc=host \
     lmsysorg/sglang:latest \
-    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B --host 0.0.0.0 --port 30000
+    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 30000
 ```
 
 ### Common Notes
diff --git a/docker/Dockerfile b/docker/Dockerfile
index abaf645c0..042cbc858 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -1,8 +1,7 @@
-ARG CUDA_VERSION=12.4.1
+ARG CUDA_VERSION=12.1.1
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
 
-ARG CUDA_VERSION=12.4.1
 ARG PYTHON_VERSION=3
 ENV DEBIAN_FRONTEND=noninteractive
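
For context, the README change above points users at `docker/Dockerfile`, which this patch pins to the CUDA 12.1.1 base image. A local build from that Dockerfile might look like the sketch below; the `sglang:dev` tag and running from the repository root are illustrative assumptions, not part of the patch.

```bash
# Build the SGLang image from the repo's Dockerfile.
# CUDA_VERSION defaults to 12.1.1 after this patch; --build-arg can override it.
docker build \
    --build-arg CUDA_VERSION=12.1.1 \
    -f docker/Dockerfile \
    -t sglang:dev \
    .
```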