Update README and Dockerfile CUDA default

2024-07-25 08:14:36 -07:00
parent 1a491d00cb
commit 7802df1e2b
2 changed files with 4 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 --------------------------------------------------------------------------------

-| [**Blog**](https://lmsys.org/blog/2024-01-17-sglang/) | [**Paper**](https://arxiv.org/abs/2312.07104) |
+| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/) | [**Paper**](https://arxiv.org/abs/2312.07104) |

 SGLang is a fast serving framework for large language models and vision language models.
 It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
@@ -57,7 +57,7 @@ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
 ```

 ### Method 3: Using docker
-The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags).
+The docker images are available on Docker Hub as [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [Dockerfile](docker).

 ```bash
 docker run --gpus all \
@@ -66,7 +66,7 @@ docker run --gpus all \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
-    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B --host 0.0.0.0 --port 30000
+    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 30000
 ```

 ### Common Notes
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -1,8 +1,7 @@
-ARG CUDA_VERSION=12.4.1
+ARG CUDA_VERSION=12.1.1

 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

-ARG CUDA_VERSION=12.4.1
 ARG PYTHON_VERSION=3

 ENV DEBIAN_FRONTEND=noninteractive