# Using Nginx
This document shows how to launch multiple vLLM serving containers and use Nginx to act as a load balancer between the servers.
## Build Nginx Container
This guide assumes that you have just cloned the vLLM project and you're currently in the vllm root directory.
```bash
export vllm_root=$(pwd)
```
Create a file named `Dockerfile.nginx`:
```dockerfile
FROM nginx:latest
RUN rm /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```
Build the container:
```bash
docker build . -f Dockerfile.nginx --tag nginx-lb
```
## Create Simple Nginx Config File
Create a file named `nginx_conf/nginx.conf`. You can add as many servers as you'd like; the example below starts with two. To add more, append another `server vllmN:8000 max_fails=3 fail_timeout=10000s;` entry to `upstream backend`, as sketched after the config.
??? console "Config"
    ```console
    upstream backend {
        least_conn;
        server vllm0:8000 max_fails=3 fail_timeout=10000s;
        server vllm1:8000 max_fails=3 fail_timeout=10000s;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
    ```
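For reference, following the `server vllmN:8000 ...` pattern, a hypothetical third backend (a container you would later launch with `--name vllm2` on the same network) extends the upstream like this:

```console
upstream backend {
    least_conn;
    server vllm0:8000 max_fails=3 fail_timeout=10000s;
    server vllm1:8000 max_fails=3 fail_timeout=10000s;
    server vllm2:8000 max_fails=3 fail_timeout=10000s;
}
```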
## Build vLLM Container
```bash
cd $vllm_root
docker build -f docker/Dockerfile . --tag vllm
```
If you are behind a proxy, you can pass the proxy settings to the docker build command as shown below:
```bash
cd $vllm_root
docker build \
    -f docker/Dockerfile . \
    --tag vllm \
    --build-arg http_proxy=$http_proxy \
    --build-arg https_proxy=$https_proxy
```
## Create Docker Network
```bash
docker network create vllm_nginx
```
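To confirm the network exists before attaching containers to it (optional), inspect it:

```bash
docker network inspect vllm_nginx
```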
## Launch vLLM Containers
Notes:
- If you have your HuggingFace models cached somewhere else, update `hf_cache_dir` below.
- If you don't have an existing HuggingFace cache, start `vllm0` first and wait for the model download to complete and the server to become ready. This ensures that `vllm1` can reuse the model you just downloaded and won't have to fetch it again.
- The example below assumes the GPU backend. If you are using the CPU backend, remove `--gpus device=ID` and add the `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` environment variables to the `docker run` command; see the sketch after the commands below.
- Adjust the model name in the commands below if you don't want to serve `Llama-2-7b-chat-hf`.
??? console "Commands"
    ```console
    mkdir -p ~/.cache/huggingface/hub/
    hf_cache_dir=~/.cache/huggingface/
    docker run \
        -itd \
        --ipc host \
        --network vllm_nginx \
        --gpus device=0 \
        --shm-size=10.24gb \
        -v $hf_cache_dir:/root/.cache/huggingface/ \
        -p 8081:8000 \
        --name vllm0 vllm \
        --model meta-llama/Llama-2-7b-chat-hf
    docker run \
        -itd \
        --ipc host \
        --network vllm_nginx \
        --gpus device=1 \
        --shm-size=10.24gb \
        -v $hf_cache_dir:/root/.cache/huggingface/ \
        -p 8082:8000 \
        --name vllm1 vllm \
        --model meta-llama/Llama-2-7b-chat-hf
    ```
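As noted above, here is a minimal sketch of the CPU-backend variant of the first `docker run` command, assuming your `vllm` image was built for the CPU backend; the values shown (40 GiB of KV-cache space, threads 0-29) are illustrative and should be tuned for your machine:

```bash
# Illustrative CPU-backend launch; tune the env values for your host.
docker run \
    -itd \
    --ipc host \
    --network vllm_nginx \
    --shm-size=10.24gb \
    -e VLLM_CPU_KVCACHE_SPACE=40 \
    -e VLLM_CPU_OMP_THREADS_BIND=0-29 \
    -v $hf_cache_dir:/root/.cache/huggingface/ \
    -p 8081:8000 \
    --name vllm0 vllm \
    --model meta-llama/Llama-2-7b-chat-hf
```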
!!! note
    If you are behind a proxy, you can pass the proxy settings to the docker run command via `-e http_proxy=$http_proxy -e https_proxy=$https_proxy`.
## Launch Nginx
```bash
docker run \
    -itd \
    -p 8000:80 \
    --network vllm_nginx \
    -v ./nginx_conf/:/etc/nginx/conf.d/ \
    --name nginx-lb nginx-lb:latest
```
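Optionally, check that the load balancer container stayed up (an invalid `nginx.conf` makes it exit immediately):

```bash
docker ps --filter name=nginx-lb
```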
## Verify That vLLM Servers Are Ready
```bash
docker logs vllm0 | grep Uvicorn
docker logs vllm1 | grep Uvicorn
```
Both outputs should look like this:
```console
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
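Once both servers report ready, you can exercise the full path through Nginx. One minimal check, assuming the setup above, is to query the OpenAI-compatible `/v1/models` endpoint on the load balancer's port:

```bash
curl http://localhost:8000/v1/models
```

Repeated requests are spread across `vllm0` and `vllm1` according to the `least_conn` policy in the config.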