[Doc] Support kimi-k2-w8a8 (#2162)
### What this PR does / why we need it?
The kimi-k2 model is similar to the deepseek model, so only a few changes are needed to support it. What this PR does:
1. Add kimi-k2-w8a8 deployment doc
2. Update quantization doc
3. Upgrade torchair support list
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main:
9edd1db02b
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -90,12 +90,12 @@ docker run --rm \
-it $IMAGE bash
```

Run the following scripts on two nodes respectively

:::{note}
Before launching the inference server, ensure the following environment variables are set for multi-node communication.
:::
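As an illustration of the kind of variables the note refers to, the sketch below sets networking variables commonly used for cross-node communication in Ascend deployments. The variable names (`HCCL_IF_IP`, `GLOO_SOCKET_IFNAME`, `TP_SOCKET_IFNAME`, `HCCL_SOCKET_IFNAME`) and the placeholder NIC/IP values are assumptions for this sketch — consult the deployment doc for the authoritative list for your setup.

```shell
# Illustrative only; the exact variables are listed in the deployment doc.
nic_name="eth0"        # placeholder: the NIC reachable from the other node
local_ip="192.0.2.10"  # placeholder: this node's IP address

export HCCL_IF_IP=$local_ip           # local IP used by HCCL across nodes
export GLOO_SOCKET_IFNAME=$nic_name   # NIC used by the Gloo backend
export TP_SOCKET_IFNAME=$nic_name     # NIC used for tensor-parallel traffic
export HCCL_SOCKET_IFNAME=$nic_name   # NIC used by HCCL socket communication
```

The same variables must be exported on both nodes (with each node's own IP) before launching the server.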

**node0**

```shell
@@ -178,7 +178,7 @@ vllm serve /root/.cache/ds_v3 \
```

The Deployment view looks like:



Once your server is started, you can query the model with input prompts:
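For example, a completion request against vLLM's OpenAI-compatible API might look like the following. This assumes the server listens on the default port 8000 and that the served model name matches the path passed to `vllm serve` above; adjust host, port, and model name to your deployment.

```shell
# Query the OpenAI-compatible completions endpoint of the running server.
# "/root/.cache/ds_v3" is the model path from the `vllm serve` command above.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/root/.cache/ds_v3",
        "prompt": "The future of AI is",
        "max_tokens": 64,
        "temperature": 0
    }'
```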