# Multi-Node-DP (Kimi-K2)

## Verify Multi-Node Communication Environment

Refer to [multi_node.md](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node.html#verification-process).

## Run with Docker
Assume you have two Atlas 800 A3 (64G*16) nodes or four A2 nodes, and want to deploy the quantized `Kimi-K2-Instruct-W8A8` model across them.

```{code-block} bash
   :substitutions:
# Update the vllm-ascend image
export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:|vllm_ascend_version|
export NAME=vllm-ascend

# Run the container using the defined variables
# Note: If you use Docker bridge networking instead of host networking, expose the ports needed for multi-node communication in advance
docker run --rm \
--name $NAME \
--net=host \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci8 \
--device /dev/davinci9 \
--device /dev/davinci10 \
--device /dev/davinci11 \
--device /dev/davinci12 \
--device /dev/davinci13 \
--device /dev/davinci14 \
--device /dev/davinci15 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /mnt/sfs_turbo/.cache:/home/cache \
-it $IMAGE bash
```
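The sixteen `--device /dev/davinciN` flags above can also be generated instead of typed out. The following is a minimal sketch, not part of the original tutorial: the `build_device_flags` helper name is hypothetical, and the count of 16 assumes the Atlas 800 A3 node described above.

```shell
#!/bin/sh
# Hypothetical helper: print "--device /dev/davinciN" for N in 0..count-1.
build_device_flags() {
    count="$1"
    i=0
    flags=""
    while [ "$i" -lt "$count" ]; do
        flags="$flags --device /dev/davinci$i"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    echo "${flags# }"
}

# 16 devices per node is an assumption taken from the A3 setup above.
DEVICE_FLAGS=$(build_device_flags 16)
echo "$DEVICE_FLAGS"
```

If you take this route, `$DEVICE_FLAGS` would be passed to `docker run` unquoted so that the shell splits it into separate arguments.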

Run the following scripts on the two nodes, one script per node.

:::{note}
Before launching the inference server, ensure the following environment variables are set for multi-node communication.
:::

**Node 0**

```shell
#!/bin/sh

# nic_name and local_ip can be obtained through ifconfig
# nic_name is the network interface name corresponding to local_ip of the current node
nic_name="xxxx"
local_ip="xxxx"

export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=1024

# The w8a8 weight can be obtained from https://www.modelscope.cn/models/vllm-ascend/Kimi-K2-Instruct-W8A8
# If you want to do the quantization manually, please refer to https://vllm-ascend.readthedocs.io/en/latest/user_guide/feature_guide/quantization.html
vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
--host 0.0.0.0 \
--port 8004 \
--data-parallel-size 4 \
--api-server-count 2 \
--data-parallel-size-local 2 \
--data-parallel-address $local_ip \
--data-parallel-rpc-port 13389 \
--seed 1024 \
--served-model-name kimi \
--quantization ascend \
--tensor-parallel-size 8 \
--enable-expert-parallel \
--max-num-seqs 16 \
--max-model-len 32768 \
--max-num-batched-tokens 4096 \
--trust-remote-code \
--no-enable-prefix-caching \
--gpu-memory-utilization 0.9 \
--additional-config '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
```
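The `nic_name` placeholder in the script above has to be the interface that carries `local_ip`. As an alternative to reading `ifconfig` output by eye, the lookup can be sketched with `ip -o -4 addr show` and `awk`. The `nic_for_ip` helper name is hypothetical, and the sample interface names and addresses below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the name of the interface whose IPv4 address equals the given address.
nic_for_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) { print $2; exit } }'
}

# Canned sample in the format printed by `ip -o -4 addr show`
# (interface names and addresses are illustrative, not from a real node).
sample='1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0\       valid_lft forever preferred_lft forever'

local_ip="192.168.1.10"
nic_name=$(printf '%s\n' "$sample" | nic_for_ip "$local_ip")
echo "nic_name=$nic_name local_ip=$local_ip"
```

On a real node, the canned sample would be replaced by the live command, for example `ip -o -4 addr show | nic_for_ip "$local_ip"`.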

**Node 1**

```shell
#!/bin/sh

# nic_name and local_ip can be obtained through ifconfig
# nic_name is the network interface name corresponding to local_ip of the current node
nic_name="xxxx"
local_ip="xxxx"

# node0_ip must match the local_ip set on node 0 (the master node)
node0_ip="xxxx"

export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=1024

vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
--host 0.0.0.0 \
--port 8004 \
--headless \
--data-parallel-size 4 \
--data-parallel-size-local 2 \
--data-parallel-start-rank 2 \
--data-parallel-address $node0_ip \
--data-parallel-rpc-port 13389 \
--seed 1024 \
--tensor-parallel-size 8 \
--served-model-name kimi \
--max-num-seqs 16 \
--max-model-len 32768 \
--quantization ascend \
--max-num-batched-tokens 4096 \
--enable-expert-parallel \
--trust-remote-code \
--no-enable-prefix-caching \
--gpu-memory-utilization 0.92 \
--additional-config '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
```
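The parallelism flags in the two scripts have to add up to the available hardware. The arithmetic can be checked with a short sketch. It assumes one device per tensor-parallel rank and that each node's start rank is its node index times the local data-parallel size, which matches the value `2` used for node 1 above. Node 0 omits the flag, which this sketch treats as rank 0. The `start_rank_for_node` helper name is hypothetical.

```shell
#!/bin/sh
# Values copied from the two launch scripts above.
dp_size=4          # --data-parallel-size
dp_size_local=2    # --data-parallel-size-local
tp_size=8          # --tensor-parallel-size

# Devices used per node and in total (one device per TP rank is assumed).
devices_per_node=$((dp_size_local * tp_size))
total_devices=$((dp_size * tp_size))
num_nodes=$((dp_size / dp_size_local))

# Assumed rule: a node's start rank is its index times the local DP size.
start_rank_for_node() {
    echo $(($1 * dp_size_local))
}

echo "devices per node: $devices_per_node, total: $total_devices, nodes: $num_nodes"
node=0
while [ "$node" -lt "$num_nodes" ]; do
    echo "node $node: --data-parallel-start-rank $(start_rank_for_node "$node")"
    node=$((node + 1))
done
```

With these values each node uses 16 devices, which lines up with the sixteen `davinci` devices mapped into the container.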

The deployment looks like this:
![Kimi-K2 multi-node DP deployment view](../assets/multi_node_dp_kimi.png)

Once your server is started, you can query the model with input prompts:

```shell
# Set node0_ip to the IP address of node 0 before running
curl "http://${node0_ip}:8004/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "kimi",
        "prompt": "The future of AI is",
        "max_tokens": 50,
        "temperature": 0
    }'
```
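Loading a model of this size can take a while, so a request sent too early will fail to connect. A minimal sketch of a readiness poll follows. The `wait_for_server` helper name is hypothetical, and the sketch assumes the server exposes vLLM's `/health` endpoint on the same port. The attempt count and delay are arbitrary and should be tuned to your environment.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers with an HTTP success
# status, giving up after a fixed number of attempts.
wait_for_server() {
    url="$1"
    attempts="$2"
    delay="$3"
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, --max-time: bound each request.
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Example invocation (node0_ip is a placeholder; set it before use):
# wait_for_server "http://$node0_ip:8004/health" 60 10 && echo "server is ready"
```

The function returns non-zero if the server never answers, so it can gate the `curl` request above in a script.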