[feature] kv transfer support of ascend npu (#7795)

Co-authored-by: liupeng <liupeng374@huawei.com>
2025-07-11 10:07:51 +03:00
parent 615553079d
commit 86044712c6
10 changed files with 267 additions and 53 deletions
--- a/docs/backend/pd_disaggregation.md
+++ b/docs/backend/pd_disaggregation.md
@@ -111,3 +111,36 @@ $ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 ---di
 # decode 1
 $ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 ---disaggregation-transfer-backend nixl --disaggregation-mode decode --host ${local_ip} --port 30001 --trust-remote-code --dist-init-addr ${decode_master_ip}:5000 --nnodes 2 --node-rank 1 --tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe --deepep-mode low_latency --mem-fraction-static 0.8 --max-running-requests 128
 ```
+
+## ASCEND
+
+### Usage
+
+Use ascend backend with [mf_adapter(download link)](https://sglang-ascend.obs.cn-east-3.myhuaweicloud.com:443/sglang/mf_adapter-1.0.0-cp311-cp311-linux_aarch64.whl?AccessKeyId=HPUAXT4YM0U8JNTERLST&Expires=1783151861&Signature=3j10QDUjqk70enaq8lostYV2bEA%3D) and ASCEND_MF_STORE_URL being set
+
+```bash
+pip install mf_adapter-1.0.0-cp311-cp311-linux_aarch64.whl --force-reinstall
+export ASCEND_MF_STORE_URL="tcp://xxx.xx.xxx.xxx:xxxx"
+```
+Use mooncake backend, more details can be found in mooncake section.
+```bash
+export ENABLE_ASCEND_TRANSFER_WITH_MOONCAKE=true
+```
+
+
+### Llama Single Node
+
+```bash
+$ python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode prefill --disaggregation-transfer-backend ascend
+$ python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode decode --port 30001 --base-gpu-id 1 --disaggregation-transfer-backend ascend
+$ python -m sglang.srt.disaggregation.mini_lb --prefill http://127.0.0.1:30000 --decode http://127.0.0.1:30001 --host 0.0.0.0 --port 8000
+```
+
+### DeepSeek Multi-Node
+
+```bash
+# prefill 0
+$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 ---disaggregation-transfer-backend ascend --disaggregation-mode prefill --host ${local_ip} --port 30000 --trust-remote-code --dist-init-addr ${prefill_master_ip}:5000 --nnodes 1 --node-rank 0 --tp-size 16
+# decode 0
+$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 ---disaggregation-transfer-backend ascend --disaggregation-mode decode --host ${local_ip} --port 30001 --trust-remote-code --dist-init-addr ${decode_master_ip}:5000 --nnodes 1 --node-rank 0 --tp-size 16
+```