support multi npu partially

This commit is contained in:
starkwj
2026-01-08 06:54:33 +00:00
parent fa0fb46853
commit 1976644dbb
12 changed files with 283 additions and 158 deletions

@@ -30,6 +30,5 @@ docker build -t vllm-ascend-multi-llm:latest -f ./Dockerfile .
## Limitations
- This project currently only supports sharing a single NPU. This is also limited by the fact that HCCL cannot be shared.
- Because HCCL cannot be shared, deploying more than one model with multi-GPU parallelism (e.g., TP) is currently not feasible.
- The prefix cache is reset when the LLM is restored, since we simply discard the KV cache when the LLM is offloaded.
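The last limitation can be illustrated with a minimal sketch. The class and method names below are assumptions for illustration only, not the project's real API; the point is that offloading discards the KV cache outright, so the prefix cache starts cold after a restore:

```python
class LLMWorker:
    """Hypothetical worker sketching the offload/restore behavior described above."""

    def __init__(self):
        self.kv_cache = {}           # maps prompt prefix -> cached KV blocks
        self.weights_on_device = True

    def offload(self):
        """Move weights off the NPU; the KV cache is simply discarded."""
        self.weights_on_device = False
        self.kv_cache.clear()        # no KV snapshot is saved anywhere

    def restore(self):
        """Reload weights; the prefix cache is now empty (i.e., reset)."""
        self.weights_on_device = True
        return len(self.kv_cache)    # 0: every prefix must be recomputed


worker = LLMWorker()
worker.kv_cache["You are a helpful"] = ["block0", "block1"]
worker.offload()
print(worker.restore())  # prints 0: the prefix cache was lost with the offload
```

A real implementation could instead snapshot the KV cache to host memory before offloading, trading restore latency for a warm prefix cache; this project chooses the simpler discard path.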