support multi npu partially
@@ -30,6 +30,5 @@ docker build -t vllm-ascend-multi-llm:latest -f ./Dockerfile .
## Limitations
- Currently, this project only supports sharing a single NPU. This is a limitation imposed by the fact that HCCL cannot be shared.
- Because HCCL cannot be shared, deploying more than one model across multiple NPUs (e.g., with tensor parallelism) is not currently feasible.
- The prefix cache is reset when an LLM is restored, since the KV cache is simply discarded when the LLM is offloaded.
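
The last limitation can be illustrated with a minimal sketch. This is not the project's actual implementation; the class and method names (`LLMInstance`, `offload`, `restore`) are hypothetical, and the KV cache is reduced to a plain dict:

```python
# Hypothetical sketch: offloading discards the KV cache, so the prefix
# cache starts empty after a restore and shared prefixes are recomputed.
class LLMInstance:
    def __init__(self, name):
        self.name = name
        self.kv_cache = {}   # prefix hash -> cached KV blocks (simplified)
        self.on_npu = True

    def offload(self):
        # Free NPU memory; the KV cache is discarded, not persisted.
        self.kv_cache.clear()
        self.on_npu = False

    def restore(self):
        # Bring the model back onto the NPU; the prefix cache is empty again.
        self.on_npu = True

llm = LLMInstance("model-a")
llm.kv_cache["shared-prefix"] = ["block0", "block1"]
llm.offload()
llm.restore()
print(len(llm.kv_cache))  # 0 -> cached prefixes must be recomputed
```

Persisting the KV cache to host memory on offload would avoid the recomputation, at the cost of extra copy time and host RAM.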