Fix KV cache memory determination in multiproc executor & update Dockerfile

Signed-off-by: Jing Wang <jingwang96@qq.com>
2026-04-24 08:31:54 +00:00
parent 6c097beaa5
commit d627a45881
10 changed files with 218 additions and 153 deletions
--- a/README.md
+++ b/README.md
@@ -34,5 +34,4 @@ docker build -t $build_image -f ./Dockerfile .

 ## Limitations

- Restricted by the fact that HCCL cannot be shared, deploying more than one model with multi-GPU (e.g., TP) is not feasible currently.
 - The prefix cache will be reset when the LLM is restored, since we just simply discard the KV cache when the LLM is offloaded.