add env vars & misc
@@ -27,6 +27,10 @@ docker build -t vllm-ascend-multi-llm:latest -f ./Dockerfile .
2. Start LLM services with this image, following the official usage instructions.
3. Due to the limited stream resources of the Ascend NPU, you may need to restrict graph capture sizes or disable ACLgraph by setting `--enforce-eager`, especially when launching multiple LLMs. See the [FAQ entry](https://docs.vllm.ai/projects/ascend/en/latest/faqs.html#how-to-troubleshoot-and-resolve-size-capture-failures-resulting-from-stream-resource-exhaustion-and-what-are-the-underlying-causes) for details.
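For example, one of several services could be launched with ACLgraph disabled (a sketch only; the model name and port below are placeholders, not taken from the official instructions):

```shell
# Launch one LLM service in eager mode so it does not consume NPU stream
# resources for graph capture; repeat with different ports/models as needed.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --port 8001 \
    --enforce-eager
```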
### Environment Variables
- `VNPU_RESERVED_VRAM_SIZE_GB`: The amount of device memory (in GB) reserved for miscellaneous allocations. Only needs to be set for `vllm_vnpu_daemon`. Try increasing this value if you launch multiple LLM services and encounter OOM errors. Default: `8`.
- `VLLM_VNPU_SHM_NAME`: The name of the shared-memory (shm) file. Must be set identically for all containers in the shared vNPU group. Default: `/vllm_acl_vnpu_offload_shm`.
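A minimal sketch of how a service might resolve these two settings at startup (the variable names and defaults come from the list above; the lookup code itself is illustrative, not the daemon's actual implementation):

```python
import os

# Reserved device memory in GB; increase when multiple LLM services hit OOM.
reserved_vram_gb = int(os.environ.get("VNPU_RESERVED_VRAM_SIZE_GB", "8"))

# Shared-memory file name; must match across every container in the vNPU group.
shm_name = os.environ.get("VLLM_VNPU_SHM_NAME", "/vllm_acl_vnpu_offload_shm")

print(reserved_vram_gb, shm_name)
```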
## Limitations