[k8s] Clarified the usage of shared memory. (#4341)

Jiří Suchomel
2025-03-27 16:53:19 +01:00
committed by GitHub
parent 17000d2b3a
commit f60f293195
2 changed files with 7 additions and 0 deletions

@@ -21,6 +21,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
```
- See [hyperparameter tuning](hyperparameter_tuning.md) for guidance on tuning hyperparameters for better performance.
- For Docker and Kubernetes runs, you need to set up shared memory, which is used for communication between processes. Set `--shm-size` for Docker, and increase the size of `/dev/shm` in Kubernetes manifests.
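  As a rough sketch of the Kubernetes side, one common way to enlarge `/dev/shm` is to mount a memory-backed `emptyDir` volume there. The pod name, container name, image tag, and the `10Gi` limit below are illustrative assumptions, not values taken from the SGLang manifests:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: sglang-example              # hypothetical pod name
  spec:
    containers:
      - name: sglang                  # hypothetical container name
        image: lmsysorg/sglang:latest # assumed image tag
        volumeMounts:
          - name: shm
            mountPath: /dev/shm       # replaces the small default shared-memory mount
    volumes:
      - name: shm
        emptyDir:
          medium: Memory              # back the volume with RAM (tmpfs)
          sizeLimit: 10Gi             # assumed size; tune to your workload
  ```

  For Docker, the equivalent is passing `--shm-size` with a suitable value to `docker run`.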
- If you see out-of-memory errors during prefill for long prompts, try setting a smaller chunked prefill size.
```bash