[k8s] Clarified the usage of shared memory. (#4341)
@@ -39,6 +39,8 @@ spec:
         limits:
           nvidia.com/gpu: 1
       volumeMounts:
+        - name: shm
+          mountPath: /dev/shm
         - name: hf-cache
           mountPath: /root/.cache/huggingface
           readOnly: true
@@ -52,6 +54,10 @@ spec:
       initialDelaySeconds: 30
       periodSeconds: 10
   volumes:
+    - name: shm
+      emptyDir:
+        medium: Memory
+        sizeLimit: 10Gi
    - name: hf-cache
      hostPath:
        path: /root/.cache/huggingface
@@ -21,6 +21,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```

 - See [hyperparameter tuning](hyperparameter_tuning.md) on tuning hyperparameters for better performance.
+- For docker and Kubernetes runs, you need to set up shared memory, which is used for communication between processes. See `--shm-size` for docker and the `/dev/shm` size update for Kubernetes manifests.
 - If you see out-of-memory errors during prefill for long prompts, try to set a smaller chunked prefill size.

 ```bash
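For the docker side of the doc change above, a minimal sketch of what the `--shm-size` flag looks like in practice. The image tag, ports, and model path here are illustrative assumptions, not taken from this commit:

```shell
# Illustrative sketch, not from this commit: image tag, ports, and model path
# are placeholders. --shm-size enlarges the container's /dev/shm (docker's
# default is 64 MB), which inter-process communication such as NCCL relies on;
# an undersized /dev/shm can cause crashes on multi-GPU runs.
docker run --gpus all --shm-size 16g \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 30000:30000 \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path meta-llama/Meta-Llama-3-8B-Instruct \
        --host 0.0.0.0 --port 30000
```

This is the docker analogue of the `emptyDir` volume with `medium: Memory` added in the Kubernetes hunk above: both enlarge `/dev/shm` inside the container beyond its small default.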