From 7b5a374114c7c6095fe2dd7898ed73da534e05eb Mon Sep 17 00:00:00 2001
From: simveit <69345428+simveit@users.noreply.github.com>
Date: Tue, 4 Feb 2025 00:39:41 +0100
Subject: [PATCH] Update server args doc (#3273)

Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
---
 docs/backend/server_arguments.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/backend/server_arguments.md b/docs/backend/server_arguments.md
index 7e8f4ca0a..35a2a8c4e 100644
--- a/docs/backend/server_arguments.md
+++ b/docs/backend/server_arguments.md
@@ -159,7 +159,7 @@ Please consult the documentation below to learn more about the parameters you ma
 
 * `disable_radix_cache`: Disable [Radix](https://lmsys.org/blog/2024-01-17-sglang/) backend for prefix caching.
 * `disable_jump_forward`: Disable [jump-forward](https://lmsys.org/blog/2024-02-05-compressed-fsm/#our-method-jump-forward-decoding-with-a-compressed-finite-state-machine) for outlines grammar backend.
-* `disable_cuda_graph`: Disable [cuda graph](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) for model forward.
+* `disable_cuda_graph`: Disable [cuda graph](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) for model forward. Use if encountering uncorrectable CUDA ECC errors.
 * `disable_cuda_graph_padding`: Disable cuda graph when padding is needed. In other case still use cuda graph.
 * `disable_outlines_disk_cache`: Disable disk cache for outlines grammar backend.
 * `disable_custom_all_reduce`: Disable usage of custom all reduce kernel.