support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)
@@ -201,6 +201,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| Arguments | Description | Defaults |
|---|---|---|
| `disable_cuda_graph_padding` | Disable CUDA Graph when padding is needed; otherwise, still use CUDA Graph. | `False` |
| `disable_outlines_disk_cache` | Disable the disk cache for the outlines grammar backend. | `False` |
| `disable_custom_all_reduce` | Disable usage of the custom all-reduce kernel. | `False` |
| `enable_mscclpp` | Enable usage of the mscclpp kernel for small-message all-reduce. | `False` |
| `disable_overlap_schedule` | Disable the [Overhead-Scheduler](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#zero-overhead-batch-scheduler). | `False` |
| `enable_nan_detection` | Enable a warning if the logits contain `NaN`. | `False` |
| `enable_p2p_check` | Enable P2P (peer-to-peer) access checks when accessing GPUs, instead of assuming P2P access is always allowed. | `False` |
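As a sketch of how the new flag is used at launch time, the mscclpp all-reduce can be enabled when starting the server. The model path and `--tp` value below are illustrative placeholders, not taken from this commit:

```shell
# Launch an SGLang server with the mscclpp small-message all-reduce enabled.
# Flags in the table above map to CLI options with dashes instead of underscores.
# Model path and tensor-parallel size are assumptions for illustration only.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tp 8 \
  --enable-mscclpp
```

Per the commit title, the mscclpp path provides a 1-shot all-reduce for single-node and two-node tensor-parallel setups; it applies only to small messages, with larger messages falling back to the default all-reduce path.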