support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)

2025-06-05 13:11:24 +08:00
parent 4474eaf552
commit 8e3797be1c
20 changed files with 2177 additions and 12 deletions
--- a/docs/backend/server_arguments.md
+++ b/docs/backend/server_arguments.md
@@ -201,6 +201,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
 | `disable_cuda_graph_padding` | Disable CUDA Graph when padding is needed; otherwise, still use CUDA Graph. | `False` |
 | `disable_outlines_disk_cache` | Disable disk cache for outlines grammar backend. | `False` |
 | `disable_custom_all_reduce` | Disable usage of custom all-reduce kernel. | `False` |
+| `enable_mscclpp` | Enable the mscclpp kernel for small-message all-reduce. | `False` |
 | `disable_overlap_schedule` | Disable the [Overhead-Scheduler](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#zero-overhead-batch-scheduler). | `False` |
 | `enable_nan_detection` | Enable warning if the logits contain `NaN`. | `False` |
 | `enable_p2p_check` | Turns off the default of always allowing P2P checks when accessing GPU. | `False` |