[Feat]support sequence parallelism by pass for VL models (#5632)

2026-02-27 08:27:41 +08:00
parent ed175d6d92
commit 5def28dcd3
22 changed files with 460 additions and 101 deletions
--- a/docs/source/user_guide/configuration/additional_config.md
+++ b/docs/source/user_guide/configuration/additional_config.md
@@ -45,6 +45,7 @@ The following table lists additional configuration options available in vLLM Asc
 | `pa_shape_list`                     | list | `[]`    | The custom shape list of page attention ops.                                                              |
 | `enable_kv_nz`                      | bool | `False` | Whether to enable KV cache NZ layout. This option only takes effects on models using MLA (e.g., DeepSeek).                                      |
 | `layer_sharding` | dict | `{}` | Configuration options for Layer Sharding Linear |
+| `sp_threshold` | int | `1000` | For dense models, only num_tokens > threshold will enable sequence parallelism. |

 The details of each configuration option are as follows: