Add static EPLB (#1116)

### What this PR does / why we need it?

Add the ability to import an EPLB expert map.

### Does this PR introduce _any_ user-facing change?

To use an EPLB expert map, you need to pass the expert map file to vLLM through the `additional_config` argument.

### How was this patch tested?

1. Collect expert hotness and generate an expert placement file from it using the EPLB algorithm, or use an existing expert placement table directly.
2. When launching vLLM, enable EC2 and pass the configuration via the command-line argument: `--additional-config '{"expert_map_path": "/xxx/xxx/xx.json"}'`

---------

Signed-off-by: songshanhu07 <1763685535@qq.com>
Signed-off-by: Yuxiao-Xu <664988918@qq.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: songshanhu07 <1763685535@qq.com>
Co-authored-by: Xu Yuxiao <xuyuxiao2@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
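The launch step in the test instructions can be scripted. The following is a minimal sketch, assuming the expert map is a local JSON file; `build_additional_config_arg` is a hypothetical helper (not part of vLLM or vLLM Ascend), and only the `expert_map_path` key comes from this PR. It builds the JSON with `json.dumps` and quotes it for the shell with `shlex.quote`, which avoids hand-quoting mistakes such as an unbalanced single quote.

```python
import json
import os
import shlex
import tempfile


def build_additional_config_arg(expert_map_path: str) -> str:
    """Return a shell-safe `--additional-config` argument pointing at an EPLB expert map.

    Hypothetical helper for illustration only; the `expert_map_path` key is the
    one documented in this PR.
    """
    # Fail early instead of letting vLLM start with a missing map file.
    if not os.path.isfile(expert_map_path):
        raise FileNotFoundError(f"expert map file not found: {expert_map_path}")
    # json.dumps emits valid JSON (double-quoted keys and strings);
    # shlex.quote wraps it in single quotes so the shell passes it through intact.
    config = {"expert_map_path": expert_map_path}
    return "--additional-config " + shlex.quote(json.dumps(config))


if __name__ == "__main__":
    # Placeholder file so the example runs; in practice this is the expert
    # placement table generated from expert hotness.
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as f:
        f.write(b"{}")
    print(build_additional_config_arg(f.name))
    os.remove(f.name)
```

The returned string can be appended to the usual vLLM launch command; the file-existence check is a design choice of this sketch, not behavior guaranteed by vLLM Ascend.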
2025-06-09 19:28:11 +08:00
parent cb341c7bcd
commit 6b853f15fe
6 changed files with 179 additions and 31 deletions
--- a/docs/source/user_guide/additional_config.md
+++ b/docs/source/user_guide/additional_config.md
@@ -24,12 +24,13 @@ LLM(model="Qwen/Qwen3-8B", additional_config={"config_key":"config_value"})

 The following table lists the additional configuration options available in vLLM Ascend:

-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
-| `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler  |
-| `expert_tensor_parallel_size` | str | `0` | Expert tensor parallel size the model to use. |
-| `refresh` | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf case. |
+| Name                          | Type | Default | Description                                                                                   |
+|-------------------------------|------|---------|-----------------------------------------------------------------------------------------------|
+| `torchair_graph_config`       | dict | `{}`    | The config options for torchair graph mode.                                                   |
+| `ascend_scheduler_config`     | dict | `{}`    | The config options for the ascend scheduler.                                                  |
+| `expert_tensor_parallel_size` | str  | `0`     | Expert tensor parallel size for the model to use.                                             |
+| `refresh`                     | bool | `false` | Whether to refresh the global ascend config content. This is usually used in RLHF cases.      |
+| `expert_map_path`             | str  | `None`  | Path to the expert map file. Required when using expert load balancing (EPLB) for MoE models. |

 The details of each config option are as follows: