[2/N][Pangu][MoE] Remove Pangu Related Code (#5130)
### What this PR does / why we need it?
Remove Pangu Related Code
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
e2e & ut
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
This commit is contained in:
@@ -30,8 +30,7 @@ The following table lists additional configuration options available in vLLM Asc
|
||||
| `finegrained_tp_config` | dict | `{}` | Configuration options for module tensor parallelism |
|
||||
| `weight_prefetch_config` | dict | `{}` | Configuration options for weight prefetch |
|
||||
| `refresh` | bool | `false` | Whether to refresh global Ascend configuration content. This is usually used by rlhf or ut/e2e test case. |
|
||||
| `expert_map_path` | str | `None` | When using expert load balancing for an MoE model, an expert map path needs to be passed in. |
|
||||
| `kv_cache_dtype` | str | `None` | When using the KV cache quantization method, KV cache dtype needs to be set, currently only int8 is supported. |
|
||||
| `expert_map_path` | str | `None` | When using expert load balancing for an MoE model, an expert map path needs to be passed in. | |
|
||||
| `enable_shared_expert_dp` | bool | `False` | When the expert is shared in DP, it delivers better performance but consumes more memory. Currently only DeepSeek series models are supported. |
|
||||
| `lmhead_tensor_parallel_size` | int | `None` | The custom tensor parallel size of lmhead. Restriction: Can only be used when tensor_parallel=1 |
|
||||
| `oproj_tensor_parallel_size` | int | `None` | The custom tensor parallel size of oproj. |
|
||||
|
||||
Reference in New Issue
Block a user