[Doc] Update the Pangu Pro MoE tutorials. (#3651)
### What this PR does / why we need it?
Update the Pangu Pro MoE tutorials.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: menogrey <1299267905@qq.com>
@@ -51,6 +51,7 @@ vllm serve /path/to/pangu-pro-moe-model \
 --tensor-parallel-size 4 \
 --enable-expert-parallel \
 --trust-remote-code \
+--max_model_len=1024 \
 --enforce-eager
 ```
 
@@ -217,6 +218,7 @@ if __name__ == "__main__":
 
 llm = LLM(model="/path/to/pangu-pro-moe-model",
 tensor_parallel_size=4,
+enable_expert_parallel=True,
 distributed_executor_backend="mp",
 max_model_len=1024,
 trust_remote_code=True,
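
With both hunks applied, the `vllm serve` command and the offline `LLM(...)` example set the same four options: tensor parallel size 4, expert parallelism, a maximum model length of 1024, and `trust_remote_code`. The sketch below illustrates that correspondence. `to_cli_flags` is a hypothetical helper written for this note, not a vLLM function, and it prints the dashed spelling `--max-model-len` where the tutorial writes `--max_model_len=1024`.

```python
# Hypothetical helper: NOT part of vLLM. It only illustrates how the keyword
# arguments passed to LLM(...) line up with the flags given to `vllm serve`.
def to_cli_flags(kwargs):
    flags = []
    for name, value in kwargs.items():
        # Keyword names use underscores; the CLI flags are spelled with dashes.
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switches are passed without a value.
            flags.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are left off the command line.
            continue
        else:
            flags.extend([flag, str(value)])
    return flags


# Settings shared by both examples after this commit.
settings = {
    "tensor_parallel_size": 4,
    "enable_expert_parallel": True,
    "max_model_len": 1024,
    "trust_remote_code": True,
}

print(" ".join(to_cli_flags(settings)))
```

Keeping the shared settings in one mapping like this is one way to notice when the CLI and offline examples drift apart, which is the kind of gap each hunk in this commit closes.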