【doc fix】doc fix: deepseekv3.1 (#4645)

### What this PR does / why we need it?
fix deepseekv3.1 doc to recomand developers to use Mooncake instead of LLMDatadist

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: AiChiMomo <1092626063@qq.com>
This commit is contained in:
1092626063
2025-12-02 21:49:13 +08:00
committed by GitHub
parent 1b5513aa91
commit b84c9afbf5

View File

@@ -254,7 +254,7 @@ vllm serve /weights/DeepSeek-V3.1_w8a8mix_mtp \
### Prefill-Decode Disaggregation ### Prefill-Decode Disaggregation
There are two ways to deploy `Prefill-Decode Disaggregation`: [Llmdatadist](./multi_node_pd_disaggregation_llmdatadist.md) and [Mooncake](./multi_node_pd_disaggregation_mooncake.md). We recommend use Mooncake for deploy. We recommend using Mooncake for deployment: [Mooncake](./multi_node_pd_disaggregation_mooncake.md).
Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 2P1D (4 nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory to serve high concurrency in 1P1D case. Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 2P1D (4 nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory to serve high concurrency in 1P1D case.
- `DeepSeek-V3.1_w8a8mix_mtp 2P1D Layerwise` require 4 Atlas 800 A3 (64G × 16). - `DeepSeek-V3.1_w8a8mix_mtp 2P1D Layerwise` require 4 Atlas 800 A3 (64G × 16).