[Doc] Add Qwen3.5 fused MC2 known issue for 0.18.0 release (#8378)
### What this PR does / why we need it?
Show known issues for Qwen3.5-397B.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

---------

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
@@ -599,3 +599,8 @@ vllm bench serve --model Eco-Tech/Qwen3.5-397B-A17B-w8a8-mtp --dataset-name rand
```
After several minutes, you can get the performance evaluation result.
## Qwen3.5-397B-A17B Known issues
- Issue 1: In a single-node deployment with fused_mc2 enabled, deploying the model with multiple DP (data parallel) ranks may produce garbled or empty outputs after the model triggers recomputation. When tuning performance by adjusting model parallelism, make sure this fused operator is disabled whenever DP > 1. In a PD (prefill-decode disaggregated) deployment, D nodes can avoid this problem by enabling the recompute scheduler.
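
The rule in Issue 1 can be expressed as a small pre-launch check. The following is only a minimal sketch: the function name and all of its parameters are hypothetical and are not part of vLLM or vLLM Ascend. Map them onto whatever settings your deployment actually uses to control DP size, the fused MC2 operator, and the recompute scheduler.

```python
def fused_mc2_is_safe(
    dp_size: int,
    fused_mc2_enabled: bool,
    is_pd_decode_node: bool = False,
    recompute_scheduler_enabled: bool = False,
) -> bool:
    """Return True if the configuration avoids the condition described in Issue 1.

    All parameter names are illustrative assumptions, not real vLLM options.
    """
    # The issue only applies when the fused operator is enabled.
    if not fused_mc2_enabled:
        return True
    # With a single DP rank the issue is not reported.
    if dp_size <= 1:
        return True
    # In a PD deployment, D nodes avoid the problem via the recompute scheduler.
    if is_pd_decode_node and recompute_scheduler_enabled:
        return True
    # fused_mc2 with DP > 1 and no mitigation: outputs may be garbled or empty.
    return False


if __name__ == "__main__":
    # Single-node deployment, DP = 2, fused operator on: flagged as unsafe.
    print(fused_mc2_is_safe(dp_size=2, fused_mc2_enabled=True))
```

A launcher script could call such a check before starting the server and refuse to continue (or disable the fused operator) when it returns `False`.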