[CI] Accuracy issue of qwen3-next-w8a8 nightly test fix. (#5746)
### What this PR does / why we need it?
Close the **Full Graph** mode to temporarily avoid accuracy issue for
**Qwen3-Next-80B-A3B-Instruct-W8A8**.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: InSec <1790766300@qq.com>
This commit is contained in:
@@ -78,7 +78,7 @@ async def test_models(model: str) -> None:
|
|||||||
"--gpu-memory-utilization",
|
"--gpu-memory-utilization",
|
||||||
"0.65",
|
"0.65",
|
||||||
"--compilation-config",
|
"--compilation-config",
|
||||||
'{"cudagraph_capture_sizes": [32], "cudagraph_mode":"FULL_DECODE_ONLY"}',
|
'{"cudagraph_capture_sizes": [32]}',
|
||||||
]
|
]
|
||||||
request_keyword_args: dict[str, Any] = {
|
request_keyword_args: dict[str, Any] = {
|
||||||
**api_keyword_args,
|
**api_keyword_args,
|
||||||
|
|||||||
Reference in New Issue
Block a user