[v0.18.0][CI]Add rank0 process count check for DeepSeek-R1-W8A8-HBM test (#8072)
### What this PR does / why we need it?

Adds a `check_rank0_process_count` validation step to the DeepSeek-R1-W8A8-HBM nightly single-node test. The check verifies that after the server starts, there is **exactly 1** `vllm serve` process running on rank0. This guards against the regression fixed in #8041 (extra NPU context leaking on device 0), ensuring it does not silently reappear in future releases.

#### Changes

- **`tests/e2e/nightly/single_node/models/scripts/test_single_node.py`**: Add `run_check_rank0_process_count` async handler. It calls `npu-smi info` for diagnostics, then uses `psutil` to assert exactly one `vllm serve` process exists on rank0.
- **`tests/e2e/nightly/single_node/models/configs/DeepSeek-R1-W8A8-HBM.yaml`**: Register `check_rank0_process_count` in the `test_content` list for the HBM test case.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
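For readers without the test script at hand, the check described in the commit message might look roughly like the sketch below. This is not the code from `test_single_node.py`. The handler signature, the helper names `is_vllm_serve`, `count_vllm_serve`, and `snapshot_cmdlines`, and the argv matching rule are all assumptions made for illustration. The sketch also counts every `vllm serve` process visible on the host and does not map processes to NPU device 0, which the real handler may well do.

```python
import asyncio
import shutil
import subprocess
from typing import Iterable, List, Sequence


def is_vllm_serve(cmdline: Sequence[str]) -> bool:
    """Return True if argv contains `vllm` immediately followed by `serve`.

    Hypothetical matching rule: the basename of one argument must be `vllm`
    and the next argument must be `serve`.
    """
    for i, arg in enumerate(cmdline[:-1]):
        if arg.rsplit("/", 1)[-1] == "vllm" and cmdline[i + 1] == "serve":
            return True
    return False


def count_vllm_serve(cmdlines: Iterable[Sequence[str]]) -> int:
    """Count the command lines that look like a `vllm serve` invocation."""
    return sum(1 for cmdline in cmdlines if is_vllm_serve(cmdline))


def snapshot_cmdlines() -> List[List[str]]:
    """Collect the argv of every visible process via psutil."""
    import psutil  # third-party dependency, imported only when needed

    cmdlines = []
    for proc in psutil.process_iter(["cmdline"]):
        # `cmdline` can be None when access to the process is denied.
        cmdlines.append(proc.info["cmdline"] or [])
    return cmdlines


async def run_check_rank0_process_count() -> None:
    """Assumed handler shape: print diagnostics, then assert the count."""
    if shutil.which("npu-smi"):
        # Diagnostics only; the output is printed, not parsed.
        diag = await asyncio.to_thread(
            subprocess.run,
            ["npu-smi", "info"],
            capture_output=True,
            text=True,
            check=False,
        )
        print(diag.stdout)

    count = count_vllm_serve(snapshot_cmdlines())
    assert count == 1, f"expected exactly 1 `vllm serve` process, found {count}"
```

Keeping the counting logic in a pure function separate from the `psutil` snapshot lets the matching rule be exercised without an NPU host or a running server.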
@@ -39,4 +39,7 @@ test_cases:
- "--enforce-eager"
- "--additional-config"
- '{"ascend_scheduler_config": {"enabled": false}, "torchair_graph_config": {"enabled": false, "enable_multistream_shared_expert": false}}'
test_content:
- completion
- check_rank0_process_count
benchmarks: