[Nightly] Initial logging for nightly multi-node testing (#5362)

### What this PR does / why we need it?
Currently, our multi-node test logs show only the master node's logs (retrieved via the
Kubernetes API), which is not enough to localize problems when one of the
other nodes fails. This pull request therefore adds the ability to upload
logs from the other nodes as well.

Next step: output logs in a structured directory layout, including the logs
from each node and the polog.
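
The planned structured layout could look roughly like the sketch below, which writes each node's log text into its own subdirectory. This is only an illustration under stated assumptions: the `<root>/<node_name>/node.log` layout, the helper `write_node_logs`, and the node names are hypothetical and are not part of this PR.

```python
import tempfile
from pathlib import Path


def write_node_logs(root: Path, logs_by_node: dict[str, str]) -> list[Path]:
    """Write each node's log text to root/<node_name>/node.log.

    Hypothetical helper: the directory scheme and file name are assumptions.
    """
    written = []
    # Sort by node name so the output order is deterministic across runs.
    for node_name, text in sorted(logs_by_node.items()):
        node_dir = root / node_name
        node_dir.mkdir(parents=True, exist_ok=True)
        log_path = node_dir / "node.log"
        log_path.write_text(text, encoding="utf-8")
        written.append(log_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical node names: node-0 stands in for the master node.
    paths = write_node_logs(root, {"node-0": "master log\n", "node-1": "worker log\n"})
    layout = [p.relative_to(root).as_posix() for p in paths]
    contents = [p.read_text(encoding="utf-8") for p in paths]

print(layout)
```

Keeping one subdirectory per node means a failing worker's log can be located by node name alone, without searching through the master's output.
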
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Li Wang
2025-12-26 11:39:07 +08:00
committed by GitHub
parent 320877d488
commit c2f776b846
4 changed files with 39 additions and 23 deletions


@@ -257,7 +257,7 @@ class RemoteOpenAIServer:
             except RequestException:
                 all_ready = False
                 if should_log:
-                    logger.info(f"[WAIT] {url}: connection failed")
+                    logger.debug(f"[WAIT] {url}: connection failed")
             # check unexpected exit
             result = self._poll()
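
In this hunk, the per-URL "connection failed" message in the wait loop moves from `logger.info` to `logger.debug`, so it is hidden whenever the logger runs at INFO or above. The standard-library sketch below shows that effect. The logger name, the `report_connection_failure` helper, and the URL are hypothetical, and vLLM's own logger configuration may differ.

```python
import io
import logging

# Hypothetical stand-in for the wait loop's logger; the name is illustrative.
logger = logging.getLogger("remote_server_wait_demo")
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # keep demo output away from the root logger


def report_connection_failure(url: str) -> None:
    # Hypothetical helper mirroring the changed line: emitted at DEBUG, not INFO.
    logger.debug(f"[WAIT] {url}: connection failed")


# At INFO level the DEBUG message is filtered out, so nothing is written.
logger.setLevel(logging.INFO)
report_connection_failure("http://node-1:8000/health")
quiet_output = stream.getvalue()

# Lowering the threshold to DEBUG makes the message visible again for diagnosis.
logger.setLevel(logging.DEBUG)
report_connection_failure("http://node-1:8000/health")
verbose_output = stream.getvalue()

print(repr(quiet_output))
print(verbose_output, end="")
```

The message is still available when needed; it just no longer appears on every polling attempt during normal runs.
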