[CI][Doc] Optimize multi-node CI (#3565)
### What this PR does / why we need it?
This pull request mainly does the following:
1. Add a doc for multi-node CI covering the underlying mechanism and how to contribute
2. Simplify the config YAML to make it more developer-friendly
3. Harden the mooncake installation script to prevent accidental failures during installation
4. Fix the workflow to ensure the Kubernetes manifests are applied correctly
5. Add a Qwen3-235B-W8A8 disaggregated_prefill test
6. Add a GLM-4.5 multi-DP test
7. Add a 2p1d 4-node disaggregated_prefill test
8. Refactor the nightly tests
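Item 3 hardens the mooncake install against transient failures. As an illustration only (the `retry` helper and its arguments here are assumptions, not the PR's actual script), a retry wrapper of the kind such hardening typically uses:

```shell
#!/bin/sh
# Hedged sketch: retry a flaky command a bounded number of times before
# giving up. Illustrative only; not the PR's actual installation script.
retry() {
    max=$1; shift
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "command failed after $max attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying..." >&2
        n=$((n + 1))
        sleep 2
    done
}

# Example: wrap a flaky download/install step (here a stand-in command).
retry 3 true && echo "install step ok"
```

A wrapper like this keeps one slow mirror or dropped connection from failing the whole multi-node job.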
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -17,19 +17,24 @@ spec:
```yaml
      - name: vllm-leader
        image: {{ image | default("m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11") }}
        env:
          - name: CONFIG_YAML_PATH
            value: {{ config_file_path | default("tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml") }}
          - name: WORKSPACE
            value: "/root/workspace"
          # Set the vLLM version and vLLM-Ascend version here; once there is a new release, update here.
          - name: VLLM_VERSION
            value: "v0.11.0"
          - name: VLLM_ASCEND_VERSION
            value: {{ vllm_ascend_ref | default("main") }}
          - name: VLLM_ASCEND_REMOTE_URL
            value: {{ vllm_ascend_remote_url | default("https://github.com/vllm-project/vllm-ascend.git") }}
          - name: RESULT_FILE_PATH
            value: {{ result_file_path | default("/root/.cache/tests/ret/test_result.txt") }}
        command:
          - sh
          - -c
          - |
            bash /root/.cache/tests/run.sh
            tail -f /dev/null
        resources:
          limits:
            huawei.com/ascend-1980: "16"
```
@@ -70,19 +75,24 @@ spec:
```yaml
      - name: vllm-worker
        image: {{ image | default("m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11") }}
        env:
          - name: CONFIG_YAML_PATH
            value: {{ config_file_path | default("tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml") }}
          - name: WORKSPACE
            value: "/root/workspace"
          # Set the vLLM version and vLLM-Ascend version here; once there is a new release, update here.
          - name: VLLM_VERSION
            value: "v0.11.0"
          - name: VLLM_ASCEND_VERSION
            value: {{ vllm_ascend_ref | default("main") }}
          - name: VLLM_ASCEND_REMOTE_URL
            value: {{ vllm_ascend_remote_url | default("https://github.com/vllm-project/vllm-ascend.git") }}
          - name: RESULT_FILE_PATH
            value: {{ result_file_path | default("/root/.cache/tests/ret/test_result.txt") }}
        command:
          - sh
          - -c
          - |
            bash /root/.cache/tests/run.sh
            tail -f /dev/null
        resources:
          limits:
            huawei.com/ascend-1980: "16"
```
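Both the leader and the worker containers write their outcome to the path in `RESULT_FILE_PATH`. A hedged sketch of how a CI gate might consume that file (the `check_result` helper and the "PASSED" marker are illustrative assumptions; the real format is whatever run.sh emits):

```shell
#!/bin/sh
# Hedged sketch of a CI gate over the result file; the "PASSED" marker
# string is an assumption, not the actual output format of run.sh.
check_result() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "no result file at $file" >&2
        return 1
    fi
    grep -q "PASSED" "$file"
}

# Example run against a temporary stand-in result file.
tmp=$(mktemp)
echo "PASSED" > "$tmp"
check_result "$tmp" && echo "gate: ok"
rm -f "$tmp"
```

Gating on a file rather than on pod exit codes lets `tail -f /dev/null` keep the pods alive for log collection after the tests finish.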