xc-llm-ascend

Files

xleoken bea3d5bbb4 [Bug] Fix run bug in run_dp_server.sh (#2139)

### What this PR does / why we need it?

For the `Qwen2.5-0.5B-Instruct` model:
- The model's total number of attention heads (14) must be divisible by the tensor parallel size, so the tensor parallel size is changed from 4 to 2.
- The model does not support `enable-expert-parallel`.
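
The divisibility constraint can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `valid_tensor_parallel_sizes` and the upper bound of 8 are assumptions made for the example and are not part of vLLM or of this PR.

```python
def valid_tensor_parallel_sizes(num_attention_heads: int, max_size: int) -> list[int]:
    """Return every size in 1..max_size that divides the head count evenly.

    Hypothetical helper for illustration only; not a vLLM API.
    """
    return [tp for tp in range(1, max_size + 1) if num_attention_heads % tp == 0]


# 14 attention heads, as stated in the PR description above.
NUM_HEADS = 14

sizes = valid_tensor_parallel_sizes(NUM_HEADS, max_size=8)
print(sizes)          # [1, 2, 7]
print(NUM_HEADS % 4)  # 2, a non-zero remainder, so a size of 4 does not divide 14
```

With 14 heads, a tensor parallel size of 4 leaves a remainder, while 2 divides evenly, which matches the change described above.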

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested locally.

- vLLM version: v0.10.0
- vLLM main: ad57f23f6a

Signed-off-by: xleoken <xleoken@163.com>

2025-08-02 16:52:12 +08:00

disaggregated_prefill_v1

[CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065)

2025-07-29 18:59:05 +08:00

eplb

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837)

2025-07-17 14:13:30 +08:00

offline_data_parallel.py

[Misc] Add extra checking to torchair_graph_config. (#1939)

2025-08-01 09:24:11 +08:00

offline_disaggregated_prefill_npu.py

[BugFix] Update the kv transfer config (#2121)