Files
xc-llm-ascend/examples
xleoken bea3d5bbb4 [Bug] Fix run bug in run_dp_server.sh (#2139)
### What this PR does / why we need it?

For `Qwen2.5-0.5B-Instruct` model
- the model's total number of attention heads (14) must be divisible by
tensor parallel size. (4 -> 2)
- the model does not support enable-expert-parallel

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local Test.

- vLLM version: v0.10.0
- vLLM main:
ad57f23f6a

Signed-off-by: xleoken <xleoken@163.com>
2025-08-02 16:52:12 +08:00
..