xc-llm-ascend

Author SHA1 Message Date


Rozwel-dx

8d571286dd

[Refactor] Modify the binding logic to allocate CPU cores for each NPU card (#5555)

[Refactor] Modify the binding logic to allocate CPU cores for each NPU
card

### What this PR does / why we need it?
Modify the binding logic to allocate CPU cores for each NPU card based
on NUMA affinity, while isolating acl_thread/release_thread and other
processes to prevent mutual interference.
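The allocation idea described above can be illustrated with a minimal sketch. This is not the PR's code: the function name `allocate_cores`, the `reserved_per_npu` parameter, and the "main"/"helpers" split are hypothetical, and it assumes each NUMA node's CPUs are divided evenly among the NPUs attached to that node, with the last few cores of each share set aside for helper threads such as acl_thread/release_thread.

```python
from typing import Dict, Iterable, List


def allocate_cores(
    npu_to_numa: Dict[int, int],
    numa_to_cpus: Dict[int, Iterable[int]],
    reserved_per_npu: int = 2,
) -> Dict[int, Dict[str, List[int]]]:
    """Hypothetical planner: give each NPU a private slice of its NUMA node's CPUs.

    The last `reserved_per_npu` cores of every slice are kept apart for helper
    threads so they do not compete with the main worker process.
    """
    # Group NPUs by the NUMA node they are attached to.
    npus_by_numa: Dict[int, List[int]] = {}
    for npu, numa in sorted(npu_to_numa.items()):
        npus_by_numa.setdefault(numa, []).append(npu)

    plan: Dict[int, Dict[str, List[int]]] = {}
    for numa, npus in npus_by_numa.items():
        cpus = sorted(numa_to_cpus[numa])
        share = len(cpus) // len(npus)  # cores available to each NPU on this node
        if share <= reserved_per_npu:
            raise ValueError(f"NUMA node {numa} has too few CPUs for {len(npus)} NPUs")
        for i, npu in enumerate(npus):
            chunk = cpus[i * share:(i + 1) * share]
            split = len(chunk) - reserved_per_npu
            plan[npu] = {"main": chunk[:split], "helpers": chunk[split:]}
    return plan


if __name__ == "__main__":
    # Assumed topology: 4 NPUs, 2 NUMA nodes with 8 CPUs each.
    topology = {0: range(0, 8), 1: range(8, 16)}
    print(allocate_cores({0: 0, 1: 0, 2: 1, 3: 1}, topology, reserved_per_npu=1))
```

On real hardware the NPU-to-NUMA mapping and CPU lists would come from the system rather than being hard-coded as in this example.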

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

c85cc045f8

Signed-off-by: rowzwel_dx <1392851715@qq.com>
- vLLM version: v0.13.0
- vLLM main: 7157596103

Signed-off-by: Rozwel-dx <1392851715@qq.com>

2026-01-13 09:21:28 +08:00

Zhu Yi Lin

4a849df6fa

[main] support cpu binding (#3546)

### What this PR does / why we need it?

Currently, in the piecewise mode of aclgraph, attention runs in eager mode,
which causes abnormal allreduce latency for the O matrix. The reason is that
CPU resources get preempted in eager mode, so this PR temporarily adds CPU
binding to vllm-ascend.
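As a rough illustration of what CPU binding means at the process level, the sketch below pins the current process to a subset of its allowed CPUs using Python's standard `os.sched_setaffinity` (Linux only). The helper name `pin_current_process` is hypothetical, and this is not the code added by the PR.

```python
import os
from typing import Iterable, Set


def pin_current_process(cpus: Iterable[int]) -> Set[int]:
    """Restrict the calling process to `cpus` and return the applied CPU set.

    Only CPUs the process is already allowed to use are kept, so a request
    that falls outside the current affinity mask raises instead of failing
    inside the system call.
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    target = set(cpus) & allowed
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    first = min(original)
    print("pinned to:", pin_current_process([first]))
    os.sched_setaffinity(0, original)  # restore the previous mask
```

Pinning keeps the scheduler from migrating the process onto cores that other workloads are using, which is the kind of preemption the commit message describes.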

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed with new and existing tests.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: GDzhu1 <809721801@qq.com>

2025-10-21 09:17:03 +08:00

2 Commits