### What this PR does / why we need it?
- Rework CpuAlloc.handle_no_affinity() to build the set of available NUMA
nodes after allowed_cpus filtering, assign NPUs to NUMA nodes round-robin,
and split each node's CPUs into disjoint per-NPU slices for better balance.
- Improve bind_memory() robustness by deriving the target NUMA node from
each NPU's CPU pool, validating that the NUMA node exists, and skipping
binding when the required data is missing.
- bind_memory() now binds only the single NUMA node that corresponds to
the NPU id, instead of 2 NUMA nodes.
- Fix an issue where all NPUs were bound to NUMA node 0 under DP16,
because the global NPU id is not visible across DP domains.
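The rebalancing described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper names `assign_npus_to_numa` and `split_cpus_per_npu` are hypothetical, and the real `CpuAlloc.handle_no_affinity()` operates on filtered `allowed_cpus` data rather than plain lists.

```python
def assign_npus_to_numa(npu_ids, numa_nodes):
    """Round-robin assignment of NPUs to the available NUMA nodes.

    `numa_nodes` is assumed to be the node list remaining after
    allowed_cpus filtering, so no NPU lands on an unavailable node.
    """
    return {npu: numa_nodes[i % len(numa_nodes)] for i, npu in enumerate(npu_ids)}


def split_cpus_per_npu(cpus, npus_on_node):
    """Split one node's CPU list into disjoint, near-equal slices.

    Each NPU assigned to the node gets its own contiguous slice, so no
    two NPUs on the same NUMA node share CPUs.
    """
    n = len(npus_on_node)
    base, extra = divmod(len(cpus), n)
    pools, start = {}, 0
    for i, npu in enumerate(npus_on_node):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        pools[npu] = cpus[start:start + size]
        start += size
    return pools


# Example: 4 NPUs, 2 NUMA nodes survive filtering -> NPUs 0,2 on node 0
# and NPUs 1,3 on node 1; node 0's CPUs are then split into disjoint halves.
assignment = assign_npus_to_numa([0, 1, 2, 3], [0, 1])
pools = split_cpus_per_npu(list(range(10)), [0, 2])
```

With this layout, bind_memory() can derive the single target NUMA node for an NPU from its CPU pool instead of a globally numbered NPU id, which is what breaks under DP16 when the global id is not visible inside a DP domain.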
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added/updated unit tests:
`test_cpu_binding.py`
1. test_binding_mode_table covers the A2 vs A3 binding mode mapping.
2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when
affinity info is missing.
3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown architecture
detection.
4. test_bind_cpus_skip_non_arm covers the non-ARM skip path in bind_cpus.
`test_worker_v1.py`
1. Updated mocks so enable_cpu_binding defaults to True, aligning with the
new config default.
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>