xc-llm-ascend/vllm_ascend at b771ca9a479af384a0542f1bd9ac64d0c03dd014 - xc-llm-ascend - Gitea: Git with a cup of tea

EngineX/xc-llm-ascend

Files

History

Frank Chen b771ca9a47 [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945 )

### What this PR does / why we need it?

This PR introduces global CPU slicing for Ascend NPUs to ensure
non-overlapping CPU partitions, addresses IRQ binding logical errors on
A3, and enhances the logic for determining total NPUs in CPU allocation.
These changes are necessary to optimize CPU resource management and
improve system stability.

- **Global CPU Slicing**: Introduced a global CPU slicing mechanism for
Ascend NPUs to ensure non-overlapping CPU partitions across multiple
processes or data parallel groups, preventing resource contention.
- **Improved IRQ Binding for A3 Devices**: Refined the IRQ binding logic
specifically for Ascend A3 devices, correctly mapping logical NPU IDs to
physical card and chip IDs for accurate npu-smi queries and preventing
multi-process overwrite of IRQ settings.
- **Enhanced NPU Count Determination**: Improved the logic for
determining the total number of logical NPUs, prioritizing NPU mapping
information to ensure more accurate CPU allocation.
- **Minimum CPU Requirement**: Established a minimum requirement of 5
CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL,
and release operations to ensure stable operation.

### Does this PR introduce _any_ user-facing change?

No user-facing changes are introduced.

### How was this patch tested?

CI passed with new added/existing tests.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: c00818886 <chenchuwei@huawei.com>

2026-03-03 17:20:52 +08:00

..

[300I][Bugfix] fix unquant model weight nd2nz error (#6851 )

2026-03-03 15:57:26 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[Feat] support basic pcp&dcp for qwen3next (#6091 )

2026-02-28 21:44:08 +08:00

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518 )

2026-03-02 17:54:25 +08:00

[P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (#6898 )

2026-03-02 23:24:03 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918 )

2026-03-02 18:17:01 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

[BugFix][PCP] Fix presion bugs for pcp/dcp in PD disaggregate (#6876 )

2026-03-02 16:11:00 +08:00

[EPLB] Reduce the memory used for heat aggregation (#6729 )

2026-02-24 18:02:24 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813 )

2026-02-27 16:05:21 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5 ) (#5996 )

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6 ) (#6001 )

2026-01-24 22:08:33 +08:00

[Triton] Centralize Ascend extension op dispatch in triton_utils (#6937 )

2026-03-03 17:10:30 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813 )

2026-02-27 16:05:21 +08:00

[Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (#6828 )

2026-03-03 00:07:23 +08:00

clean 0.15.0 support (#6852 )

2026-02-28 09:20:57 +08:00

[Refactor][EAGLE] 7/N Merged PCP and disable_padded interface (#6811 )

2026-02-27 16:06:56 +08:00

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518 )

2026-03-02 17:54:25 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

ascend_config.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518 )

2026-03-02 17:54:25 +08:00

ascend_forward_context.py

add mxfp8 moe quantization (#6670 )

2026-03-02 11:04:06 +08:00

batch_invariant.py

implement batch invariant with ascendc (#6590 )

2026-02-10 14:15:26 +08:00

cpu_binding.py

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945 )

2026-03-03 17:20:52 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618 )

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523 )

2026-02-07 09:24:05 +08:00

platform.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518 )

2026-03-02 17:54:25 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461 )

2026-02-01 20:06:01 +08:00

utils.py

[300I][Bugfix] fix unquant model weight nd2nz error (#6851 )

2026-03-03 15:57:26 +08:00