[BugFix] Require kv producer for layer sharding (#8563)

### What this PR does / why we need it?

This PR introduces stricter validation of the Ascend
`additional_config.layer_sharding` option on the 0.18 release branch, so
that it is only accepted on PD-disaggregated P nodes with
`kv_role="kv_producer"`.
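The validation described above can be sketched as follows. This is a hypothetical illustration, not the actual vllm-ascend code: the function name `validate_layer_sharding` and the `KVTransferConfig` stand-in class are assumptions; the real patch may hook into a different config-checking path.

```python
# Hypothetical sketch of the stricter layer_sharding validation.
# KVTransferConfig here is a minimal stand-in for vLLM's kv transfer config.
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVTransferConfig:
    kv_role: str  # e.g. "kv_producer" on P nodes, "kv_consumer" on D nodes


def validate_layer_sharding(additional_config: dict,
                            kv_transfer_config: Optional[KVTransferConfig]) -> None:
    """Reject additional_config.layer_sharding unless this is a
    PD-disaggregated producer node (kv_role == "kv_producer")."""
    if not additional_config.get("layer_sharding"):
        return  # feature not requested, nothing to validate
    role = kv_transfer_config.kv_role if kv_transfer_config else None
    if role != "kv_producer":
        raise ValueError(
            "additional_config.layer_sharding is only supported on "
            "PD-disaggregated P nodes with kv_role='kv_producer'")
```

With this check, a producer node passes validation, while a consumer node (or a non-disaggregated deployment with no kv transfer config) fails fast at startup instead of misbehaving later.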


### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

E2E test

---------

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
Commit a4ba82e138 (parent 4a254ba59a) by Frank Chen, committed via GitHub
on 2026-04-23 16:06:53 +08:00.
4 changed files with 9 additions and 7 deletions.


@@ -20,7 +20,7 @@ On multisocket ARM systems, the OS scheduler may place vLLM threads on CPUs f
| Device type | Default mode | Description |
| ----------- | ------------ | ------------ |
| A3 (No Affinity) | `global_slice` | Splits the allowed CPU list evenly based on the **total number of global logical NPUs**, ensuring each NPU is assigned a contiguous segment of CPU cores. This prevents CPU core overlap across multiple process groups. |
-| A2 / 310P / Others | `topo_affinity` | Allocates CPUs based on NPU topology affinity (`npusmi info -t topo`). If multiple NPUs are assigned to a single NUMA node (which may cause bandwidth contention), the CPU allocation extends to adjacent NUMA nodes. |
+| A2 / Atlas 300 inference products / Others | `topo_affinity` | Allocates CPUs based on NPU topology affinity (`npusmi info -t topo`). If multiple NPUs are assigned to a single NUMA node (which may cause bandwidth contention), the CPU allocation extends to adjacent NUMA nodes. |
- **Default**: enabled (enable_cpu_binding = true).
- **Fallback**: If NPU topo affinity is unavailable, global_slice is used.
@@ -156,7 +156,7 @@ With the current `global_slice` strategy, some CPU/NPU layouts cannot avoid cros
|2|10-14|`IRQ`: 10-11, `Main`: 12, `ACL`: 13, `Release`: 14|
|3|15-19|`IRQ`: 15-16, `Main`: 17, `ACL`: 18, `Release`: 19|
-### Example 5: A2/310P topo_affinity with NUMA extension
+### Example 5: A2/Atlas 300 inference products topo_affinity with NUMA extension
**Inputs**: