xc-llm-ascend/vllm_ascend/ops/triton/batch_invariant
Zhijun Chen 0ead5e8681 perf: adaptive block size selection in linear_persistent kernel (#6537)
### What this PR does / why we need it?

**Optimization:** Replaces the fixed 128×128×128 block sizes in
`linear_persistent_kernel` with adaptive selection logic that considers:
- Matrix dimensions (M, N, K)
- Device NPU vector-core count
- Data type (float32 vs. other dtypes)

**Why:** A single fixed block size yields suboptimal hardware utilization
across different matrix shapes. Adaptive sizing maximizes occupancy and
memory efficiency across varied workload patterns, improving throughput for
batch-invariant linear operations in LLM inference.

**Details:**
- Small matrices (M < 256): Size-proportional allocation
- Medium matrices (256 ≤ M < 1024): Balanced distribution based on grid
capacity
- Large matrices (M ≥ 1024): Optimized for dominant dimension
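
The tiering above might look like the following Python sketch. This is an illustrative reconstruction, not the actual vllm-ascend code: the function name `select_block_sizes`, the power-of-two capping, the exact thresholds, the float32 BLOCK_K halving, and the default vector-core count of 48 are all assumptions.

```python
import math

def select_block_sizes(M, N, K, num_vector_cores=48, is_fp32=False):
    """Pick (BLOCK_M, BLOCK_N, BLOCK_K) for a persistent matmul kernel.

    Illustrative heuristic only; names, thresholds, and the default
    core count are assumptions, not the values used by vllm-ascend.
    """
    def next_pow2_capped(x, cap):
        # Smallest power of two >= x, capped at `cap`.
        p = 1
        while p < x and p < cap:
            p *= 2
        return min(p, cap)

    # Assumption: float32 halves BLOCK_K so a tile fits the same
    # on-chip buffer budget as 16-bit dtypes.
    k_cap = 64 if is_fp32 else 128

    if M < 256:
        # Small M: size BLOCK_M proportionally to M so short matrices
        # do not waste tile capacity; keep N at full width.
        block_m, block_n = next_pow2_capped(M, 128), 128
    elif M < 1024:
        # Medium M: shrink BLOCK_M until the output-tile grid offers at
        # least one tile per vector core (balanced distribution).
        block_m, block_n = 128, 128
        while (math.ceil(M / block_m) * math.ceil(N / block_n)
               < num_vector_cores and block_m > 32):
            block_m //= 2
    else:
        # Large M: give the dominant of M and N the bigger tile edge.
        block_m, block_n = (256, 64) if M >= N else (64, 256)

    return block_m, block_n, k_cap
```

The key design point is that the tile shape, not just the tile count, adapts: for skewed shapes (e.g. small M with large N, common in decode-phase GEMMs) the grid stays wide enough to occupy every vector core instead of leaving cores idle under a fixed 128×128 tiling.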

### Does this PR introduce _any_ user-facing change?

No. This is a performance optimization. The API and numerical results
remain unchanged; only kernel execution efficiency improves.

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: DDCHY <843049740@qq.com>
Signed-off-by: zjchenn <zjchenn@gmail.com>
Co-authored-by: DDCHY <843049740@qq.com>
2026-02-04 21:36:26 +08:00