xc-llm-ascend

Files

liuchen2026fly 542258ac9d [feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

Derive MLA dimension constants (q_lora_rank, qk_nope_head_dim, etc.)
from tensor shapes at runtime instead of hardcoding DeepSeek V3 values.
This enables the mla_preprocess fused op to work with both DeepSeek V3
and GLM5 models without Python API changes.

- Add 9 dimension fields to MlaTilingData with DeepSeek V3 defaults
- Add OpParam fields and parameterize all host-side tiling functions
- Derive dimensions from wuk, gamma1, kv_cache_rope tensor shapes
- Replace 310+ hardcoded constants across 4 kernel .hpp files
- Remove unused MMSIZE1/MMSIZE2 constants
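
The shape-based derivation described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the `MlaDims` struct, the `DeriveMlaDims` helper, the field names, and the assumed tensor layouts (`wuk` as `[num_heads, qk_nope_head_dim, kv_lora_rank]`, `gamma1` as `[q_lora_rank]`, and `kv_cache_rope` with `qk_rope_head_dim` as its last axis) are all hypothetical. Only the DeepSeek V3 default values are taken from the published model configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the dimension fields added to MlaTilingData.
// Field names are illustrative; defaults are the DeepSeek V3 MLA sizes.
struct MlaDims {
    uint32_t numHeads = 128;
    uint32_t qLoraRank = 1536;
    uint32_t kvLoraRank = 512;
    uint32_t qkNopeHeadDim = 128;
    uint32_t qkRopeHeadDim = 64;
};

using Shape = std::vector<int64_t>;

// Convert one tensor extent to uint32_t, rejecting non-positive sizes.
inline uint32_t ToDim(int64_t extent) {
    if (extent <= 0 || extent > UINT32_MAX) {
        throw std::invalid_argument("tensor extent out of range");
    }
    return static_cast<uint32_t>(extent);
}

// Derive MLA dimensions from tensor shapes instead of compile-time constants.
// ASSUMED layouts (not confirmed by the source):
//   wuk           : [num_heads, qk_nope_head_dim, kv_lora_rank]
//   gamma1        : [q_lora_rank]
//   kv_cache_rope : [..., qk_rope_head_dim]
inline MlaDims DeriveMlaDims(const Shape& wuk, const Shape& gamma1,
                             const Shape& kvCacheRope) {
    if (wuk.size() != 3 || gamma1.empty() || kvCacheRope.empty()) {
        throw std::invalid_argument("unexpected tensor rank");
    }
    MlaDims dims;
    dims.numHeads = ToDim(wuk[0]);
    dims.qkNopeHeadDim = ToDim(wuk[1]);
    dims.kvLoraRank = ToDim(wuk[2]);
    dims.qLoraRank = ToDim(gamma1.back());
    dims.qkRopeHeadDim = ToDim(kvCacheRope.back());
    return dims;
}
```

Under these assumptions, a call such as `DeriveMlaDims({128, 128, 512}, {1536}, {1024, 128, 1, 64})` reproduces the DeepSeek V3 defaults, while a model with different weight shapes yields different values through the same code path, so the Python API does not need to change.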

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

---------

Signed-off-by: liuchenbing <chenliumail@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>

2026-03-09 20:17:21 +08:00

kernel

add mla_preprocess kernel (#3226)

2025-10-12 07:39:45 +08:00

mla_preprocess_kernel.cpp

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

2026-03-09 20:17:21 +08:00

mla_preprocess_mix_bf16_nq.hpp

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

2026-03-09 20:17:21 +08:00

mla_preprocess_mix_bf16_qdown.hpp

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)