xc-llm-ascend

Author	SHA1	Message	Date
liuchen2026fly	542258ac9d	[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)	2026-03-09 20:17:21 +08:00
luomin2005	f41eeeb11e	Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp (#6732)	2026-02-24 09:12:43 +08:00

liuchen2026fly

542258ac9d

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

Derive MLA dimension constants (q_lora_rank, qk_nope_head_dim, etc.)
from tensor shapes at runtime instead of hardcoding DeepSeek V3 values.
This enables the mla_preprocess fused op to work with both DeepSeek V3
and GLM5 models without Python API changes.

- Add 9 dimension fields to MlaTilingData with DeepSeek V3 defaults
- Add OpParam fields and make all host-side tiling functions use the runtime dimensions
- Derive dimensions from wuk, gamma1, kv_cache_rope tensor shapes
- Replace 310+ hardcoded constants across 4 kernel .hpp files
- Remove unused MMSIZE1/MMSIZE2 constants
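The approach in the list above reads the MLA dimensions off the input tensor shapes instead of hardcoding them, while keeping DeepSeek V3 values as defaults. The C++ sketch below illustrates that idea under stated assumptions: the struct name `MlaDims`, the function `DeriveMlaDims`, the chosen subset of fields, and the tensor layouts for `gamma1`, `wuk`, and `kv_cache_rope` are all hypothetical and are not the actual `MlaTilingData` or host tiling code from this PR.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical subset of the MLA dimension fields. The defaults are the
// DeepSeek V3 values (q_lora_rank 1536, kv_lora_rank 512, ...).
struct MlaDims {
    int64_t qLoraRank = 1536;
    int64_t kvLoraRank = 512;
    int64_t qkNopeHeadDim = 128;
    int64_t qkRopeHeadDim = 64;
    int64_t numHeads = 128;
};

using Shape = std::vector<int64_t>;

// Reads the dimensions off the tensor shapes instead of hardcoding them.
// ASSUMED layouts (illustrative only, not confirmed against the real op):
//   gamma1:        [q_lora_rank]
//   wuk:           [num_heads, qk_nope_head_dim, kv_lora_rank]
//   kv_cache_rope: [..., qk_rope_head_dim]  (last axis)
MlaDims DeriveMlaDims(const Shape& gamma1, const Shape& wuk,
                      const Shape& kvCacheRope) {
    if (gamma1.size() != 1 || wuk.size() != 3 || kvCacheRope.empty()) {
        throw std::invalid_argument("unexpected tensor rank for MLA inputs");
    }
    MlaDims dims;  // defaults are overwritten below by the derived values
    dims.qLoraRank = gamma1[0];
    dims.numHeads = wuk[0];
    dims.qkNopeHeadDim = wuk[1];
    dims.kvLoraRank = wuk[2];
    dims.qkRopeHeadDim = kvCacheRope.back();
    return dims;
}
```

Because the dimensions come from tensors the op already receives, the Python-facing signature can stay the same, which is consistent with the commit's note that no Python API changes were needed.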

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: `15d76f74e2`

---------

Signed-off-by: liuchenbing <chenliumail@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>

2026-03-09 20:17:21 +08:00

luomin2005

f41eeeb11e

Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp (#6732)

### What this PR does / why we need it?
Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp.
For more details, see
https://github.com/vllm-project/vllm-ascend/issues/6486

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Installed the new package to test the modification; here is the
result:

- vLLM version: v0.15.0
- vLLM main: `9562912cea`

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>

2026-02-24 09:12:43 +08:00

2 Commits