xc-llm-ascend

Files

Slightwind 12ca99c94e [Bugfix] Remove ModelSlim-"M4 Quantization". (#4589)

The M4 quantization method in ModelSlim adds bias to model weights that
originally do not have a linear bias. PR #4235 supported PD-MIX
quantization and M4 quantization, adding bias to `w8a8.py` and
`w8a8_dynamic.py`, and implementing adaptations in `ops/linear.py` to
prevent it from being reset to `None` by
`self.register_parameter("bias", None)`. However, this modification
introduced an issue where the bias was still being reset to `None` in
certain scenarios, causing errors during service startup. Therefore,
support for M4 quantization is temporarily being reverted in this PR.
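
The reset described above can be illustrated with a minimal sketch. It does not use the project's code: `TinyModule` is a hypothetical stand-in for a module with a parameter registry, and `QuantLinear`, `GuardedQuantLinear`, `quant_bias`, and `has_linear_bias` are illustrative names, not identifiers from `ops/linear.py`. The sketch assumes only that registering a parameter under an existing name overwrites the previous entry.

```python
class TinyModule:
    """Hypothetical stand-in for a module that keeps parameters in a registry."""

    def __init__(self):
        self._parameters = {}

    def register_parameter(self, name, value):
        # Registering under an existing name overwrites the old entry,
        # even when the new value is None.
        self._parameters[name] = value

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        params = self.__dict__.get("_parameters", {})
        if name in params:
            return params[name]
        raise AttributeError(name)


class QuantLinear(TinyModule):
    """Illustrative layer: quantization attaches a bias, then it is reset."""

    def __init__(self, quant_bias, has_linear_bias=False):
        super().__init__()
        # The quantization method adds a bias the original layer did not have.
        self.register_parameter("bias", quant_bias)
        if not has_linear_bias:
            # Unconditional reset: the quantization bias is lost here.
            self.register_parameter("bias", None)


class GuardedQuantLinear(TinyModule):
    """Illustrative layer: the reset is skipped when a bias already exists."""

    def __init__(self, quant_bias, has_linear_bias=False):
        super().__init__()
        self.register_parameter("bias", quant_bias)
        if not has_linear_bias and self._parameters.get("bias") is None:
            self.register_parameter("bias", None)


lost = QuantLinear(quant_bias=[0.1, 0.2])
kept = GuardedQuantLinear(quant_bias=[0.1, 0.2])
print(lost.bias)  # None
print(kept.bias)  # [0.1, 0.2]
```

A guard of this kind only helps if every code path that resets the bias goes through it, which is consistent with the revert chosen here while the remaining scenarios are unresolved.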
___
- vLLM version: v0.11.2

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>

2025-12-01 23:45:02 +08:00

fused_moe

[EPLB][Ops] Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list operator into dynamic EPLB (#4216)

2025-11-30 22:52:05 +08:00

triton

[OPS] qwen3-next: support Triton chunk_gated_delta_rule ops (#4070)

2025-11-28 20:55:43 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646)

2025-10-25 11:22:03 +08:00

activation.py

[refact] unify soc_version code (#4359)