xc-llm-ascend

Files

wangqiankun13 904c18f929 [Feature] Use DispatchGmmCombineDecode operator to replace MC2 (Optional) (#5040)

### What this PR does / why we need it?

This PR adds model-side integration for the previously introduced
experimental AscendC fused operator DispatchGmmCombineDecode, used in
MoE decoding.

The operator implementation itself was added in a prior PR,
[#4139](https://github.com/vllm-project/vllm-ascend/pull/4139).
This change only adapts the model execution path to optionally use the
fused operator.

When the environment variable VLLM_ASCEND_ENABLE_FUSED_MC2=2 is set, the
original MC2 path composed of multiple operators (A8W8 dispatch → GMM →
SwiGLU → GMM → combine) might be replaced by the single fused operator
DispatchGmmCombineDecode.

By default, the existing multi-operator MC2 implementation is preserved.
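
As an illustration of the switch described above, here is a minimal sketch of how an environment-variable gate of this kind could be read. Only the variable name `VLLM_ASCEND_ENABLE_FUSED_MC2`, the value `2`, and the default behaviour come from this description; the helper `select_moe_decode_path` and its return labels are hypothetical and are not part of vllm-ascend. Values other than `2` are not described here, so the sketch reports them as unspecified instead of guessing.

```python
import os


def select_moe_decode_path(env=None):
    """Hypothetical helper: report which MoE decode path the setting requests.

    This is an illustrative sketch, not the vllm-ascend implementation.
    """
    env = os.environ if env is None else env
    raw = env.get("VLLM_ASCEND_ENABLE_FUSED_MC2")

    if raw is None:
        # Variable unset: the existing multi-operator MC2 path is preserved.
        return "mc2-multi-operator"
    if raw.strip() == "2":
        # Value 2 opts in to the fused DispatchGmmCombineDecode operator.
        return "fused-DispatchGmmCombineDecode"
    # Other values are outside the scope of this description.
    return "unspecified"


if __name__ == "__main__":
    # Reports the path requested by the current process environment.
    print(select_moe_decode_path())
```

To opt in under this sketch, the variable would be exported before the serving process starts, for example `VLLM_ASCEND_ENABLE_FUSED_MC2=2` in the launching shell.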

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>

2025-12-21 15:23:59 +08:00

compressed_tensors

[Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516)

2025-12-10 15:58:52 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quant_config.py

[2/N][Pangu][MoE] Remove Pangu Related Code (#5130)

2025-12-19 09:00:07 +08:00

utils.py

[2/N][Pangu][MoE] Remove Pangu Related Code (#5130)