xc-llm-ascend

Files

hwhaokun a47aa4da2f [feat] apply flashcomm1 on bailing (#4868)

### What this PR does / why we need it?
This PR adjusts the layer prefix matching rules for tensor-parallel
(column/row parallel) ops to fit the Bailing model's naming conventions,
adding "query_key_value" for column parallel and "attention.dense" for
row parallel, so that flashcomm1 works properly on the Bailing model.
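A minimal sketch of this kind of name-based matching. It assumes a plain substring check over full layer prefixes. The helper `classify_parallel_op` and the pattern tuples are hypothetical. Only "query_key_value" and "attention.dense" come from this PR; the other names are illustrative vLLM-style entries, not copied from the repository.

```python
from typing import Optional

# Hypothetical pattern tables. Only "query_key_value" (column) and
# "attention.dense" (row) come from the PR; the rest are assumed examples.
COLUMN_PARALLEL_PATTERNS = ("qkv_proj", "gate_up_proj", "query_key_value")
ROW_PARALLEL_PATTERNS = ("o_proj", "down_proj", "attention.dense")


def classify_parallel_op(prefix: str) -> Optional[str]:
    """Return "column", "row", or None for a full layer prefix.

    Hypothetical helper: a substring match against each pattern table,
    checking column-parallel names first.
    """
    if any(p in prefix for p in COLUMN_PARALLEL_PATTERNS):
        return "column"
    if any(p in prefix for p in ROW_PARALLEL_PATTERNS):
        return "row"
    return None


print(classify_parallel_op("model.layers.0.attention.query_key_value"))  # column
print(classify_parallel_op("model.layers.0.attention.dense"))            # row
print(classify_parallel_op("model.layers.0.mlp.dense_h_to_4h"))          # None
```

Matching on "attention.dense" rather than a bare "dense" presumably keeps the rule from catching other layers whose names merely contain "dense". The third call uses a made-up MLP layer name to illustrate that.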

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: hwhaokun <haokun0405@163.com>

2025-12-11 17:02:21 +08:00

fused_moe

[Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516)

2025-12-10 15:58:52 +08:00

triton

[OPS] support triton causal_conv1d_fn ops (#4119)

2025-12-11 15:52:39 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646)

2025-10-25 11:22:03 +08:00

activation.py

[refact] unified soc_version code (#4359)