xc-llm-ascend/ops at 0b1da2474260bd30c5a70f8090c2e6d14ee5d684

EngineX/xc-llm-ascend

ZYang6263 0b1da24742 [Main][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3693)

### What this PR does / why we need it?
This PR improves performance by introducing a kernel that fuses the matrix
multiplication (matmul) and reduce-scatter operations. It supports both
unquantized (e.g., BFloat16) and W8A8 quantized models.
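The fused kernel itself runs on Ascend NPUs and is not reproduced here. As a rough reference for what "matmul followed by reduce-scatter" computes in a row-parallel linear layer, the pure-Python sketch below simulates the unfused two-step version in a single process: each simulated rank multiplies its slice of the input by its slice of the weight, the partial products are summed element-wise, and each rank keeps one contiguous block of output rows. The function names and the sharding layout (input split along the hidden dimension, output scattered along rows) are illustrative assumptions, not the API or data layout of the kernel added in #3693.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense (m x k) @ (k x n) -> (m x n) product."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def split_columns(x: Matrix, parts: int) -> List[Matrix]:
    """Hypothetical helper: shard the input along its hidden (column) dim."""
    width = len(x[0]) // parts
    return [[row[r * width:(r + 1) * width] for row in x] for r in range(parts)]


def split_rows(w: Matrix, parts: int) -> List[Matrix]:
    """Hypothetical helper: shard the weight along its input (row) dim."""
    height = len(w) // parts
    return [w[r * height:(r + 1) * height] for r in range(parts)]


def reduce_scatter_rows(partials: List[Matrix]) -> List[Matrix]:
    """Sum per-rank partial results element-wise (the reduce), then give
    rank r its contiguous block of output rows (the scatter)."""
    world_size = len(partials)
    m, n = len(partials[0]), len(partials[0][0])
    if m % world_size != 0:
        raise ValueError("row count must be divisible by world size")
    total = [[sum(p[i][j] for p in partials) for j in range(n)] for i in range(m)]
    rows = m // world_size
    return [total[r * rows:(r + 1) * rows] for r in range(world_size)]


def unfused_matmul_reduce_scatter(x: Matrix, w: Matrix, world_size: int) -> List[Matrix]:
    """Reference (unfused) semantics: local partial matmul on every
    simulated rank, then a reduce-scatter over the partial results."""
    partials = [
        matmul(xs, ws)
        for xs, ws in zip(split_columns(x, world_size), split_rows(w, world_size))
    ]
    return reduce_scatter_rows(partials)


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]          # 2 tokens x hidden size 4
    w = [[1, 0], [0, 1], [1, 1], [2, 0]]      # hidden size 4 x output size 2
    per_rank = unfused_matmul_reduce_scatter(x, w, world_size=2)
    print(per_rank)  # [[[12, 5]], [[28, 13]]]
```

Concatenating the per-rank blocks recovers the full `x @ w`. A fused kernel aims to produce the same result while overlapping or merging the computation and the communication instead of materializing every full partial product first.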

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: ZYang6263 <zy626375@gmail.com>

2025-10-24 18:19:58 +08:00

Reapply "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) (#3512)

2025-10-22 11:41:30 +08:00

__init__.py

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

remove useless code (#3685)

2025-10-24 16:29:08 +08:00

casual_conv1d.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

common_fused_moe.py

[BugFix] Check all expert maps when using multiple instances. (#3576)

2025-10-24 17:10:14 +08:00

expert_load_balancer.py

[BugFix] Check all expert maps when using multiple instances. (#3576)

2025-10-24 17:10:14 +08:00

fla.py

[BugFix][main] Fixed a Triton kernel bug in layer_norm_fwd_kernel for Qwen3-Next (#3549)

2025-10-21 20:20:57 +08:00

layernorm.py

[Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3517)

2025-10-23 10:07:37 +08:00

linear_op.py

[Main][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3693)

2025-10-24 18:19:58 +08:00

linear.py

[main][refactor] refactor SequenceRowParallelOp forward (#3616)

2025-10-23 14:41:15 +08:00

register_custom_ops.py

[main][refactor] refactor SequenceRowParallelOp forward (#3616)

2025-10-23 14:41:15 +08:00

rotary_embedding.py

Revert "Add mrope op fusion (#3509)" (#3562)

2025-10-20 20:19:24 +08:00

sigmoid_gating.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

vocab_parallel_embedding.py

[Feature] Optimize SP and add SP support for Qwen3-Next. (#3225)

2025-10-13 23:02:12 +08:00

weight_prefetch.py

[Feat] Qwen3 MoE supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465)

2025-10-17 09:30:51 +08:00