xc-llm-ascend/ops at 75de3fa172b600ca6aed7aa870bccb526c956f7b

EngineX/xc-llm-ascend

ZYang6263 6188450269 [v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837)

### What this PR does / why we need it?
The current MatmulReduceScatter operator suffers performance
degradation in small-shape scenarios, so this PR decides whether to use
the operator based on the size of the shape.
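
The shape-based choice described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the threshold value, the checked dimension, and the names `MIN_ROWS_FOR_FUSED_OP`, `should_use_fused_op`, and `select_op` are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical cutoff. The PR description does not state the real threshold
# or which dimension of the shape is checked.
MIN_ROWS_FOR_FUSED_OP = 1024


def should_use_fused_op(shape: Sequence[int],
                        min_rows: int = MIN_ROWS_FOR_FUSED_OP) -> bool:
    """Return True when the leading dimension is large enough for the fused path."""
    return len(shape) > 0 and shape[0] >= min_rows


def select_op(shape: Sequence[int],
              fused_op: Callable,
              fallback_op: Callable,
              min_rows: int = MIN_ROWS_FOR_FUSED_OP) -> Callable:
    """Pick the fused operator for large inputs and the unfused path otherwise."""
    return fused_op if should_use_fused_op(shape, min_rows) else fallback_op
```

With these assumed values, an input of shape `(8, 512)` would take the unfused path, while `(4096, 512)` would use the fused operator.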


---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>

2025-10-28 23:31:19 +08:00

..

[BugFix]Support redundant experts in EPLB (#3473)

2025-10-18 00:09:16 +08:00

__init__.py

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

Disaggregate prefill for kv cache register style (#950)

2025-07-26 17:15:47 +08:00

casual_conv1d.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

common_fused_moe.py

[Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753)

2025-10-25 15:51:43 +08:00

expert_load_balancer.py

[BugFix] Check all expert maps when using multi instance. (#3662)

2025-10-24 17:10:31 +08:00

fla.py

[BugFix][main] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549)

2025-10-21 20:20:57 +08:00

layernorm.py

[v0.11.0][Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3649)

2025-10-27 09:42:09 +08:00

linear_op.py

[v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837)

2025-10-28 23:31:19 +08:00

linear.py

[v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654)

2025-10-23 14:45:49 +08:00

register_custom_ops.py

[v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654)

2025-10-23 14:45:49 +08:00

rotary_embedding.py

[cherry-pick][Feat] Add mrope fusion op#3708 (#3735)

2025-10-25 11:41:23 +08:00

sigmoid_gating.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

vocab_parallel_embedding.py

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

weight_prefetch.py

[Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465)

2025-10-17 09:30:51 +08:00