xc-llm-ascend/ops at 1e31b07fa7371ee8a05f58a546882733afddb2a7

EngineX/xc-llm-ascend

ZYang6263 d08401d1e7 [Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)

### What this PR does / why we need it?
The current MatmulReduceScatter operator shows degraded performance
on small input shapes, so this PR decides whether to use the operator
based on the size of the input shape.
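
The shape check described above could look roughly like the following sketch. It is only an illustration: the cutoff value, the use of total element count as the size measure, and the names `MIN_ELEMENTS_FOR_FUSION`, `should_use_fused_op`, and `select_path` are all hypothetical and are not taken from this PR.

```python
from math import prod
from typing import Sequence

# Hypothetical cutoff, chosen only for illustration; the value the PR
# actually uses is not shown on this page.
MIN_ELEMENTS_FOR_FUSION = 4096


def should_use_fused_op(
    shape: Sequence[int], min_elements: int = MIN_ELEMENTS_FOR_FUSION
) -> bool:
    """Return True when the input has at least `min_elements` elements."""
    return prod(shape) >= min_elements


def select_path(shape: Sequence[int]) -> str:
    """Name the path a caller would take for a given input shape.

    "fused" stands for the fused matmul + reduce-scatter operator,
    "unfused" for running the two steps separately.
    """
    return "fused" if should_use_fused_op(shape) else "unfused"
```

With this assumed cutoff, `select_path((8, 128))` falls back to the unfused path, while `select_path((64, 128))` keeps the fused operator.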

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>

2025-10-28 23:30:27 +08:00

..

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646)

2025-10-25 11:22:03 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

remove useless code (#3685)

2025-10-24 16:29:08 +08:00

casual_conv1d.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

expert_load_balancer.py

[BugFix] Check all expert maps when using muilty instance. (#3576)

2025-10-24 17:10:14 +08:00

fla.py

[BugFix][mian] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549)

2025-10-21 20:20:57 +08:00

layernorm.py

[Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3517)

2025-10-23 10:07:37 +08:00

linear_op.py

[Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)

2025-10-28 23:30:27 +08:00

linear.py

[main][refactor] refactor SequenceRowParallelOp forward (#3616)

2025-10-23 14:41:15 +08:00

register_custom_ops.py

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

rotary_embedding.py

[Feat] Add mrope fusion op (#3708)

2025-10-25 09:12:18 +08:00

sigmoid_gating.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

vocab_parallel_embedding.py

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

weight_prefetch.py

[Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465)

2025-10-17 09:30:51 +08:00