xc-llm-ascend/ops at c3fee66806f252476796389ea73d13a8aca60146

EngineX/xc-llm-ascend

socrahow c3fee66806 [Model] Optimizing gemma3 model's GemmaRMSNorm function (#3151)

### What this PR does / why we need it?
Before the optimization, RMSNorm took 531.5 us in one decoding step; after the optimization, it takes 105 us.
I closed the previous PR (https://github.com/vllm-project/vllm-ascend/pull/2456) by mistake and am resubmitting it here.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: b1068903fd

---------

Signed-off-by: socrahow <suzihao4@h-partners.com>

2025-09-28 21:19:10 +08:00
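
The commit message reports timings but does not show the computation being optimized. As a rough reference only, the sketch below writes out the usual Gemma-style RMSNorm formula (normalize by the root mean square, then scale by `1 + weight`) in plain Python and works out the speedup implied by the two timings. The function name `gemma_rms_norm` and the `eps` default are assumptions for illustration; this is not the NPU code in `layernorm.py`.

```python
import math


def gemma_rms_norm(x, weight, eps=1e-6):
    """Reference Gemma-style RMSNorm over a single vector (plain lists).

    Hypothetical helper for illustration: y_i = x_i / sqrt(mean(x^2) + eps) * (1 + w_i)
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


# Timings quoted in the commit message, in microseconds.
before_us = 531.5
after_us = 105.0

speedup = before_us / after_us        # ratio of old time to new time
reduction = 1.0 - after_us / before_us  # fraction of time saved

print(f"speedup: {speedup:.2f}x, time saved: {reduction:.1%}")
```

Taken at face value, the two timings correspond to roughly a 5x reduction in RMSNorm time per decoding step.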

..

[CI][Bugfix] Quickfix for DPMetaData (#3234)

2025-09-28 21:11:22 +08:00

__init__.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

Disaggregate prefill for kv cache register style (#950)

2025-07-26 17:15:47 +08:00

casual_conv1d.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

common_fused_moe.py

[Refactor] [SP] The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085)

2025-09-24 11:29:59 +08:00

expert_load_balancer.py

Add static EPLB (#1116)

2025-06-09 19:28:11 +08:00

fla.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

fused_moe.py

[Refactor] [SP] The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085)

2025-09-24 11:29:59 +08:00

layernorm.py

[Model] Optimizing gemma3 model's GemmaRMSNorm function (#3151)

2025-09-28 21:19:10 +08:00

linear_op.py

[Bugfix] fix bug when tp=1 (#3193)

2025-09-26 10:55:32 +08:00

linear.py

refactor linear (#2867)

2025-09-18 14:09:19 +08:00

register_custom_ops.py

[Refactor] [SP] The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085)

2025-09-24 11:29:59 +08:00

rotary_embedding.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

sigmoid_gating.py

[2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082)

2025-09-24 11:25:42 +08:00

vocab_parallel_embedding.py

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00