Add Ascend Ops recurrent_gated_delta_rule (#6725)

### What this PR does / why we need it?
Switch the recurrent_gated_delta_rule op from the Triton implementation to
an Ascend C implementation for better performance.
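
For readers unfamiliar with the op, the following is a minimal pure-Python sketch of the recurrence a gated delta rule is commonly described as computing, for a single head and a single sequence. It is not the Ascend C kernel or the Triton kernel from this PR. The function name `recurrent_gated_delta_rule_ref`, the argument layout, and the convention that `g` is a log-space decay gate are assumptions made for illustration, and details such as query scaling, key normalization, batching, and initial state are left out.

```python
import math


def recurrent_gated_delta_rule_ref(q, k, v, g, beta):
    """Hypothetical single-head reference (illustration only).

    q, k: lists of T vectors of length d_k
    v:    list of T vectors of length d_v
    g:    list of T log-space decay gates (assumed convention)
    beta: list of T write strengths
    Returns (outputs, final_state).
    """
    d_k, d_v = len(k[0]), len(v[0])
    # state[i][j] maps key dimension i to value dimension j.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_t, k_t, v_t, g_t, b_t in zip(q, k, v, g, beta):
        # 1. Gate: decay the whole memory.
        decay = math.exp(g_t)
        state = [[decay * s for s in row] for row in state]
        # 2. Read what the memory currently predicts for this key.
        pred = [sum(k_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        # 3. Delta rule: write only the scaled prediction error.
        delta = [b_t * (v_t[j] - pred[j]) for j in range(d_v)]
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * delta[j]
        # 4. Query the updated memory.
        outputs.append(
            [sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs, state
```

With `beta = 1` and no decay (`g = 0`), writing a second value under the same unit-norm key replaces the first one, which is the behavior that distinguishes the delta rule from plain additive linear attention. A reference like this can be useful for checking a fused kernel on small inputs.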
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
This commit is contained in:
LeeWenquan
2026-03-09 14:14:14 +08:00
committed by GitHub
parent 23bf5d4d48
commit 65eae6de7b
2 changed files with 83 additions and 92 deletions

@@ -24,6 +24,8 @@ _server_cmd: &server_cmd
- "0.8"
- "--max-num-seqs"
- "64"
+ - "--compilation-config"
+ - '{"cudagraph_capture_sizes": [64]}'
_benchmarks: &benchmarks
perf:
@@ -42,7 +44,7 @@ _benchmarks: &benchmarks
request_conf: vllm_api_general_chat
dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
max_out_len: 32768
- batch_size: 32
+ batch_size: 64
top_k: 20
baseline: 95
threshold: 5