Add Custom Kernels For LoRA Performance (#2325)

### What this PR does / why we need it?

Add two custom operators (sgmv_shrink and sgmv_expand) to address the performance issues of LoRA. Also enable graph mode for the LoRA operators so that they can be captured in ACL graph mode, which improves model inference performance.

### Does this PR introduce _any_ user-facing change?

No user-facing change.

### How was this patch tested?

Tested on the Qwen2.5 7B model with vllm-ascend v0.9.2.rc1. In ACL graph mode, TTFT, TPOT, and throughput improved by about 100%.

- vLLM version: v0.10.0
- vLLM main: 1f83e7d849

---------

Signed-off-by: liuchn <909698896@qq.com>
Co-authored-by: liuchn <909698896@qq.com>
2025-08-19 09:09:11 +08:00
parent 8fb50a4248
commit 3648d18e67
8 changed files with 847 additions and 29 deletions
--- a/vllm_ascend/meta_registration.py
+++ b/vllm_ascend/meta_registration.py
@@ -80,7 +80,30 @@ def get_masked_input_and_mask_meta(input: torch.Tensor,

    return masked_input, mask

+def bgmv_expand_meta(x: torch.Tensor,
+                     weight: torch.Tensor,
+                     indices: torch.Tensor,
+                     y: torch.Tensor,
+                     slice_offset: int,
+                     slice_size: int):
+
+    y_out = torch.empty_like(y)
+    return y_out
+
+def sgmv_expand_meta(x: torch.Tensor,
+                     weight: torch.Tensor,
+                     lora_indices: torch.Tensor,
+                     seq_len: torch.Tensor,
+                     y: torch.Tensor,
+                     slice_offset: int,
+                     slice_size: int):
+
+    y_out = torch.empty_like(y)
+    return y_out
+

 register_meta_if_necessary("_C", "rotary_embedding", rotary_embedding_meta)
 register_meta_if_necessary("_C", "get_masked_input_and_mask",
                           get_masked_input_and_mask_meta)
+register_meta_if_necessary("_C", "bgmv_expand", bgmv_expand_meta)
+register_meta_if_necessary("_C", "sgmv_expand", sgmv_expand_meta)