[Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)

### What this PR does / why we need it?
Fix the LoRA accuracy issue introduced by the custom AscendC operators "bgmv_shrink", "sgmv_shrink", "bgmv_expand", and "sgmv_expand". The bug details are:
- In a kernel function, GlobalTensor.GetSize only returns a valid value if the second parameter, bufferSize, was passed when GlobalTensor.SetGlobalBuffer was first called.
- Otherwise, GlobalTensor.GetSize returns a random value.
- You can refer to [this doc](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha002/apiref/ascendcopapi/atlasascendc_api_07_00024.html).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

- vLLM version: v0.10.1.1
- vLLM main: a344a5aa0a

---------

Signed-off-by: paulyu12 <paulyu0307@gmail.com>
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: paulyu12 <paulyu0307@gmail.com>
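
The GetSize behavior described in the commit message can be illustrated with a small host-side sketch. It is not AscendC code: `MockGlobalTensor`, `SizeKnown`, and `SumIndices` are hypothetical names invented for illustration, and the mock returns 0 rather than a random value when no size was recorded. It only shows why the fix threads an explicit size such as `indicesSize` from the host into the kernel entry point, assuming that size is an element count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mock; this is NOT AscendC::GlobalTensor.
template <typename T>
class MockGlobalTensor {
public:
    // One-argument form: the size is never recorded, so GetSize is meaningless.
    void SetGlobalBuffer(T *buffer) {
        buffer_ = buffer;
        sizeKnown_ = false;
    }

    // Two-argument form: the element count is recorded for later GetSize calls.
    void SetGlobalBuffer(T *buffer, uint64_t bufferSize) {
        buffer_ = buffer;
        size_ = bufferSize;
        sizeKnown_ = true;
    }

    // Mock-only helper that makes the "size was never set" state observable.
    bool SizeKnown() const { return sizeKnown_; }

    uint64_t GetSize() const { return size_; }
    T GetValue(uint64_t i) const { return buffer_[i]; }

private:
    T *buffer_ = nullptr;
    uint64_t size_ = 0;
    bool sizeKnown_ = false;
};

// Hypothetical entry point that mirrors the new (indices, indicesSize) pair
// of parameters: the caller supplies the element count explicitly.
inline int64_t SumIndices(int64_t *indices, uint32_t indicesSize) {
    MockGlobalTensor<int64_t> indicesGm;
    indicesGm.SetGlobalBuffer(indices, indicesSize);
    assert(indicesGm.SizeKnown());

    int64_t sum = 0;
    for (uint64_t i = 0; i < indicesGm.GetSize(); ++i) {
        sum += indicesGm.GetValue(i);
    }
    return sum;
}
```

In this sketch, calling the one-argument `SetGlobalBuffer` leaves `SizeKnown()` false, which corresponds to the case where the real `GetSize` result cannot be trusted.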
2025-09-02 11:46:59 +08:00
parent 214b32a346
commit 9f1e054fe3
9 changed files with 99 additions and 41 deletions
--- a/csrc/ops.h
+++ b/csrc/ops.h
@@ -67,6 +67,7 @@ namespace vllm_ascend {
        void *x,
        void *weight,
        void *indices,
+        uint32_t indicesSize,
        void *y, 
        uint32_t batch_size,
        uint32_t num_tokens_per_core,
@@ -80,6 +81,7 @@ namespace vllm_ascend {
        void *x,
        void *weight,
        void *indices,
+        uint32_t indicesSize,
        void *y,
        void *y_out,
        uint32_t batch_size,
@@ -95,7 +97,9 @@ namespace vllm_ascend {
        void *x,
        void *weight,
        void *loraIndices,
+        uint32_t loraIndicesSize,
        void *seqLen,
+        uint32_t seqLenSize,
        void *y,
        uint32_t batch_size,
        uint32_t num_tokens_per_core,
@@ -109,7 +113,9 @@ namespace vllm_ascend {
        void *x,
        void *weight,
        void *loraIndices,
+        uint32_t loraIndicesSize,
        void *seqLen,
+        uint32_t seqLenSize,
        void *y,
        void *y_out,
        uint32_t batch_size,