[Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)

### What this PR does / why we need it? Fix the LoRA accuracy issue that introduced by custom AscendC operator "bgmv_shrink, sgmv_shrink, bgmv_expand, sgmv_epand". The bug details are: - In the kernel function, if you want to call GlobalTensor.GetSize method, you have to pass the second parameter of bufferSize when you call GlobalTensor.SetGlobalBuffer first. - Or GlobalTensor.GetSize method will return a random value. - You can refer to [this doc](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha002/apiref/ascendcopapi/atlasascendc_api_07_00024.html). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? pytest -sv tests/e2e/singlecard/test_ilama_lora.py pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py - vLLM version: v0.10.1.1 - vLLM main: a344a5aa0a --------- Signed-off-by: paulyu12 <paulyu0307@gmail.com> Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: paulyu12 <paulyu0307@gmail.com>
2025-09-02 11:46:59 +08:00
parent 214b32a346
commit 9f1e054fe3
9 changed files with 99 additions and 41 deletions
--- a/.github/workflows/vllm_ascend_test.yaml
+++ b/.github/workflows/vllm_ascend_test.yaml
@@ -201,7 +201,7 @@ jobs:
          pytest -sv tests/e2e/singlecard/test_embedding.py
          pytest -sv tests/e2e/singlecard/test_guided_decoding.py
          # TODO: Fix lora accuracy error
-          # pytest -sv tests/e2e/singlecard/test_ilama_lora.py
+          pytest -sv tests/e2e/singlecard/test_ilama_lora.py
          pytest -sv tests/e2e/singlecard/test_profile_execute_duration.py
          pytest -sv tests/e2e/singlecard/test_quantization.py
          pytest -sv tests/e2e/singlecard/test_sampler.py
@@ -280,7 +280,7 @@ jobs:
          # external_launcher test is not stable enough. Fix it later
          # pytest -sv tests/e2e/multicard/test_external_launcher.py
          pytest -sv tests/e2e/multicard/test_fused_moe_allgather_ep.py
-          # pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
+          pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

          # To avoid oom, we need to run the test in a single process.
          pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ