[Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)
### What this PR does / why we need it?
Fix the LoRA accuracy issue introduced by the custom AscendC operators
"bgmv_shrink", "sgmv_shrink", "bgmv_expand", and "sgmv_expand".
The bug details are:
- In a kernel function, GlobalTensor.GetSize returns a valid value only
if the second parameter, bufferSize, was passed to the earlier
GlobalTensor.SetGlobalBuffer call.
- Otherwise, GlobalTensor.GetSize returns a random value.
- For details, refer to [this
doc](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha002/apiref/ascendcopapi/atlasascendc_api_07_00024.html).
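
The rule above can be illustrated with a small host-side sketch. It does not use the real AscendC API: `MockGlobalTensor`, `HasSize`, and `InitIndices` are hypothetical stand-ins written in plain C++, and the `int64_t` element type is an assumption. The mock only models the behaviour described in the bullets, namely that the size is known only when it was supplied at bind time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a device-side global tensor handle. This is NOT
// AscendC::GlobalTensor; it only models the rule that the size is known only
// if it was supplied when the buffer was bound.
template <typename T>
class MockGlobalTensor {
public:
    // One-argument form: binds the address and records no size.
    void SetGlobalBuffer(T *buffer) {
        buffer_ = buffer;
        hasSize_ = false;
    }

    // Two-argument form: binds the address and records the element count.
    void SetGlobalBuffer(T *buffer, uint64_t bufferSize) {
        buffer_ = buffer;
        size_ = bufferSize;
        hasSize_ = true;
    }

    // True only when the size was supplied at bind time.
    bool HasSize() const { return hasSize_; }

    // The mock treats an unset size as a hard error. On the real device the
    // PR description reports a random value instead.
    uint64_t GetSize() const {
        assert(hasSize_ && "bufferSize was never passed to SetGlobalBuffer");
        return size_;
    }

private:
    T *buffer_ = nullptr;
    uint64_t size_ = 0;
    bool hasSize_ = false;
};

// Mirrors the shape of the fix: the caller passes indicesSize alongside the
// pointer, and the init step forwards it when binding the buffer.
// The int64_t element type is an assumption made for this sketch.
inline uint64_t InitIndices(MockGlobalTensor<int64_t> &indicesGm,
                            void *indices, uint32_t indicesSize) {
    indicesGm.SetGlobalBuffer(static_cast<int64_t *>(indices), indicesSize);
    return indicesGm.GetSize();
}
```

With the size forwarded, `InitIndices` returns the element count that was passed in. A tensor bound with the one-argument form reports `HasSize() == false` in this mock, which corresponds to the case where the real `GetSize` result cannot be trusted.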
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
- vLLM version: v0.10.1.1
- vLLM main: a344a5aa0a
---------
Signed-off-by: paulyu12 <paulyu0307@gmail.com>
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: paulyu12 <paulyu0307@gmail.com>
@@ -67,6 +67,7 @@ namespace vllm_ascend {
     void *x,
     void *weight,
     void *indices,
+    uint32_t indicesSize,
     void *y,
     uint32_t batch_size,
     uint32_t num_tokens_per_core,
@@ -80,6 +81,7 @@ namespace vllm_ascend {
     void *x,
     void *weight,
     void *indices,
+    uint32_t indicesSize,
     void *y,
     void *y_out,
     uint32_t batch_size,
@@ -95,7 +97,9 @@ namespace vllm_ascend {
     void *x,
     void *weight,
     void *loraIndices,
+    uint32_t loraIndicesSize,
     void *seqLen,
+    uint32_t seqLenSize,
     void *y,
     uint32_t batch_size,
     uint32_t num_tokens_per_core,
@@ -109,7 +113,9 @@ namespace vllm_ascend {
     void *x,
     void *weight,
     void *loraIndices,
+    uint32_t loraIndicesSize,
     void *seqLen,
+    uint32_t seqLenSize,
     void *y,
     void *y_out,
     uint32_t batch_size,