xc-llm-ascend

taoxudonghaha 540336edc9 Add Custom Kernels For LoRA Performance (#1884)

### What this PR does / why we need it?
Add two custom kernels (bgmv_shrink and bgmv_expand) to improve LoRA
performance.
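
For readers unfamiliar with the operation names, the following is a minimal pure-Python reference sketch of what BGMV (batched gather matrix-vector multiplication) shrink and expand conventionally compute in LoRA serving. It is not the AscendC kernel code from this PR. The function names `bgmv_shrink_ref` and `bgmv_expand_ref`, the argument order, the tensor layouts, and the `scale` parameter are all assumptions made for illustration.

```python
# Hypothetical reference semantics only; names and layouts are assumptions,
# not the signatures of the custom kernels added in this PR.

def bgmv_shrink_ref(x, lora_a, indices, scale=1.0):
    """Project each token down to the LoRA rank.

    x:       batch of hidden vectors, shape (batch, hidden)
    lora_a:  stacked LoRA A matrices, shape (num_loras, rank, hidden)
    indices: per-token LoRA index, shape (batch,)
    """
    out = []
    for token, idx in zip(x, indices):
        a = lora_a[idx]  # gather this token's A matrix: (rank, hidden)
        out.append([scale * sum(w * v for w, v in zip(row, token)) for row in a])
    return out


def bgmv_expand_ref(x, lora_b, indices, y):
    """Project each token back up and accumulate into the base output.

    x:       shrunk activations, shape (batch, rank)
    lora_b:  stacked LoRA B matrices, shape (num_loras, out_features, rank)
    indices: per-token LoRA index, shape (batch,)
    y:       base layer output to add to, shape (batch, out_features)
    """
    out = []
    for token, idx, acc in zip(x, indices, y):
        b = lora_b[idx]  # gather this token's B matrix: (out_features, rank)
        out.append([base + sum(w * v for w, v in zip(row, token))
                    for row, base in zip(b, acc)])
    return out
```

A reference like this is mainly useful as a ground truth when comparing a kernel's output on small inputs; it makes no attempt at performance.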
### Does this PR introduce _any_ user-facing change?
No user-facing change.
### How was this patch tested?
We add unit tests for the custom AscendC kernels. See
vllm-ascend/tests/e2e/singlecard/ops/test_bgmv_shrink.py and
vllm-ascend/tests/e2e/singlecard/ops/test_bgmv_expand.py.
In tests with the Qwen2.5 7B model on vllm-ascend version v0.9.2.rc1,
TTFT, TPOT, and throughput each improved by about 70%.

- vLLM version: v0.9.2
- vLLM main: 40d86ee412

---------

Signed-off-by: taoxudonghaha <justsheldon@163.com>

2025-07-29 19:27:50 +08:00

kernels

Add Custom Kernels For LoRA Performance (#1884)

2025-07-29 19:27:50 +08:00

camem_allocator.cpp

Add sleep mode feature for Ascend NPU (#513)

2025-04-18 13:11:39 +08:00

ops.h

Add Custom Kernels For LoRA Performance (#1884)

2025-07-29 19:27:50 +08:00

torch_binding.cpp

Add Custom Kernels For LoRA Performance (#1884)

2025-07-29 19:27:50 +08:00

utils.h

[core] Support custom ascendc kernels in vllm-ascend (#233)

2025-04-03 14:52:34 +08:00