xc-llm-ascend


liuchenbing 3648d18e67 Add Custom Kernels For LoRA Performance (#2325)

### What this PR does / why we need it?
Adds two custom operators, `sgmv_shrink` and `sgmv_expand`, to address LoRA performance issues. It also enables the LoRA operators to be captured in ACL graph mode, which improves model inference performance.
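For readers unfamiliar with the two operators: as we understand the Punica-style SGMV (segmented gather matrix-vector) formulation used by vLLM's LoRA path, `sgmv_shrink` projects each token's hidden state down to the LoRA rank using the A matrix of that token's request, and `sgmv_expand` projects the rank-sized result back up with the B matrix and accumulates it into the layer output. The sketch below is a plain scalar C++ reference of that computation under those assumptions, not the Ascend kernel. The function name `sgmv_reference`, its parameters, and the row-major `[num_loras, out_dim, in_dim]` weight layout are hypothetical choices for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for a segmented gather matrix-vector product.
// Layouts (assumed, row-major):
//   x:       [num_tokens, in_dim]
//   weights: [num_loras, out_dim, in_dim]
//   y:       [num_tokens, out_dim], sized by the caller
// Each segment s covers tokens [seg_start[s], seg_start[s] + seg_len[s])
// and uses adapter lora_idx[s]; a negative index means "no adapter".
void sgmv_reference(const std::vector<float>& x,
                    const std::vector<float>& weights,
                    const std::vector<int>& seg_start,
                    const std::vector<int>& seg_len,
                    const std::vector<int>& lora_idx,
                    int in_dim, int out_dim, float scale,
                    bool accumulate, std::vector<float>& y) {
  for (std::size_t s = 0; s < seg_start.size(); ++s) {
    const int l = lora_idx[s];
    if (l < 0) continue;  // segment has no adapter: leave its outputs untouched
    // Gather the weight matrix of this segment's adapter.
    const float* w =
        weights.data() + static_cast<std::size_t>(l) * out_dim * in_dim;
    for (int t = seg_start[s]; t < seg_start[s] + seg_len[s]; ++t) {
      const float* xt = x.data() + static_cast<std::size_t>(t) * in_dim;
      float* yt = y.data() + static_cast<std::size_t>(t) * out_dim;
      for (int o = 0; o < out_dim; ++o) {
        float acc = 0.0f;  // dot product of weight row o with token t
        for (int i = 0; i < in_dim; ++i) acc += w[o * in_dim + i] * xt[i];
        // Shrink overwrites the output; expand adds into it.
        yt[o] = accumulate ? yt[o] + scale * acc : scale * acc;
      }
    }
  }
}
```

Calling it with `in_dim = hidden`, `out_dim = rank`, and `accumulate = false` mirrors the shrink step; `in_dim = rank`, `out_dim = hidden`, and `accumulate = true` mirrors the expand step. Grouping tokens into per-request segments is what lets a single launch serve a batch in which different requests use different adapters.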

### Does this PR introduce _any_ user-facing change?
No user-facing change.

### How was this patch tested?
Tested with the Qwen2.5 7B model on vllm-ascend v0.9.2.rc1 in ACL graph mode: TTFT, TPOT, and throughput improved by about 100%.

Signed-off-by: liuchn <909698896@qq.com>

- vLLM version: v0.10.0
- vLLM main: 1f83e7d849

---------

Co-authored-by: liuchn <909698896@qq.com>

2025-08-19 09:09:11 +08:00

kernels

Add Custom Kernels For LoRA Performance (#2325)

2025-08-19 09:09:11 +08:00

camem_allocator.cpp

Add sleep mode feature for Ascend NPU (#513)

2025-04-18 13:11:39 +08:00

ops.h

Add Custom Kernels For LoRA Performance (#2325)

2025-08-19 09:09:11 +08:00

torch_binding_meta.cpp

Add Custom Kernels For LoRA Performance (#2325)

2025-08-19 09:09:11 +08:00

torch_binding.cpp

Add Custom Kernels For LoRA Performance (#2325)

2025-08-19 09:09:11 +08:00

utils.h

[core] Support capture custom ops into aclgraph (#2113)

2025-08-11 15:59:42 +08:00