| Author | Commit | Message | Date |
|---|---|---|---|
| Lifu Huang | 9241f4fd20 | Move cached kernel to srt.utils (#10776) | 2025-09-22 23:00:36 -07:00 |
| Lifu Huang | 635ccda673 | [4/4] Introduce CachedKernel to reduce CSGMV kernel launch overheads by 60% (#10709) | 2025-09-21 22:26:42 -07:00 |
| Lifu Huang | 08ecd0aa2a | [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) | 2025-09-20 22:47:48 -07:00 |
| gongwei-130 | 373080ea6c | skip vision_model for lora (#10530) | 2025-09-16 12:34:42 -07:00 |
| Lifu Huang | 3f41b48c40 | [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) | 2025-09-15 16:04:03 -07:00 |
| Lifu Huang | 941002945b | [1/2] Refactor LoRA to support backend-specific batch preprocessing. (#10251) | 2025-09-10 09:58:37 -07:00 |
| gongwei-130 | ab62b135c1 | support Llama4 with non uniformed intermediate size across layers for… (#10047) | 2025-09-05 17:28:15 -07:00 |
| Beichen Ma | dd6ec02965 | Add target module validation for init adapters (#9429) | 2025-08-24 20:24:50 -07:00 |
| Lifu Huang | b0980af89f | Support pinning adapter via server args. (#9249) | 2025-08-20 16:25:01 -07:00 |
| Lifu Huang | 4b74c3fcca | [chore] Clean up redundant lora_weight_names concept to simplify code (#9131) | 2025-08-17 12:36:58 -07:00 |
| wxzhoucs | 4c22897a66 | Feature: support qwen and llama4 reducescatter for dp attention padding (#9101) | 2025-08-13 21:10:29 -07:00 |
| Lifu Huang | 5ded39cab2 | Fix race condition in async lora unload (#9084) | 2025-08-11 22:59:29 -07:00 |
| Lifu Huang | f8a173bb50 | Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940) | 2025-08-10 01:04:45 -07:00 |
| Lifu Huang | 6e2151183b | Fix incorrect default get_hidden_dim logic (#8987) | 2025-08-09 00:25:38 -07:00 |
| Lifu Huang | 6210e2c4f0 | Support GPU pinning for LoRA (#8697) | 2025-08-06 19:39:45 -07:00 |
| Lifu Huang | 7cb20754fa | [Fix] Fix several issues preventing gemma3n LoRA support. (#8776) | 2025-08-04 17:11:46 -07:00 |
| Baizhou Zhang | f2d68ded6d | Rename lora_path to lora_id in batches (#8437) | 2025-08-03 21:08:28 -07:00 |
| Lifu Huang | 8675bdf246 | Support limiting max loaded loras in CPU. (#8650) | 2025-08-03 00:02:23 -07:00 |
| Lifu Huang | df90645525 | Support overlapped lora updates (#8213) | 2025-07-27 13:00:44 -07:00 |
| Lifu Huang | 761546315c | Remove slot usage in code to be backward-compatible with python 3.9 (#8396) | 2025-07-26 21:24:22 -07:00 |
| Lifu Huang | 8abd3e77fe | Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261) | 2025-07-23 00:32:16 -07:00 |
| Baizhou Zhang | 8cddfa56a1 | Clean warning logs for gate_proj loading in Lora (#8172) | 2025-07-19 15:56:50 -07:00 |
| Lifu Huang | 4e3defe5a7 | Support start up LoRA server without initial adapters (#8019) | 2025-07-19 15:38:09 -07:00 |
| Lifu Huang | 3de617a75b | Fix LoRA buffer contamination during adapter eviction (#8103) | 2025-07-19 13:14:08 -07:00 |
| Lifu Huang | e2ed9d049a | Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844) | 2025-07-13 18:36:01 -07:00 |
| Lifu Huang | ea4bf12286 | Fix division-by-zero bug in LoRA triton kernels. (#7785) | 2025-07-06 00:45:29 -07:00 |
| Lifu Huang | 49538d111b | Support dynamic LoRA loading / unloading in engine/server API (#7446) | 2025-06-27 21:00:27 -07:00 |
| Lifu Huang | 1998ce4046 | Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support (#7412) | 2025-06-21 16:09:19 -07:00 |
| Lifu Huang | 021f76e4f4 | [Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations (#6994) | 2025-06-11 16:18:57 -07:00 |
| Lifu Huang | b1e5a33ae3 | Eliminate stream sync to speed up LoRA batch init (#6960) | 2025-06-09 00:22:45 -07:00 |
| Lifu Huang | 4474eaf552 | Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861) | 2025-06-04 22:08:30 -07:00 |
| Lifu Huang | 094fbdacd5 | Fix incorrect LoRA weight loading for fused gate_up_proj (#6734) | 2025-05-31 13:41:44 -07:00 |
| Lifu Huang | 477a101cbd | Refactor LoRA handling to support adapter tensors in fused format (#6585) | 2025-05-26 21:51:54 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) — Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) — Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| Qiaolin Yu | 3042f1da61 | Fix flaky issues of lora and add multi batch tests (#5957) | 2025-05-04 13:11:40 -07:00 |
| Qiaolin Yu | 7bcd8b1cb2 | Fix lora batch processing when input lora_path contains None (#5930) | 2025-04-30 19:42:42 -07:00 |
| Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115) — Co-authored-by: Beichen Ma <mabeichen12@gmail.com> | 2025-04-28 23:30:44 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| chaobo jia | ef9a378a20 | [Feature] add multi-rank support for Lora (#4492) — Co-authored-by: rudy152 <czh1137892874@gmail.com> | 2025-03-28 09:38:44 -07:00 |
| Qiaolin Yu | 9fdc6d6abc | Fix the lora adapter when lora path is none (#4799) — Co-authored-by: Beichen Ma <mabeichen12@gmail.com> | 2025-03-27 21:03:08 -07:00 |
| aoshen524 | 588865f0e0 | [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) — Co-authored-by: ShenAo1111 <1377693092@qq.com>; Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-03-18 20:33:07 -07:00 |
| Lianmin Zheng | 1361ab9e03 | Lazily import lora backends (#4225) | 2025-03-08 23:39:26 -08:00 |
| aoshen524 | e79f7420be | [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) — Co-authored-by: ShenAo1111 <1377693092@qq.com>; zhaochenyang20 <zhaochen20@outlook.com> | 2025-02-20 11:51:57 -08:00 |
| Baizhou Zhang | c45cab1c00 | [Fix] Fix accuracy bug and refactor codes for lora (#3413) | 2025-02-10 13:29:00 +08:00 |
| Baizhou Zhang | 76fa2d152c | Fix lora flashinfer import bug on ROCM (#3312) | 2025-02-05 16:36:49 +08:00 |
| Baizhou Zhang | 70817a7eae | [Feature] Define backends and add Triton backend for Lora (#3161) — Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2025-02-03 22:09:13 -08:00 |
| Yineng Zhang | 959dca4fc7 | use srt VocabParallelEmbedding (#3252) | 2025-02-01 22:23:09 +08:00 |