Commit history (newest first):

df90645525  2025-07-27 13:00:44 -07:00  Lifu Huang
    Support overlapped lora updates (#8213)

761546315c  2025-07-26 21:24:22 -07:00  Lifu Huang
    Remove slot usage in code to be backward-compatible with python 3.9 (#8396)

8abd3e77fe  2025-07-23 00:32:16 -07:00  Lifu Huang
    Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261)

8cddfa56a1  2025-07-19 15:56:50 -07:00  Baizhou Zhang
    Clean warning logs for gate_proj loading in Lora (#8172)

4e3defe5a7  2025-07-19 15:38:09 -07:00  Lifu Huang
    Support start up LoRA server without initial adapters (#8019)

3de617a75b  2025-07-19 13:14:08 -07:00  Lifu Huang
    Fix LoRA buffer contamination during adapter eviction (#8103)

e2ed9d049a  2025-07-13 18:36:01 -07:00  Lifu Huang
    Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844)

ea4bf12286  2025-07-06 00:45:29 -07:00  Lifu Huang
    Fix division-by-zero bug in LoRA triton kernels. (#7785)

49538d111b  2025-06-27 21:00:27 -07:00  Lifu Huang
    Support dynamic LoRA loading / unloading in engine/server API (#7446)

1998ce4046  2025-06-21 16:09:19 -07:00  Lifu Huang
    Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support (#7412)

021f76e4f4  2025-06-11 16:18:57 -07:00  Lifu Huang
    [Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations (#6994)

b1e5a33ae3  2025-06-09 00:22:45 -07:00  Lifu Huang
    Eliminate stream sync to speed up LoRA batch init (#6960)

4474eaf552  2025-06-04 22:08:30 -07:00  Lifu Huang
    Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861)

094fbdacd5  2025-05-31 13:41:44 -07:00  Lifu Huang
    Fix incorrect LoRA weight loading for fused gate_up_proj (#6734)

477a101cbd  2025-05-26 21:51:54 -07:00  Lifu Huang
    Refactor LoRA handling to support adapter tensors in fused format (#6585)

cd8d4b9dfc  2025-05-15 10:09:55 -07:00  Qiaolin Yu
    Fix lora bench (#6302)

e8e18dcdcc  2025-05-12 12:53:26 -07:00  Lianmin Zheng
    Revert "fix some typos" (#6244)

d738ab52f8  2025-05-13 01:42:38 +08:00  applesaucethebun
    fix some typos (#6209)
    Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>

2ce8793519  2025-05-11 12:55:00 +08:00  applesaucethebun
    Add typo checker in pre-commit (#6179)
    Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>

3042f1da61  2025-05-04 13:11:40 -07:00  Qiaolin Yu
    Fix flaky issues of lora and add multi batch tests (#5957)

7bcd8b1cb2  2025-04-30 19:42:42 -07:00  Qiaolin Yu
    Fix lora batch processing when input lora_path contains None (#5930)

8c0cfca87d  2025-04-28 23:30:44 -07:00  Qiaolin Yu
    Feat: support cuda graph for LoRA (#4115)
    Co-authored-by: Beichen Ma <mabeichen12@gmail.com>

177320a582  2025-04-16 15:26:49 -07:00  Lianmin Zheng
    Clean up imports (#5467)

ef9a378a20  2025-03-28 09:38:44 -07:00  chaobo jia
    [Feature] add multi-rank support for Lora (#4492)
    Co-authored-by: rudy152 <czh1137892874@gmail.com>

9fdc6d6abc  2025-03-27 21:03:08 -07:00  Qiaolin Yu
    Fix the lora adapter when lora path is none (#4799)
    Co-authored-by: Beichen Ma <mabeichen12@gmail.com>

588865f0e0  2025-03-18 20:33:07 -07:00  aoshen524
    [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274)
    Co-authored-by: ShenAo1111 <1377693092@qq.com>
    Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

1361ab9e03  2025-03-08 23:39:26 -08:00  Lianmin Zheng
    Lazily import lora backends (#4225)

e79f7420be  2025-02-20 11:51:57 -08:00  aoshen524
    [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652)
    Co-authored-by: ShenAo1111 <1377693092@qq.com>
    Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

c45cab1c00  2025-02-10 13:29:00 +08:00  Baizhou Zhang
    [Fix] Fix accuracy bug and refactor codes for lora (#3413)

76fa2d152c  2025-02-05 16:36:49 +08:00  Baizhou Zhang
    Fix lora flashinfer import bug on ROCM (#3312)

70817a7eae  2025-02-03 22:09:13 -08:00  Baizhou Zhang
    [Feature] Define backends and add Triton backend for Lora (#3161)
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>

959dca4fc7  2025-02-01 22:23:09 +08:00  Yineng Zhang
    use srt VocabParallelEmbedding (#3252)

033c715b46  2025-01-17 23:46:48 +08:00  Yineng Zhang
    cleanup models dependencies 1/n (#2948)

85e1a6f3aa  2024-12-02 23:22:13 +08:00  Yineng Zhang
    Update model_loader deps and qqq quantization deps (#2220) (#2318)
    Co-authored-by: HandH1998 <1335248067@qq.com>

62a4a339eb  2024-11-22 22:16:53 +08:00  Xuehai Pan
    docs: fix module docstrings and copyright headers (#2077)

8bee20f80b  2024-10-19 20:45:41 -07:00  Yineng Zhang
    Update vllm to 0.6.3 (#1711) (#1720)
    Co-authored-by: Ke Bao <ISPObaoke@163.com>

6a5b352aaf  2024-10-06 22:54:05 -07:00  Lianmin Zheng
    Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)
    Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>

9c064bf78a  2024-10-06 10:33:44 -07:00  Ying Sheng
    [LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587)

e6852b0dd2  2024-10-02 20:41:15 -07:00  Minsang Song
    [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536)
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>

0f4fb19bc8  2024-09-30 10:06:08 -07:00  Ying Sheng
    [Fix, LoRA] fix LoRA with updates in main (#1545)

36d5acfca5  2024-09-30 02:41:11 -07:00  Lianmin Zheng
    Rename InputMetadata -> ForwardBatch (#1543)

3f0fe08d37  2024-09-29 20:28:45 -07:00  Lianmin Zheng
    Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541)

3a6e04185b  2024-09-17 07:43:52 +00:00  HAI
    [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420)

37963394aa  2024-09-15 12:46:04 -07:00  Ying Sheng
    [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)

712216928f  2024-09-12 16:46:14 -07:00  Ying Sheng
    [Feature] Initial support for multi-LoRA serving (#1307)
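Several of the commits above (e.g. #7446 "Support dynamic LoRA loading / unloading in engine/server API" and #8019 "Support start up LoRA server without initial adapters") describe loading and unloading adapters against a running server over HTTP. As an illustration only, a minimal client-side sketch is shown below. The endpoint paths (`/load_lora_adapter`, `/unload_lora_adapter`), the payload fields (`lora_name`, `lora_path`), and the server address are assumptions inferred from the commit titles, not verified against the repository:

```python
# Hypothetical client sketch for dynamic LoRA adapter management over a
# server HTTP API. Endpoint paths, payload fields, and the base URL are
# assumptions inferred from the commit messages above.
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumed server address


def load_adapter_payload(name: str, path: str) -> dict:
    # Request body for registering a new adapter under a stable name.
    return {"lora_name": name, "lora_path": path}


def unload_adapter_payload(name: str) -> dict:
    # Request body for evicting a previously loaded adapter.
    return {"lora_name": name}


def post(path: str, payload: dict) -> int:
    # Send a JSON POST request and return the HTTP status code.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Would register then evict an adapter on a live server (not run here).
    post("/load_lora_adapter", load_adapter_payload("my-adapter", "/models/my-adapter"))
    post("/unload_lora_adapter", unload_adapter_payload("my-adapter"))
```

The payload builders are kept separate from the transport so the request shapes can be inspected or reused without a live server.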