23 Commits

Author SHA1 Message Date
Lifu Huang
8abd3e77fe Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261) 2025-07-23 00:32:16 -07:00
Pavel Logachev
877e35d775 Add get_hidden_dim to qwen3.py for correct lora (#7312) 2025-07-19 19:31:16 -07:00
Lifu Huang
4e3defe5a7 Support start up LoRA server without initial adapters (#8019) 2025-07-19 15:38:09 -07:00
Lifu Huang
3de617a75b Fix LoRA buffer contamination during adapter eviction (#8103) 2025-07-19 13:14:08 -07:00
Lifu Huang
e2ed9d049a Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844) 2025-07-13 18:36:01 -07:00
Lifu Huang
01f9873048 Fix CI test OOM issue. (#7799) 2025-07-05 15:11:02 -07:00
Lifu Huang
1a08358aed Improve error handling for requests with unloaded LoRA path(s) (#7642) 2025-07-01 20:05:34 -07:00
Lifu Huang
49538d111b Support dynamic LoRA loading / unloading in engine/server API (#7446) 2025-06-27 21:00:27 -07:00
Lifu Huang
2373faa317 Fix flakiness in LoRA batch test. (#7552) 2025-06-27 19:51:43 -07:00
Baizhou Zhang
3b014bc13d Fix test_lora.py CI (#7061) 2025-06-10 12:24:46 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209) 2025-05-13 01:42:38 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Qiaolin Yu
3042f1da61 Fix flaky issues of lora and add multi batch tests (#5957) 2025-05-04 13:11:40 -07:00
Qiaolin Yu
7bcd8b1cb2 Fix lora batch processing when input lora_path contains None (#5930) 2025-04-30 19:42:42 -07:00
Qiaolin Yu
8c0cfca87d Feat: support cuda graph for LoRA (#4115) 2025-04-28 23:30:44 -07:00
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
Baizhou Zhang
42873eac09 [Fix] Improve Lora tests and reduce CI runtime (#4925) 2025-03-30 19:40:14 -07:00
Lianmin Zheng
4ede6770cd Fix retract for page size > 1 (#4914) 2025-03-30 02:57:15 -07:00
chaobo jia
ef9a378a20 [Feature] add multi-rank support for Lora (#4492) 2025-03-28 09:38:44 -07:00
Co-authored-by: rudy152 <czh1137892874@gmail.com>
Qiaolin Yu
9fdc6d6abc Fix the lora adapter when lora path is none (#4799) 2025-03-27 21:03:08 -07:00
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
fzyzcjy
15ddd84322 Add retry for flaky tests in CI (#4755) 2025-03-25 16:53:12 -07:00
aoshen524
588865f0e0 [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) 2025-03-18 20:33:07 -07:00
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Lianmin Zheng
77a3954bf7 Simplify eagle tests and TP sync in grammar backend (#4066) 2025-03-04 13:40:40 -08:00
aoshen524
e79f7420be [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) 2025-02-20 11:51:57 -08:00
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>