sglang

Author	SHA1	Message	Date
Lifu Huang	3f41b48c40	[2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286)	2025-09-15 16:04:03 -07:00
Lifu Huang	941002945b	[1/2] Refactor LoRA to support backend-specific batch preprocessing. (#10251)	2025-09-10 09:58:37 -07:00
Lifu Huang	f8a173bb50	Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940)	2025-08-10 01:04:45 -07:00
Baizhou Zhang	8cddfa56a1	Clean warning logs for gate_proj loading in Lora (#8172)	2025-07-19 15:56:50 -07:00
Lifu Huang	49538d111b	Support dynamic LoRA loading / unloading in engine/server API (#7446)	2025-06-27 21:00:27 -07:00
Lifu Huang	4474eaf552	Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861)	2025-06-04 22:08:30 -07:00
Lifu Huang	094fbdacd5	Fix incorrect LoRA weight loading for fused gate_up_proj (#6734)	2025-05-31 13:41:44 -07:00
Lifu Huang	477a101cbd	Refactor LoRA handling to support adapter tensors in fused format (#6585)	2025-05-26 21:51:54 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
aoshen524	588865f0e0	[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-03-18 20:33:07 -07:00
aoshen524	e79f7420be	[Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-20 11:51:57 -08:00
Baizhou Zhang	c45cab1c00	[Fix] Fix accuracy bug and refactor codes for lora (#3413)	2025-02-10 13:29:00 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Yineng Zhang	959dca4fc7	use srt VocabParallelEmbedding (#3252)	2025-02-01 22:23:09 +08:00
Yineng Zhang	033c715b46	cleanup models dependencies 1/n (#2948)	2025-01-17 23:46:48 +08:00
Yineng Zhang	85e1a6f3aa	Update model_loader deps and qqq quantization deps (#2220) (#2318) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-02 23:22:13 +08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Ying Sheng	9c064bf78a	[LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587)	2024-10-06 10:33:44 -07:00
Ying Sheng	0f4fb19bc8	[Fix, LoRA] fix LoRA with updates in main (#1545)	2024-09-30 10:06:08 -07:00
Lianmin Zheng	36d5acfca5	Rename InputMetadata -> ForwardBatch (#1543)	2024-09-30 02:41:11 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00

24 Commits