Commit Graph

63 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Lifu Huang | 9241f4fd20 | Move cached kernel to srt.utils (#10776) | 2025-09-22 23:00:36 -07:00 |
| Lifu Huang | 635ccda673 | [4/4] Introduce CachedKernel to reduce CSGMV kernel launch overheads by 60% (#10709) | 2025-09-21 22:26:42 -07:00 |
| Lifu Huang | 08ecd0aa2a | [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) | 2025-09-20 22:47:48 -07:00 |
| gongwei-130 | 373080ea6c | skip vision_model for lora (#10530) | 2025-09-16 12:34:42 -07:00 |
| Lifu Huang | 3f41b48c40 | [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) | 2025-09-15 16:04:03 -07:00 |
| Lifu Huang | 941002945b | [1/2] Refactor LoRA to support backend-specific batch preprocessing. (#10251) | 2025-09-10 09:58:37 -07:00 |
| gongwei-130 | ab62b135c1 | support Llama4 with non uniformed intermediate size across layers for… (#10047) | 2025-09-05 17:28:15 -07:00 |
| Beichen Ma | dd6ec02965 | Add target module validation for init adapters (#9429) | 2025-08-24 20:24:50 -07:00 |
| Lifu Huang | b0980af89f | Support pinning adapter via server args. (#9249) | 2025-08-20 16:25:01 -07:00 |
| Lifu Huang | 4b74c3fcca | [chore] Clean up redundant lora_weight_names concept to simplify code (#9131) | 2025-08-17 12:36:58 -07:00 |
| wxzhoucs | 4c22897a66 | Feature: support qwen and llama4 reducescatter for dp attention padding (#9101) | 2025-08-13 21:10:29 -07:00 |
| Lifu Huang | 5ded39cab2 | Fix race condition in async lora unload (#9084) | 2025-08-11 22:59:29 -07:00 |
| Lifu Huang | f8a173bb50 | Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940) | 2025-08-10 01:04:45 -07:00 |
| Lifu Huang | 6e2151183b | Fix incorrect default get_hidden_dim logic (#8987) | 2025-08-09 00:25:38 -07:00 |
| Lifu Huang | 6210e2c4f0 | Support GPU pinning for LoRA (#8697) | 2025-08-06 19:39:45 -07:00 |
| Lifu Huang | 7cb20754fa | [Fix] Fix several issues preventing gemma3n LoRA support. (#8776) | 2025-08-04 17:11:46 -07:00 |
| Baizhou Zhang | f2d68ded6d | Rename lora_path to lora_id in batches (#8437) | 2025-08-03 21:08:28 -07:00 |
| Lifu Huang | 8675bdf246 | Support limiting max loaded loras in CPU. (#8650) | 2025-08-03 00:02:23 -07:00 |
| Lifu Huang | df90645525 | Support overlapped lora updates (#8213) | 2025-07-27 13:00:44 -07:00 |
| Lifu Huang | 761546315c | Remove slot usage in code to be backward-compatible with python 3.9 (#8396) | 2025-07-26 21:24:22 -07:00 |
| Lifu Huang | 8abd3e77fe | Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261) | 2025-07-23 00:32:16 -07:00 |
| Baizhou Zhang | 8cddfa56a1 | Clean warning logs for gate_proj loading in Lora (#8172) | 2025-07-19 15:56:50 -07:00 |
| Lifu Huang | 4e3defe5a7 | Support start up LoRA server without initial adapters (#8019) | 2025-07-19 15:38:09 -07:00 |
| Lifu Huang | 3de617a75b | Fix LoRA buffer contamination during adapter eviction (#8103) | 2025-07-19 13:14:08 -07:00 |
| Lifu Huang | e2ed9d049a | Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844) | 2025-07-13 18:36:01 -07:00 |
| Lifu Huang | ea4bf12286 | Fix division-by-zero bug in LoRA triton kernels. (#7785) | 2025-07-06 00:45:29 -07:00 |
| Lifu Huang | 49538d111b | Support dynamic LoRA loading / unloading in engine/server API (#7446) | 2025-06-27 21:00:27 -07:00 |
| Lifu Huang | 1998ce4046 | Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support (#7412) | 2025-06-21 16:09:19 -07:00 |
| Lifu Huang | 021f76e4f4 | [Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations (#6994) | 2025-06-11 16:18:57 -07:00 |
| Lifu Huang | b1e5a33ae3 | Eliminate stream sync to speed up LoRA batch init (#6960) | 2025-06-09 00:22:45 -07:00 |
| Lifu Huang | 4474eaf552 | Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861) | 2025-06-04 22:08:30 -07:00 |
| Lifu Huang | 094fbdacd5 | Fix incorrect LoRA weight loading for fused gate_up_proj (#6734) | 2025-05-31 13:41:44 -07:00 |
| Lifu Huang | 477a101cbd | Refactor LoRA handling to support adapter tensors in fused format (#6585) | 2025-05-26 21:51:54 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| Qiaolin Yu | 3042f1da61 | Fix flaky issues of lora and add multi batch tests (#5957) | 2025-05-04 13:11:40 -07:00 |
| Qiaolin Yu | 7bcd8b1cb2 | Fix lora batch processing when input lora_path contains None (#5930) | 2025-04-30 19:42:42 -07:00 |
| Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115) (Co-authored-by: Beichen Ma <mabeichen12@gmail.com>) | 2025-04-28 23:30:44 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| chaobo jia | ef9a378a20 | [Feature] add multi-rank support for Lora (#4492) (Co-authored-by: rudy152 <czh1137892874@gmail.com>) | 2025-03-28 09:38:44 -07:00 |
| Qiaolin Yu | 9fdc6d6abc | Fix the lora adapter when lora path is none (#4799) (Co-authored-by: Beichen Ma <mabeichen12@gmail.com>) | 2025-03-27 21:03:08 -07:00 |
| aoshen524 | 588865f0e0 | [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) (Co-authored-by: ShenAo1111 <1377693092@qq.com>; Baizhou Zhang <sobereddiezhang@gmail.com>) | 2025-03-18 20:33:07 -07:00 |
| Lianmin Zheng | 1361ab9e03 | Lazily import lora backends (#4225) | 2025-03-08 23:39:26 -08:00 |
| aoshen524 | e79f7420be | [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) (Co-authored-by: ShenAo1111 <1377693092@qq.com>; zhaochenyang20 <zhaochen20@outlook.com>) | 2025-02-20 11:51:57 -08:00 |
| Baizhou Zhang | c45cab1c00 | [Fix] Fix accuracy bug and refactor codes for lora (#3413) | 2025-02-10 13:29:00 +08:00 |
| Baizhou Zhang | 76fa2d152c | Fix lora flashinfer import bug on ROCM (#3312) | 2025-02-05 16:36:49 +08:00 |
| Baizhou Zhang | 70817a7eae | [Feature] Define backends and add Triton backend for Lora (#3161) (Co-authored-by: Ying Sheng <sqy1415@gmail.com>) | 2025-02-03 22:09:13 -08:00 |
| Yineng Zhang | 959dca4fc7 | use srt VocabParallelEmbedding (#3252) | 2025-02-01 22:23:09 +08:00 |