Neelabh Sinha | 852c0578fd | [FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570) | 2025-10-21 15:44:33 +08:00 |
Yineng Zhang | da681f35d3 | Revert "Set csgmv as default lora backend. (#11488)" (#11735) | 2025-10-17 12:01:36 -05:00 |
Lifu Huang | b0d20cdec7 | Set csgmv as default lora backend. (#11488) | 2025-10-15 23:53:24 -05:00 |
Chenxi Li | 28f80b1244 | Implement LRU eviction policy for LoRA adapters (#11041) | 2025-10-13 20:18:25 -07:00 |
Glen Liu | 9a7e7a6576 | [Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365) | 2025-10-09 18:43:03 -07:00 |
Lianmin Zheng | a17e70f5cc | Use more general heuristics to set the default value of --mem-fraction-static (#10975) Co-authored-by: sglang-bot <sglangbot@gmail.com> | 2025-09-29 10:11:03 -07:00 |
Lifu Huang | 2101d93b4f | Fix CI TestChunkedSGMV (#10737) | 2025-09-22 16:09:58 +08:00 |
Lifu Huang | 08ecd0aa2a | [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) | 2025-09-20 22:47:48 -07:00 |
Lifu Huang | 3f41b48c40 | [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) | 2025-09-15 16:04:03 -07:00 |
Lifu Huang | e903f695c8 | Fix potential flakiness in test_lora_qwen3 (#10250) | 2025-09-10 08:04:39 +00:00 |
gongwei-130 | ab62b135c1 | support Llama4 with non uniformed intermediate size across layers for… (#10047) | 2025-09-05 17:28:15 -07:00 |
Baizhou Zhang | 7de2ce45b2 | Disable radix cache in test_lora_update.py for better stability (#9852) | 2025-08-31 22:28:22 -07:00 |
Lifu Huang | b0980af89f | Support pinning adapter via server args. (#9249) | 2025-08-20 16:25:01 -07:00 |
Baizhou Zhang | 75e6a7cde1 | Support radix cache for Lora feature (#7216) | 2025-08-11 10:14:11 -07:00 |
Lifu Huang | e322a94d1f | Reduce CI duration of test_lora_update. (#9024) | 2025-08-10 15:34:04 -07:00 |
Lianmin Zheng | 2c7f01bc89 | Reorganize CI and test files (#9027) | 2025-08-10 12:30:06 -07:00 |