sglang

Author	SHA1	Message	Date
TianyiQ	3c93187caf	Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508)	2024-09-24 21:50:20 -07:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305)	2024-09-19 20:53:11 +08:00
Lianmin Zheng	1acccb364a	Fix oom issues with fp8 for llama (#1454)	2024-09-18 03:45:19 -07:00
Jerry Zhang	30b404ce72	Add torchao quant for mixtral and qwen_moe (#1418)	2024-09-14 06:46:55 +00:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392)	2024-09-13 20:27:53 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
Jerry Zhang	a7c47e0f02	Add torchao quant (int4/int8/fp8) to llama models (#1341) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-09 05:32:41 -07:00
Lianmin Zheng	12cb115d38	Fix llama2 weight loader (#1317)	2024-09-03 05:32:14 -07:00
Jani Monoses	474317f2b6	Support Phi3 mini and medium (#1299)	2024-09-02 21:49:40 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00