Commit Graph

10 Commits

Author SHA1 Message Date
TianyiQ
3c93187caf Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) 2024-09-24 21:50:20 -07:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
Jerry Zhang
30b404ce72 Add torchao quant for mixtral and qwen_moe (#1418) 2024-09-14 06:46:55 +00:00
Liangsheng Yin
70b6802982 Optimize conflicts between CUDA graph and vocab mask tensors (#1392) 2024-09-13 20:27:53 -07:00
Ying Sheng
712216928f [Feature] Initial support for multi-LoRA serving (#1307) 2024-09-12 16:46:14 -07:00
Jerry Zhang
a7c47e0f02 Add torchao quant (int4/int8/fp8) to llama models (#1341)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-09-09 05:32:41 -07:00
Lianmin Zheng
12cb115d38 Fix llama2 weight loader (#1317) 2024-09-03 05:32:14 -07:00
Jani Monoses
474317f2b6 Support Phi3 mini and medium (#1299) 2024-09-02 21:49:40 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00