| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| TianyiQ | 3c93187caf | Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) | 2024-09-24 21:50:20 -07:00 |
| Yineng Zhang | b4408b0d16 | feat: update linear deps 1/N (#1305) | 2024-09-19 20:53:11 +08:00 |
| Lianmin Zheng | 1acccb364a | Fix oom issues with fp8 for llama (#1454) | 2024-09-18 03:45:19 -07:00 |
| Jerry Zhang | 30b404ce72 | Add torchao quant for mixtral and qwen_moe (#1418) | 2024-09-14 06:46:55 +00:00 |
| Liangsheng Yin | 70b6802982 | Optimize conflicts between CUDA graph and vocab mask tensors (#1392) | 2024-09-13 20:27:53 -07:00 |
| Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00 |
| Jerry Zhang | a7c47e0f02 | Add torchao quant (int4/int8/fp8) to llama models (#1341) (Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>) | 2024-09-09 05:32:41 -07:00 |
| Lianmin Zheng | 12cb115d38 | Fix llama2 weight loader (#1317) | 2024-09-03 05:32:14 -07:00 |
| Jani Monoses | 474317f2b6 | Support Phi3 mini and medium (#1299) | 2024-09-02 21:49:40 -07:00 |
| Lianmin Zheng | f64eae3a29 | [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) | 2024-09-02 21:44:45 -07:00 |