Commit Graph

9 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Yun Dai | 2695ab0537 | Fix loading KV quantization scale; Enable modelopt kv cache (#4686) Co-authored-by: qingquansong <ustcsqq@gmail.com> | 2025-04-08 09:11:35 -07:00 |
| Qubitium-ModelCloud | 56a724eba3 | [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> | 2025-03-05 01:11:00 -08:00 |
| fzyzcjy | d37f95511d | Improve: Tiny fix Olmo2 (#3348) | 2025-02-21 16:09:35 -08:00 |
| Yineng Zhang | 5a176c92df | fix deepseek v2 with cpu device (#2975) | 2025-01-19 21:33:27 +08:00 |
| Yineng Zhang | 2add697d7a | feat: remove vllm get_rope (#2964) | 2025-01-18 19:38:01 +08:00 |
| Yineng Zhang | 033c715b46 | cleanup models dependencies 1/n (#2948) | 2025-01-17 23:46:48 +08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907) Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318) Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-02 23:22:13 +08:00 |
| Jani Monoses | db674e3d24 | Add OLMo2 model. (#2233) | 2024-11-28 00:15:20 -08:00 |