Commit Graph

21 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Yuhong Guo | e5afb88b1c | Support weight loading without mmap (#7469) | 2025-06-23 15:13:59 -07:00 |
| Baizhou Zhang | 2a5f0100e0 | Fix GGuf and add back test_gguf.py (#7067) | 2025-06-10 21:07:20 -07:00 |
| fzyzcjy | 73187152a4 | Reland tiny refactor DefaultModelLoader.Source (#6041) | 2025-05-17 17:11:20 -07:00 |
| fzyzcjy | 6450c1228c | Tiny refactor weight loading logic (#5232) | 2025-05-08 01:02:56 -07:00 |
| Lianmin Zheng | 693723d1f7 | Revert "Tiny refactor DefaultModelLoader.Source" (#5825) | 2025-04-28 01:18:57 -07:00 |
| fzyzcjy | 644ed409d1 | Tiny refactor DefaultModelLoader.Source (#5482) | 2025-04-28 00:35:51 -07:00 |
| ryang | bc24205b32 | Support BNB quantization for llama/mllama (#5038) (Co-authored-by: Yuhao Yang <yyh073@foxmail.com>) | 2025-04-15 18:00:31 -07:00 |
| yhyang201 | 072df75354 | Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) | 2025-04-14 02:03:40 -07:00 |
| HandH1998 | 4065248214 | Support Llama4 fp8 inference (#5194) (Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>, sleepcoo <sleepcoo@gmail.com>, zhyncs <me@zhyncs.com>) | 2025-04-09 20:14:34 +08:00 |
| inkcherry | 7ed77d6b9e | fix dummy-load deepseekv2 (#4535) | 2025-04-04 15:22:37 -07:00 |
| Juwan Yoo | 188105a21b | deps: lazy import optional dependencies gguf and torchvision (#4826) | 2025-03-27 14:35:36 -07:00 |
| huiwq1990 | 5cbd709ea1 | Fix: modelscope env comment (#4474) (Signed-off-by: huiwq1990 <huiwq1990@163.com>) | 2025-03-16 18:11:33 -07:00 |
| wangyu | 1ce4878d31 | feat(remote_model): support variable remote backend for model loader (#3964) (Signed-off-by: wangyu <wangyu.steph@bytedance.com>) | 2025-03-14 00:40:44 -07:00 |
| Lianmin Zheng | 45de89719c | Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) | 2025-03-12 23:45:52 -07:00 |
| Meng, Hengyu | 71046fcd71 | [XPU][CPU] Enable the native path of DeepSeek (#4086) (Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>) | 2025-03-12 22:26:29 -07:00 |
| Lianmin Zheng | aa957102a9 | Simplify tests & Fix trtllm custom allreduce registration (#4252) | 2025-03-10 01:24:22 -07:00 |
| Mick | 583d6af71b | example: add vlm to token in & out example (#3941) (Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>) | 2025-03-04 22:18:26 -08:00 |
| Ke Wen | 862bcff833 | Support loading of larger models with on-the-fly quantization (#3061) | 2025-01-22 21:33:17 -08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907) (Co-authored-by: Zhangyi <1109276519@qq.com>) | 2025-01-17 22:31:51 +08:00 |
| Sangchun Ha (Patrick) | 08effbff35 | Error occurs when loading the gemma model in bitsandbytes format. (#2557) | 2024-12-26 05:10:37 -08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318) (Co-authored-by: HandH1998 <1335248067@qq.com>) | 2024-12-02 23:22:13 +08:00 |