| Author | Commit | Message | Date |
|---|---|---|---|
| Yuhong Guo | e5afb88b1c | Support weight loading without mmap (#7469) | 2025-06-23 15:13:59 -07:00 |
| Baizhou Zhang | 2a5f0100e0 | Fix GGuf and add back test_gguf.py (#7067) | 2025-06-10 21:07:20 -07:00 |
| fzyzcjy | 73187152a4 | Reland tiny refactor DefaultModelLoader.Source (#6041) | 2025-05-17 17:11:20 -07:00 |
| fzyzcjy | 6450c1228c | Tiny refactor weight loading logic (#5232) | 2025-05-08 01:02:56 -07:00 |
| Lianmin Zheng | 693723d1f7 | Revert "Tiny refactor DefaultModelLoader.Source" (#5825) | 2025-04-28 01:18:57 -07:00 |
| fzyzcjy | 644ed409d1 | Tiny refactor DefaultModelLoader.Source (#5482) | 2025-04-28 00:35:51 -07:00 |
| ryang | bc24205b32 | Support BNB quantization for llama/mllama (#5038)<br>Co-authored-by: Yuhao Yang <yyh073@foxmail.com> | 2025-04-15 18:00:31 -07:00 |
| yhyang201 | 072df75354 | Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) | 2025-04-14 02:03:40 -07:00 |
| HandH1998 | 4065248214 | Support Llama4 fp8 inference (#5194)<br>Co-authored-by: laixinn <xielx@shanghaitech.edu.cn><br>Co-authored-by: sleepcoo <sleepcoo@gmail.com><br>Co-authored-by: zhyncs <me@zhyncs.com> | 2025-04-09 20:14:34 +08:00 |
| inkcherry | 7ed77d6b9e | fix dummy-load deepseekv2 (#4535) | 2025-04-04 15:22:37 -07:00 |
| Juwan Yoo | 188105a21b | deps: lazy import optional dependencies gguf and torchvision (#4826) | 2025-03-27 14:35:36 -07:00 |
| huiwq1990 | 5cbd709ea1 | Fix: modelscope env comment (#4474)<br>Signed-off-by: huiwq1990 <huiwq1990@163.com> | 2025-03-16 18:11:33 -07:00 |
| wangyu | 1ce4878d31 | feat(remote_model): support variable remote backend for model loader (#3964)<br>Signed-off-by: wangyu <wangyu.steph@bytedance.com> | 2025-03-14 00:40:44 -07:00 |
| Lianmin Zheng | 45de89719c | Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) | 2025-03-12 23:45:52 -07:00 |
| Meng, Hengyu | 71046fcd71 | [XPU][CPU] Enable the native path of DeepSeek (#4086)<br>Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com> | 2025-03-12 22:26:29 -07:00 |
| Lianmin Zheng | aa957102a9 | Simplify tests & Fix trtllm custom allreduce registration (#4252) | 2025-03-10 01:24:22 -07:00 |
| Mick | 583d6af71b | example: add vlm to token in & out example (#3941)<br>Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-03-04 22:18:26 -08:00 |
| Ke Wen | 862bcff833 | Support loading of larger models with on-the-fly quantization (#3061) | 2025-01-22 21:33:17 -08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907)<br>Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Sangchun Ha (Patrick) | 08effbff35 | Error occurs when loading the gemma model in bitsandbytes format. (#2557) | 2024-12-26 05:10:37 -08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318)<br>Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-02 23:22:13 +08:00 |