xianzhiT
|
9f1787fa60
|
Support multi-thread model weight loading (#7277)
|
2025-06-24 10:39:10 -07:00 |
|
Yuhong Guo
|
e5afb88b1c
|
Support weight loading without mmap (#7469)
|
2025-06-23 15:13:59 -07:00 |
|
Charles Chen
|
8c16da334e
|
Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. (#7164)
|
2025-06-17 11:26:23 -07:00 |
|
HandH1998
|
4065248214
|
Support Llama4 fp8 inference (#5194)
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-04-09 20:14:34 +08:00 |
|
DangKai
|
cc88d98ab8
|
fix empty_cache error in pt_weights_iterator (#5151)
Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
|
2025-04-08 01:22:10 -07:00 |
|
Brayden Zhong
|
e84f4ba0ab
|
[Misc] Fix issues reported by torchfix (#4837)
|
2025-03-27 20:10:32 -07:00 |
|
Juwan Yoo
|
188105a21b
|
deps: lazy import optional dependencies gguf and torchvision (#4826)
|
2025-03-27 14:35:36 -07:00 |
|
wangyu
|
1ce4878d31
|
feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
|
2025-03-14 00:40:44 -07:00 |
|
Mick
|
ff2ce0b86f
|
refactor: move image processors to separate files (#4229)
|
2025-03-11 12:35:35 -07:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Zhiyu
|
c66b2c9cf1
|
Add support for nvidia modelopt fp8 kv cache (#3223)
|
2025-02-22 07:04:58 +08:00 |
|
Lianmin Zheng
|
53cef81587
|
Improve weight loading and code style (#3174)
|
2025-01-27 03:00:41 -08:00 |
|
lukec
|
b8ab989ff4
|
Fix the FP8 E4M3 parsing offline scales failure bug (#3045)
|
2025-01-22 14:19:33 -08:00 |
|
bjmsong
|
d3024f4fc8
|
support e4m3 kvcache in qwen2 & add kv scaling factor json (#2894)
Co-authored-by: bjmsong <bjmsong@126.com>
|
2025-01-18 11:43:22 +08:00 |
|
Yineng Zhang
|
5dc54f1a62
|
feat: remove vllm distributed (#2907)
Co-authored-by: Zhangyi <1109276519@qq.com>
|
2025-01-17 22:31:51 +08:00 |
|
Yineng Zhang
|
85e1a6f3aa
|
Update model_loader deps and qqq quantization deps (#2220) (#2318)
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2024-12-02 23:22:13 +08:00 |
|