Commit Graph

81 Commits

Author SHA1 Message Date
Lifu Huang
fb16fbaf52 Fix incorrect KV cache allocation for MTP models. (#8482)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-07-28 22:54:50 -07:00
Yuxuan Zhang
6d6a8bc278 GLM-4.5 Model Support (#8224)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-07-27 22:54:07 -07:00
RunningLeon
b7094a5ef1 model: support intern-s1 (#8350)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zxy <zhou0493@e.ntu.edu.sg>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-26 13:48:51 -07:00
Minho Ryu
bfb118c01e fix bug when eos_ids==0 (#8315) 2025-07-23 23:18:47 -07:00
Lianmin Zheng
bb0e8a32b5 Clean up server args (#8161) 2025-07-19 11:32:52 -07:00
Haohui Mai
d918ab7985 Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
2025-07-18 19:59:39 -07:00
Hanming Lu
9379da77de SWA Prefix Cache (#7367)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-07-13 12:31:07 -07:00
Atream
615553079d Support Kimi K2 (#7940) 2025-07-11 00:02:21 -07:00
ronnie_zheng
766392c6bd [feature]Ascend quantization support (#7791)
Co-authored-by: ichernob <ichernobnn@gmail.com>
Co-authored-by: liupeng <liupeng374@huawei.com>
2025-07-10 09:17:37 -07:00
SijiaYang
cb9d91ea8a feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
2025-07-07 14:47:21 -07:00
Leng Yue
8364608930 add model: qwen2-audio (#7596) 2025-07-04 21:13:10 -07:00
tarinkk
eb6c2c1663 Hybrid kv cache for LLaMA4 (#6563)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-06-27 18:58:55 -07:00
Xinyuan Tong
9b00990bea chore: remove vlm unnecessary import (#7541)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-06-26 01:38:15 -07:00
woodx
e30ef368ab Feat/support rerank (#6058) 2025-06-16 10:50:01 -07:00
Zijian
31d6dee5c4 Support VILA models (#6106) 2025-06-11 11:47:25 -07:00
Marc Sun
37f1547587 [FEAT] Add transformers backend support (#5929) 2025-06-03 21:05:29 -07:00
Mick
ce9d690ef4 fix: fix nightly test from updating transformers (#6658) 2025-05-27 00:28:11 -07:00
Yineng Zhang
7eb9d8e594 chore: upgrade transformers 4.52.3 (#6575)
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-05-25 22:49:58 -07:00
Lifu Huang
022012aae8 Support Phi-4 Multi-Modal (text + vision only) (#6494) 2025-05-24 21:43:38 -07:00
HandH1998
1b2e8f76d9 [2/2] Support Qserve (#6521) 2025-05-23 12:39:18 -07:00
Chang Su
4685fbb888 [VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-22 20:32:41 -07:00
ryang
a6ae3af15e Support XiaomiMiMo inference with mtp (#6059) 2025-05-22 14:14:49 -07:00
Mick
01dd39bac1 refactor: minor refactors regarding multimodal processing (#6187) 2025-05-17 22:53:20 -07:00
Kiv Chen
64825b8395 model(vlm): mistral 3.1 (#5099)
Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>
2025-05-16 18:36:18 -07:00
Kiv Chen
5380cd7ea3 model(vlm): pixtral (#5084) 2025-05-13 00:16:10 -07:00
fzyzcjy
b6cf3532b5 Tiny refactor ModelConfig.from_server_args (#5219) 2025-05-08 01:02:43 -07:00
xm:D
3409aaab32 Support InternVL3 (#5350)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-01 22:38:59 -07:00
liwenju0
8fefdd32c7 [Feature] add support kimi vl model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Ke Bao
dd408ee481 Auto set draft model path for MTP (#5793) 2025-04-29 16:25:40 -07:00
ZXN
04d0123fd9 [Fix]: support deepseek-vl2-tiny model (#5552)
Co-authored-by: bppps <zouyu.zzx@alibaba-inc.com>
2025-04-26 17:52:53 +08:00
Mick
c998d04b46 vlm: enable radix cache for qwen-vl models (#5349)
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-04-23 20:35:05 -07:00
Yineng Zhang
04f2abcb34 fix: gemma 3 not use softcap (#5622) 2025-04-22 01:16:08 -07:00
tarinkk
9a7e83e899 Fix enable chunked prefill for Llama4 (#5575) 2025-04-20 17:01:30 -07:00
Cheng Wan
038bc5d521 Support --enable-llama4-multimodal (#5254) 2025-04-11 01:24:14 -07:00
Mick
fbebcb7aa4 model: support mllama4 (#5144) 2025-04-09 09:28:44 -07:00
Trevor Morris
11d760d56a FP4 weight loading and inference (2/2) (#3972) 2025-04-08 17:26:21 -07:00
Yun Dai
2695ab0537 Fix loading KV quantization scale; Enable modelopt kv cache (#4686)
Co-authored-by: qingquansong <ustcsqq@gmail.com>
2025-04-08 09:11:35 -07:00
Yun Dai
9731eca77b [modelopt] automatically inspect if model is ModelOpt quantized and set quantization method (#5145) 2025-04-07 22:12:11 -07:00
Chang Su
f04c80dc42 Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-07 00:29:36 -07:00
AniZpZ
d95269f9b3 [2/3] fix dsv3 awq issue (#4625)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
2025-04-03 17:36:39 -07:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
Pan Lyu
c913ed4046 support clip embedding model (#4506) 2025-03-27 00:18:15 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Ximingwang-09
22c3702e1e [Model] Support Qwen2ForSequenceClassification (#4609)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-03-24 19:13:44 -07:00
Mick
11577cedb7 refactor: bug fixes and refactor for vlm (#4661) 2025-03-22 22:48:49 -07:00
萝卜菜
d6d21640d3 [Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-03-16 23:07:59 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00