66 Commits

Author SHA1 Message Date
Lianmin Zheng
86d10d220f Update grok.py and tiktoken tokenizer (#9532) 2025-08-23 05:40:18 -07:00
blzheng
ebbb75e917 [CPU] Fix TP padding issue on Phi-4 (#8289) 2025-08-17 16:25:26 -07:00
PGFLMG
b7cd743038 [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949) 2025-08-06 23:49:36 -07:00
Wenchen Lo
ea93079b30 model: adapt mllama4 to VisionAttention (#8512)
Co-authored-by: root <mickjagger19@icloud.com>
2025-08-02 00:39:40 -07:00
Chang Su
51c38163c1 model: support Step3V (#8583)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: nnnobody-code <nnnobody@foxmail.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Qiaolin-Yu <qy254@cornell.edu>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-31 02:41:00 -07:00
Lianmin Zheng
8d2cf38c79 [Minor] Remove redundant print (#8005) 2025-07-14 10:55:13 -07:00
Atream
615553079d Support Kimi K2 (#7940) 2025-07-11 00:02:21 -07:00
Lianmin Zheng
14229ccf8f Move mem_fraction_static adjustment for multimodal models to server_args.py & Fix session control & Other cleanups (#7748) 2025-07-04 16:33:33 -07:00
Xinyuan Tong
d6864ce6d6 [New Model] Devstral support (#6547)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-26 19:27:48 -07:00
Mick
01dd39bac1 refactor: minor refactors regarding multimodal processing (#6187) 2025-05-17 22:53:20 -07:00
Lianmin Zheng
e07a6977e7 Minor improvements of TokenizerManager / health check (#6327) 2025-05-15 15:29:25 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
xm:D
3409aaab32 Support InternVL3 (#5350)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-01 22:38:59 -07:00
liwenju0
8fefdd32c7 [Feature] add support kimi vl model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Lianmin Zheng
5641a09458 Revert "[Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct)" (#5754) 2025-04-25 15:50:28 -07:00
Brayden Zhong
43fb95c2fa [Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct) (#5078)
Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
2025-04-25 15:24:09 +08:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Adarsh Shirawalmath
f8f9244a61 [Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-22 14:27:39 -07:00
萝卜菜
d6d21640d3 [Feature] Support Deepseek-VL2 (#2798)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-03-16 23:07:59 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00
Mick
ff2ce0b86f refactor: move image processors to separate files (#4229) 2025-03-11 12:35:35 -07:00
Mick
bcc213df61 Model: Support Qwen 2.5 vl (#3258) 2025-02-16 00:58:53 -08:00
Yunmeng
656aed58c6 Remove vllm dependency in model config (#2809) 2025-01-09 17:51:56 +08:00
Yineng Zhang
85e1a6f3aa Update model_loader deps and qqq quantization deps (#2220) (#2318)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-02 23:22:13 +08:00
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
2024-11-30 00:44:48 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Lianmin Zheng
2ce32db6fb Let reward model take text inputs instead of message lists (#1907)
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
2024-11-03 13:27:12 -08:00
Ran Chen
146f613405 Fix incorrect context length for llama3.2-11b (#1873) 2024-11-02 00:04:50 -07:00
Hui Liu
9ce8e1a93c move max_position_embeddings to the last (#1799) 2024-10-25 19:30:50 -07:00
Lianmin Zheng
8f8f96a621 Fix the perf regression due to additional_stop_token_ids (#1773) 2024-10-23 16:45:21 -07:00
Lianmin Zheng
0d800090b4 Fix missing additional_stop_token_ids (#1769) 2024-10-23 12:18:59 -07:00
Lianmin Zheng
80a905475d Fix stop condition for <|eom_id|> (#1766) 2024-10-23 10:47:12 -07:00
Yineng Zhang
cbbc82b7b8 Support qwen2 vl model (#1721)
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ispobock <ISPObaoke@163.com>
2024-10-19 21:44:38 -07:00
Lianmin Zheng
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) 2024-09-24 21:37:33 -07:00
Lianmin Zheng
3a6e8b6d78 [Minor] move triton attention kernels into a separate folder (#1379) 2024-09-10 15:15:08 -07:00
Jani Monoses
474317f2b6 Support Phi3 mini and medium (#1299) 2024-09-02 21:49:40 -07:00
Kai-Hsun Chen
0836055324 [Chore] Rename model_overide_args to model_override_args (#1284)
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-09-01 03:14:56 -07:00
Lianmin Zheng
79ece2c51f Report median instead of mean in bench_latency.py (#1269) 2024-08-30 06:05:01 -07:00
김종곤
b7f8341014 EXAONE 3.0 Model Support (#1258)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-30 08:08:28 +00:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Lianmin Zheng
902278008a [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) 2024-08-25 14:46:34 -07:00
Lianmin Zheng
bea2bb9eea Improve multi-node stability (#1171) 2024-08-20 22:35:05 -07:00
Lianmin Zheng
a8ae640328 Improve docs and warnings (#1164) 2024-08-20 08:31:29 -07:00
Lianmin Zheng
3c1f5a9220 Fix duplicated imports in hf_transformers_utils.py (#1141) 2024-08-17 18:03:00 -07:00
Lianmin Zheng
57d0bd91ec Improve benchmark (#1140) 2024-08-17 17:43:23 -07:00