86d10d220f  2025-08-23 05:40:18 -07:00  Lianmin Zheng
  Update grok.py and tiktoken tokenizer (#9532)

ebbb75e917  2025-08-17 16:25:26 -07:00  blzheng
  [CPU] Fix TP padding issue on Phi-4 (#8289)

b7cd743038  2025-08-06 23:49:36 -07:00  PGFLMG
  [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949)

ea93079b30  2025-08-02 00:39:40 -07:00  Wenchen Lo
  model: adapt mllama4 to VisionAttention (#8512)
  Co-authored-by: root <mickjagger19@icloud.com>

51c38163c1  2025-07-31 02:41:00 -07:00  Chang Su
  model: support Step3V (#8583)
  Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
  Co-authored-by: nnnobody-code <nnnobody@foxmail.com>
  Co-authored-by: ispobock <ispobaoke@gmail.com>
  Co-authored-by: Qiaolin-Yu <qy254@cornell.edu>
  Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
  Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
  Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

8d2cf38c79  2025-07-14 10:55:13 -07:00  Lianmin Zheng
  [Minor] Remove redundant print (#8005)

615553079d  2025-07-11 00:02:21 -07:00  Atream
  Support Kimi K2 (#7940)

14229ccf8f  2025-07-04 16:33:33 -07:00  Lianmin Zheng
  Move mem_fraction_static adjustment for multimodal models to server_args.py & Fix session control & Other cleanups (#7748)

d6864ce6d6  2025-05-26 19:27:48 -07:00  Xinyuan Tong
  [New Model] Devstral support (#6547)
  Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

01dd39bac1  2025-05-17 22:53:20 -07:00  Mick
  refactor: minor refactors regarding multimodal processing (#6187)

e07a6977e7  2025-05-15 15:29:25 -07:00  Lianmin Zheng
  Minor improvements of TokenizerManager / health check (#6327)

e8e18dcdcc  2025-05-12 12:53:26 -07:00  Lianmin Zheng
  Revert "fix some typos" (#6244)

d738ab52f8  2025-05-13 01:42:38 +08:00  applesaucethebun
  fix some typos (#6209)
  Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>

3409aaab32  2025-05-01 22:38:59 -07:00  xm:D
  Support InternVL3 (#5350)
  Co-authored-by: Mick <mickjagger19@icloud.com>
  Co-authored-by: Chayenne <zhaochen20@outlook.com>

8fefdd32c7  2025-04-29 21:31:19 -07:00  liwenju0
  [Feature] add support kimi vl model (#5383)
  Co-authored-by: wenju.li <wenju.li@deepctr.cn>

5641a09458  2025-04-25 15:50:28 -07:00  Lianmin Zheng
  Revert "[Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct)" (#5754)

43fb95c2fa  2025-04-25 15:24:09 +08:00  Brayden Zhong
  [Model] Support ArcticForCausalLM architecture (Snowflake/snowflake-arctic-instruct) (#5078)
  Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>

34ef6c8135  2025-04-11 21:46:58 -07:00  Mick
  [VLM] Adopt fast image processor by default (#5065)

f8f9244a61  2025-03-22 14:27:39 -07:00  Adarsh Shirawalmath
  [Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984)
  Co-authored-by: Chayenne <zhaochen20@outlook.com>

d6d21640d3  2025-03-16 23:07:59 -07:00  萝卜菜
  [Feature] Support Deepseek-VL2 (#2798)
  Co-authored-by: Edenzzzz <wtan45@wisc.edu>
  Co-authored-by: Chayenne <zhaochen20@outlook.com>
  Co-authored-by: Yi Zhang <1109276519@qq.com>

9d02bb3e2a  2025-03-16 17:37:32 -07:00  Mick
  Urgent model support: support gemma-3-it (#4424)

1ce4878d31  2025-03-14 00:40:44 -07:00  wangyu
  feat(remote_model): support variable remote backend for model loader (#3964)
  Signed-off-by: wangyu <wangyu.steph@bytedance.com>

01090e8ac3  2025-03-12 11:02:11 -07:00  Mick
  model: Support Janus-pro (#3203)

ff2ce0b86f  2025-03-11 12:35:35 -07:00  Mick
  refactor: move image processors to separate files (#4229)

bcc213df61  2025-02-16 00:58:53 -08:00  Mick
  Model: Support Qwen 2.5 vl (#3258)

656aed58c6  2025-01-09 17:51:56 +08:00  Yunmeng
  Remove vllm dependency in model config (#2809)

85e1a6f3aa  2024-12-02 23:22:13 +08:00  Yineng Zhang
  Update model_loader deps and qqq quantization deps (#2220) (#2318)
  Co-authored-by: HandH1998 <1335248067@qq.com>

4936be8acc  2024-11-30 22:14:48 -08:00  Lianmin Zheng
  Revert "Revert "[FEAT] Support GGUF format"" (#2287)

7e4c6dd8da  2024-11-30 19:03:26 -08:00  Lianmin Zheng
  Revert "[FEAT] Support GGUF format" (#2285)

883c955489  2024-11-30 00:44:48 -08:00  Yang Zheng
  [FEAT] Support GGUF format (#2215)
  Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>

62a4a339eb  2024-11-22 22:16:53 +08:00  Xuehai Pan
  docs: fix module docstrings and copyright headers (#2077)

2ce32db6fb  2024-11-03 13:27:12 -08:00  Lianmin Zheng
  Let reward model take text inputs instead of message lists (#1907)
  Co-authored-by: Kyle Corbitt <kyle@corbt.com>

146f613405  2024-11-02 00:04:50 -07:00  Ran Chen
  Fix incorrect context length for llama3.2-11b (#1873)

9ce8e1a93c  2024-10-25 19:30:50 -07:00  Hui Liu
  move max_position_embeddings to the last (#1799)

8f8f96a621  2024-10-23 16:45:21 -07:00  Lianmin Zheng
  Fix the perf regression due to additional_stop_token_ids (#1773)

0d800090b4  2024-10-23 12:18:59 -07:00  Lianmin Zheng
  Fix missing additional_stop_token_ids (#1769)

80a905475d  2024-10-23 10:47:12 -07:00  Lianmin Zheng
  Fix stop condition for <|eom_id|> (#1766)

cbbc82b7b8  2024-10-19 21:44:38 -07:00  Yineng Zhang
  Support qwen2 vl model (#1721)
  Co-authored-by: yizhang2077 <1109276519@qq.com>
  Co-authored-by: ispobock <ISPObaoke@163.com>

fb2d0680e0  2024-09-24 21:37:33 -07:00  Lianmin Zheng
  [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510)

3a6e8b6d78  2024-09-10 15:15:08 -07:00  Lianmin Zheng
  [Minor] move triton attention kernels into a separate folder (#1379)

474317f2b6  2024-09-02 21:49:40 -07:00  Jani Monoses
  Support Phi3 mini and medium (#1299)

0836055324  2024-09-01 03:14:56 -07:00  Kai-Hsun Chen
  [Chore] Rename model_overide_args to model_override_args (#1284)
  Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
  Co-authored-by: Yineng Zhang <me@zhyncs.com>

79ece2c51f  2024-08-30 06:05:01 -07:00  Lianmin Zheng
  Report median instead of mean in bench_latency.py (#1269)

b7f8341014  2024-08-30 08:08:28 +00:00  김종곤
  EXAONE 3.0 Model Support (#1258)
  Co-authored-by: Yineng Zhang <me@zhyncs.com>

bf53bf5142  2024-08-28 06:33:05 -07:00  Lianmin Zheng
  [Fix] Fix llava on multi images (#1247)

902278008a  2024-08-25 14:46:34 -07:00  Lianmin Zheng
  [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)

bea2bb9eea  2024-08-20 22:35:05 -07:00  Lianmin Zheng
  Improve multi-node stability (#1171)

a8ae640328  2024-08-20 08:31:29 -07:00  Lianmin Zheng
  Improve docs and warnings (#1164)

3c1f5a9220  2024-08-17 18:03:00 -07:00  Lianmin Zheng
  Fix duplicated imports in hf_transformers_utils.py (#1141)

57d0bd91ec  2024-08-17 17:43:23 -07:00  Lianmin Zheng
  Improve benchmark (#1140)