d6837aea4d | Netanel Haber | 2025-10-09 00:37:38 +08:00
    model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
    Signed-off-by: Netanel Haber <nhaber@nvidia.com>

7ba3de0e92 | Chang Su | 2025-10-08 00:36:05 +00:00
    [oai serving chat] Add argument --sampling-defaults and fix ChatCompletionRequest defaults (#11304)

155cbb51f0 | Zhiyu | 2025-10-06 13:24:15 -07:00
    Enable native ModelOpt quantization support (1/3) (#7149)
    Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

efbc687c28 | fzyzcjy | 2025-10-06 00:24:15 -07:00
    Support DeepSeek V3.2 Exp (#11061)
    Co-authored-by: Stefan He <11166516+hebiao064@users.noreply.github.com>
    Co-authored-by: Liangsheng Yin <95566987+hnyls2002@users.noreply.github.com>
    Co-authored-by: Baizhou Zhang <56809903+fridge003@users.noreply.github.com>
    Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com>
    Co-authored-by: ZhengdQin <46387172+zhengdqin@users.noreply.github.com>
    Co-authored-by: DarkSharpness <2040703891@qq.com>
    Co-authored-by: hnyls2002 <lsyincs@gmail.com>
    Co-authored-by: Zhengda Qin <zhengdqin@gmail.com>
    Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
    Co-authored-by: HAI <hixiao@gmail.com>
    Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

fdc4e1e570 | fzyzcjy | 2025-10-03 22:40:06 +08:00
    Tiny move files to utils folder (#11166)

458611de77 | Liangsheng Yin | 2025-10-03 00:28:57 +08:00
    Unify forward output datastructure (#11124)

083629c235 | ilyasch2 | 2025-10-02 19:15:36 +08:00
    [model] Add mamba2 and Falcon-H1 support. (#10988)
    Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
    Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>

fb367acfcb | qrskannbara | 2025-09-30 12:18:39 -07:00
    Support Dots.ocr model (#11071)

2bdaf482f9 | amysaq2023 | 2025-09-26 15:25:39 -07:00
    refactor loading weights from remote instance coding format (#10941)
    Signed-off-by: Anqi Shen <amy.saq@antgroup.com>

3e43eb137b | Lianmin Zheng | 2025-09-24 22:59:16 -07:00
    [Auto Sync] Update model_config.py (20250925) (#10885)
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>

f47a2c67e6 | Lianmin Zheng | 2025-09-23 16:48:12 -07:00
    [Auto Sync] Update load_config.py, model_config.py, configu... (20250923) (#10825)
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

aab35bccb4 | Xinyuan Tong | 2025-09-23 12:56:16 -07:00
    fix: draft model IMA by overide max_positional_embeddings (#10787)
    Co-authored-by: Qiaolin Yu <qy254@cornell.edu>

4f564b9e83 | Zheng Li | 2025-09-23 10:15:52 -07:00
    model: support qwen3-vl series (#10323)
    Co-authored-by: ocss884 <ocss.lin@gmail.com>
    Co-authored-by: cao1zhg <653506626@qq.com>
    Co-authored-by: yhyang201 <yhyang201@gmail.com>
    Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
    Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
    Co-authored-by: Mick <mickjagger19@icloud.com>
    Co-authored-by: Yineng Zhang <me@zhyncs.com>

c1815a99b7 | Chang Su | 2025-09-18 17:30:38 -07:00
    model support: Sarashina2VisionForCausalLM (#10632)

9752861002 | Binyao Jiang | 2025-09-13 17:45:04 +08:00
    [Fix] Support qwen3-next MTP+DP (#10392)

30d20ce84f | amysaq2023 | 2025-09-12 17:40:22 +08:00
    Support loading weights from remote instance (#8215)
    Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
    Co-authored-by: Chayenne <74843776+zhaochenyang20@users.noreply.github.com>

1b1701f1f7 | chenge@xiaohongshu.com | 2025-09-12 17:38:38 +08:00
    model: support dots.vlm1 model (#8778)
    Co-authored-by: weishi <bushou@xiaohongshu.com>
    Co-authored-by: Ezra-Yu <1105212286@qq.com>
    Co-authored-by: Jianfei Wang <905787410@qq.com>
    Co-authored-by: qianwu <wangjianfei@xiaohongshu.com>

fac07c9b08 | strgrb | 2025-09-11 23:53:52 -07:00
    Support LingV2 model (#10359)
    Co-authored-by: 羽癫 <yudian.zy@antgroup.com>
    Co-authored-by: guoyuhong <yuhong.gyh@antgroup.com>

a2424068ec | gongwei-130 | 2025-09-11 15:00:21 -07:00
    add try catch for quant config hf download (#10340)

30c6e1f569 | Yi Zhang | 2025-09-11 04:11:49 -07:00
    Qwen3-Next support (#10233)
    Co-authored-by: cao1zhg <114661107+cao1zhg@users.noreply.github.com>
    Co-authored-by: ispobock <ispobaoke@gmail.com>
    Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
    Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
    Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
    Co-authored-by: qingquansong <ustcsqq@gmail.com>
    Co-authored-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>
    Co-authored-by: Ke Bao <ISPObaoke@163.com>
    Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>

b7d1f17b8d | Yineng Zhang | 2025-09-07 22:31:11 -07:00
    Revert "enable auto-round quantization model (#6226)" (#10148)

c8295d2353 | Weiwei | 2025-09-07 22:05:35 -07:00
    enable auto-round quantization model (#6226)
    Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

13705dae06 | DevashishLal-CB | 2025-09-05 19:45:46 +08:00
    [Fix] Add speculative_draft_model_revision to server_args (#5255)
    Signed-off-by: Devashish Lal <devashish@rivosinc.com>

f64b8e3e4e | yilian49 | 2025-09-02 22:06:48 +08:00
    Support the internvl3.5 family models in sglang (#9705)

d4a938417d | chenxj | 2025-09-01 22:17:26 -07:00
    [feat] Support tp mode for DeepSeek-R1-W4AFP8 (#8118)
    Co-authored-by: yuhyao <827623970@qq.com>

5e194b2143 | Guoyuan Lin | 2025-08-30 23:29:21 -07:00
    [Model] Support Meituan LongCat-Flash && LongCat-Flash-MTP (#9824)

eb19ccadae | Liangsheng Yin | 2025-08-21 10:32:34 +08:00
    [bug] fix errors related to context length in SD (#9388)

ebbb75e917 | blzheng | 2025-08-17 16:25:26 -07:00
    [CPU] Fix TP padding issue on Phi-4 (#8289)

845d12a979 | Netanel Haber | 2025-08-17 01:48:15 -07:00
    model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067)
    Co-authored-by: Kyle Huang <kylhuang@nvidia.com>

65736dc524 | Zhihao Liu | 2025-08-13 11:14:54 -07:00
    [Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model (#7957)

f29aba8c6e | Binyao Jiang | 2025-08-09 00:59:13 -07:00
    Support glm4.1v and glm4.5v (#8798)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
    Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
    Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
    Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
    Co-authored-by: Chang Su <csu272@usc.edu>

706bd69cc5 | Lianmin Zheng | 2025-08-08 19:56:50 -07:00
    Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)

1132547496 | Wenbo Yang | 2025-08-08 00:55:48 -07:00
    Add ernie4.py for ERNIE-4.5 (#7657)

b7cd743038 | PGFLMG | 2025-08-06 23:49:36 -07:00
    [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949)

d4bf5a8524 | kk | 2025-08-04 18:14:52 -07:00
    Support OCP MXFP4 quantization on AMD GPUs (#8255)
    Co-authored-by: wunhuang <wunhuang@amd.com>
    Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>

8fbcfd0723 | Ke Bao | 2025-08-01 00:49:26 +08:00
    Update step3v default config (#8626)

51c38163c1 | Chang Su | 2025-07-31 02:41:00 -07:00
    model: support Step3V (#8583)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: nnnobody-code <nnnobody@foxmail.com>
    Co-authored-by: ispobock <ispobaoke@gmail.com>
    Co-authored-by: Qiaolin-Yu <qy254@cornell.edu>
    Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
    Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

fb16fbaf52 | Lifu Huang | 2025-07-28 22:54:50 -07:00
    Fix incorrect KV cache allocation for MTP models. (#8482)
    Co-authored-by: Stefan He <hebiaobuaa@gmail.com>

6d6a8bc278 | Yuxuan Zhang | 2025-07-27 22:54:07 -07:00
    GLM-4.5 Model Support (#8224)
    Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
    Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
    Co-authored-by: Stefan He <hebiaobuaa@gmail.com>

b7094a5ef1 | RunningLeon | 2025-07-26 13:48:51 -07:00
    model: support intern-s1 (#8350)
    Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Co-authored-by: zxy <zhou0493@e.ntu.edu.sg>
    Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Co-authored-by: Mick <mickjagger19@icloud.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

bfb118c01e | Minho Ryu | 2025-07-23 23:18:47 -07:00
    fix bug when eos_ids==0 (#8315)

8430bfe3e9 | Xinyuan Tong | 2025-07-20 21:43:09 -07:00
    [Refactor] simplify multimodal data processing (#8107)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

750838adc4 | GuoYipin | 2025-07-20 22:22:54 +08:00
    fix: fix the bug of loading Internvl3 (#8067)
    Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

bb0e8a32b5 | Lianmin Zheng | 2025-07-19 11:32:52 -07:00
    Clean up server args (#8161)

d918ab7985 | Haohui Mai | 2025-07-18 19:59:39 -07:00
    Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302)
    Co-authored-by: HAI <hixiao@gmail.com>
    Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>

48c1fa7bb6 | jianan-gu | 2025-07-17 21:43:25 -07:00
    [CPU][Llama4] Fix Llama4 MoE inputs with "apply_router_weight_on_input" (#7889)

9379da77de | Hanming Lu | 2025-07-13 12:31:07 -07:00
    SWA Prefix Cache (#7367)
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>

615553079d | Atream | 2025-07-11 00:02:21 -07:00
    Support Kimi K2 (#7940)

766392c6bd | ronnie_zheng | 2025-07-10 09:17:37 -07:00
    [feature]Ascend quantization support (#7791)
    Co-authored-by: ichernob <ichernobnn@gmail.com>
    Co-authored-by: liupeng <liupeng374@huawei.com>

cb9d91ea8a | SijiaYang | 2025-07-07 14:47:21 -07:00
    feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762)
    Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>