Chayenne
|
9b08d975a0
|
[docs] Refactor, remove compiled results and add gpt-oss (#9613)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
|
2025-08-25 15:27:06 -07:00 |
|
Yineng Zhang
|
e3e97a120b
|
chore: bump v0.5.1.post2 (#9592)
|
2025-08-25 03:45:09 -07:00 |
|
Xinyuan Tong
|
ca4b86c564
|
fix: Update OpenAI client base URL in documentation (#9576)
|
2025-08-24 23:06:57 -07:00 |
|
Yineng Zhang
|
e0ab167db0
|
chore: bump v0.5.1.post1 (#9558)
|
2025-08-24 01:14:17 -07:00 |
|
Xiaotong Jiang
|
80425e59bb
|
[doc] deepseekv31 support (#9544)
|
2025-08-23 16:54:58 -07:00 |
|
Lianmin Zheng
|
97a38ee85b
|
Release 0.5.1 (#9533)
|
2025-08-23 07:09:26 -07:00 |
|
Xinyuan Tong
|
fedfe91c1a
|
[Docs] Add doc and quick demo for gpt-oss responses api & buildin tools (#9497)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 23:51:52 -07:00 |
|
Xinyuan Tong
|
13ec8d427e
|
[Docs]Update reasoning parser doc & fix outdated link (#9492)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 22:08:28 -07:00 |
|
Chayenne
|
05bd789791
|
[docs]: fix reasoning context in docs (#9483)
|
2025-08-21 20:04:12 -07:00 |
|
Xinyuan Tong
|
0b3a5b1151
|
Update reasoning parser doc (#9468)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 17:25:30 -07:00 |
|
Xinyuan Tong
|
e8449ab515
|
Add deepseek v3.1 thinking parser support and update docs (#9464)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 15:09:40 -07:00 |
|
Lifu Huang
|
b0980af89f
|
Support pinning adapter via server args. (#9249)
|
2025-08-20 16:25:01 -07:00 |
|
Lianmin Zheng
|
1ec9769753
|
[Docs] Update contribution guide (#9383)
|
2025-08-19 23:37:45 -07:00 |
|
Lianmin Zheng
|
f20b6a3f2b
|
[minor] Sync style changes (#9376)
|
2025-08-19 21:35:01 -07:00 |
|
Lianmin Zheng
|
ecc9f3e47a
|
[Minor] Fix the style of sgl-kernel (#9332)
|
2025-08-18 23:45:00 -07:00 |
|
Yineng Zhang
|
7e8187e004
|
docs: fix spec (#9326)
|
2025-08-18 19:35:46 -07:00 |
|
Lianmin Zheng
|
c480a3f6ea
|
Minor style fixes for sgl-kernel (#9289)
|
2025-08-18 09:38:35 -07:00 |
|
Netanel Haber
|
845d12a979
|
model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067)
Co-authored-by: Kyle Huang <kylhuang@nvidia.com>
|
2025-08-17 01:48:15 -07:00 |
|
Cheng Wan
|
295895120d
|
[6/N] MoE Refactor: Cleanup MoE-related configs (#8849)
|
2025-08-14 21:14:53 -07:00 |
|
Yineng Zhang
|
fab0f6e77d
|
chore: bump v0.5.0rc2 (#9203)
|
2025-08-14 16:11:16 -07:00 |
|
Chengxing Xie
|
c1c7dc4534
|
feat: Add model version tracking with API endpoints and response metadata (#8795)
|
2025-08-14 12:13:46 -07:00 |
|
Lianmin Zheng
|
9e426466af
|
Clean up allocators (#9134)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-13 13:56:04 -07:00 |
|
Zhihao Liu
|
65736dc524
|
[Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model (#7957)
|
2025-08-13 11:14:54 -07:00 |
|
jacky.cheng
|
25caa7a8a9
|
[AMD] Support Wave attention backend with AMD GPU optimizations (#8660)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
|
2025-08-12 13:49:11 -07:00 |
|
Hangzhi
|
03d114496f
|
Fix typos in supported models documentation (#9119)
|
2025-08-12 13:35:24 -07:00 |
|
li chaoran
|
2ecbd8b8bf
|
[feat] add ascend readme and docker release (#8700)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2025-08-12 13:25:42 -07:00 |
|
Simo Lin
|
1ce30dd13e
|
[router] update router documentation (#9121)
|
2025-08-12 13:16:34 -07:00 |
|
Yichao Cheng
|
fcc11e5ed5
|
update support new models doc (#9096)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-12 01:21:02 -07:00 |
|
Zhiqiang Xie
|
0eec4cb6cc
|
HiCache, add bench long context plus minor fixs (#9086)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-11 16:54:52 -07:00 |
|
Faraz
|
f508cd3cb7
|
TRTLLM-MLA FP8 path (#8638)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
|
2025-08-11 14:02:13 -07:00 |
|
Lianmin Zheng
|
8c07fabda7
|
Update hyperparameter_tuning.md (#9083)
|
2025-08-11 13:44:11 -07:00 |
|
Liangsheng Yin
|
f9afa7dceb
|
Fix docs for clip max new tokens (#9082)
|
2025-08-11 13:15:21 -07:00 |
|
Jimmy
|
0d9e89ec69
|
[PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866)
|
2025-08-11 13:08:11 -07:00 |
|
Hangzhi
|
3d64fda376
|
Fix broken Kimi models HuggingFace link (#9080)
|
2025-08-11 12:15:00 -07:00 |
|
Baizhou Zhang
|
75e6a7cde1
|
Support radix cache for Lora feature (#7216)
|
2025-08-11 10:14:11 -07:00 |
|
Lianmin Zheng
|
2e8e7e353b
|
Improve docs and developer guide (#9044)
|
2025-08-10 21:05:18 -07:00 |
|
Lianmin Zheng
|
2449a0afe2
|
Refactor the docs (#9031)
|
2025-08-10 19:49:45 -07:00 |
|
Lifu Huang
|
f8a173bb50
|
Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940)
|
2025-08-10 01:04:45 -07:00 |
|
Binyao Jiang
|
f29aba8c6e
|
Support glm4.1v and glm4.5v (#8798)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Chang Su <csu272@usc.edu>
|
2025-08-09 00:59:13 -07:00 |
|
Lianmin Zheng
|
706bd69cc5
|
Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)
|
2025-08-08 19:56:50 -07:00 |
|
Yineng Zhang
|
9020f7fc32
|
chore: bump v0.5.0rc0 (#8959)
|
2025-08-08 09:16:18 -07:00 |
|
Wenbo Yang
|
1132547496
|
Add ernie4.py for ERNIE-4.5 (#7657)
|
2025-08-08 00:55:48 -07:00 |
|
Xinyuan Tong
|
3fa3c6cd6a
|
Enables force reasoning based on chat template for Qwen3-Thinking (#8369)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>
|
2025-08-06 20:02:47 -07:00 |
|
Lifu Huang
|
6210e2c4f0
|
Support GPU pinning for LoRA (#8697)
|
2025-08-06 19:39:45 -07:00 |
|
HouseWest
|
ca47e24f5d
|
[Feature] improve TBO: two chunk overlap (#8144)
|
2025-08-05 21:11:01 -07:00 |
|
Praneth Paruchuri
|
d26ca84f39
|
Support bailing moe (#8680)
|
2025-08-05 20:40:34 -07:00 |
|
Yineng Zhang
|
8cd344586e
|
chore: bump v0.4.10.post2 (#8727)
|
2025-08-03 03:43:29 -07:00 |
|
Guanhua Wang
|
f7b2853ff8
|
[feat] support minimum token load balance in dp attention (#7379)
|
2025-08-03 00:46:47 -07:00 |
|
Lifu Huang
|
8675bdf246
|
Support limiting max loaded loras in CPU. (#8650)
|
2025-08-03 00:02:23 -07:00 |
|
Nicolas Castet
|
82e6c3a65a
|
Add support for NCCL symmetric memory for TP allreduces (#8238)
|
2025-08-01 23:30:55 +00:00 |
|