Commit Graph

648 Commits

Author SHA1 Message Date
Liangsheng Yin
6e95f5e5bd Simplify Router arguments passing and build it in docker image (#9964) 2025-09-05 12:13:55 +08:00
Yineng Zhang
fa9c82d339 chore: bump v0.5.2rc2 (#10050) 2025-09-04 20:07:27 -07:00
Yingchun Lai
b32ab0705e metrics: support customer buckets for prompt/generation_tokens_histogram (#9634) 2025-09-04 22:22:08 +08:00
Huapeng Zhou
75ee00112d [Doc] Fix SGLang tool parser doc (#9886) 2025-09-04 21:52:53 +08:00
Lianmin Zheng
60e37f8028 Move parsers under a single folder (#9912) 2025-09-02 18:25:04 -07:00
Yineng Zhang
18f91eb639 chore: bump v0.5.2rc1 (#9920) 2025-09-02 04:43:34 -07:00
Lifu Huang
1fbfdebe6b [chore] fix dead links in doc (#9913) 2025-09-02 00:28:26 -07:00
Yineng Zhang
16e56ea693 chore: bump v0.5.2rc0 (#9862) 2025-09-01 03:07:36 -07:00
Zhiqiang Xie
001f51940a [HiCache] change the default policy to write through (#9772) 2025-08-28 18:28:39 -07:00
Yineng Zhang
bc80dc4ce0 chore: bump v0.5.1.post3 (#9716) 2025-08-27 15:42:42 -07:00
yhyang201
a85363c199 [docs] Instructions for bench_serving.py (#9071)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-26 18:30:57 -07:00
Xiaotong Jiang
1a0896e9c0 [doc] add kimik2 --tool-call-parser (#9647) 2025-08-26 10:39:40 -07:00
Netanel Haber
4cd08dc592 model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301) 2025-08-26 15:33:40 +08:00
Chayenne
9b08d975a0 [docs] Refactor, remove compiled results and add gpt-oss (#9613)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-25 15:27:06 -07:00
Yineng Zhang
e3e97a120b chore: bump v0.5.1.post2 (#9592) 2025-08-25 03:45:09 -07:00
Xinyuan Tong
ca4b86c564 fix: Update OpenAI client base URL in documentation (#9576) 2025-08-24 23:06:57 -07:00
Yineng Zhang
e0ab167db0 chore: bump v0.5.1.post1 (#9558) 2025-08-24 01:14:17 -07:00
Xiaotong Jiang
80425e59bb [doc] deepseekv31 support (#9544) 2025-08-23 16:54:58 -07:00
Lianmin Zheng
97a38ee85b Release 0.5.1 (#9533) 2025-08-23 07:09:26 -07:00
Xinyuan Tong
fedfe91c1a [Docs] Add doc and quick demo for gpt-oss responses api & buildin tools (#9497)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 23:51:52 -07:00
Xinyuan Tong
13ec8d427e [Docs]Update reasoning parser doc & fix outdated link (#9492)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 22:08:28 -07:00
Chayenne
05bd789791 [docs]: fix reasoning context in docs (#9483) 2025-08-21 20:04:12 -07:00
Xinyuan Tong
0b3a5b1151 Update reasoning parser doc (#9468)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 17:25:30 -07:00
Xinyuan Tong
e8449ab515 Add deepseek v3.1 thinking parser support and update docs (#9464)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 15:09:40 -07:00
Lifu Huang
b0980af89f Support pinning adapter via server args. (#9249) 2025-08-20 16:25:01 -07:00
Lianmin Zheng
1ec9769753 [Docs] Update contribution guide (#9383) 2025-08-19 23:37:45 -07:00
Lianmin Zheng
f20b6a3f2b [minor] Sync style changes (#9376) 2025-08-19 21:35:01 -07:00
Lianmin Zheng
ecc9f3e47a [Minor] Fix the style of sgl-kernel (#9332) 2025-08-18 23:45:00 -07:00
Yineng Zhang
7e8187e004 docs: fix spec (#9326) 2025-08-18 19:35:46 -07:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
Netanel Haber
845d12a979 model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067)
Co-authored-by: Kyle Huang <kylhuang@nvidia.com>
2025-08-17 01:48:15 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
Yineng Zhang
fab0f6e77d chore: bump v0.5.0rc2 (#9203) 2025-08-14 16:11:16 -07:00
Chengxing Xie
c1c7dc4534 feat: Add model version tracking with API endpoints and response metadata (#8795) 2025-08-14 12:13:46 -07:00
Lianmin Zheng
9e426466af Clean up allocators (#9134)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-13 13:56:04 -07:00
Zhihao Liu
65736dc524 [Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model (#7957) 2025-08-13 11:14:54 -07:00
jacky.cheng
25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
2025-08-12 13:49:11 -07:00
Hangzhi
03d114496f Fix typos in supported models documentation (#9119) 2025-08-12 13:35:24 -07:00
li chaoran
2ecbd8b8bf [feat] add ascend readme and docker release (#8700)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-12 13:25:42 -07:00
Simo Lin
1ce30dd13e [router] update router documentation (#9121) 2025-08-12 13:16:34 -07:00
Yichao Cheng
fcc11e5ed5 update support new models doc (#9096)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-12 01:21:02 -07:00
Zhiqiang Xie
0eec4cb6cc HiCache, add bench long context plus minor fixs (#9086)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 16:54:52 -07:00
Faraz
f508cd3cb7 TRTLLM-MLA FP8 path (#8638)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-08-11 14:02:13 -07:00
Lianmin Zheng
8c07fabda7 Update hyperparameter_tuning.md (#9083) 2025-08-11 13:44:11 -07:00
Liangsheng Yin
f9afa7dceb Fix docs for clip max new tokens (#9082) 2025-08-11 13:15:21 -07:00
Jimmy
0d9e89ec69 [PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866) 2025-08-11 13:08:11 -07:00
Hangzhi
3d64fda376 Fix broken Kimi models HuggingFace link (#9080) 2025-08-11 12:15:00 -07:00
Baizhou Zhang
75e6a7cde1 Support radix cache for Lora feature (#7216) 2025-08-11 10:14:11 -07:00
Lianmin Zheng
2e8e7e353b Improve docs and developer guide (#9044) 2025-08-10 21:05:18 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00