498 Commits

Author SHA1 Message Date
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
simveit
506e5de8fe Improve supported models doc (#6430) 2025-05-20 01:43:35 +08:00
applesaucethebun
6dc6b30637 Add missing model to doc (#6396)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-18 12:57:58 -07:00
Vincent Zhong
e9ef39d2e9 docs: Update the MD files (#6373)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-17 09:23:16 -07:00
Kiv Chen
64825b8395 model(vlm): mistral 3.1 (#5099)
Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>
2025-05-16 18:36:18 -07:00
Yury Sulsky
f19a9204cd Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136)
Co-authored-by: Yury Sulsky <ysulsky@tesla.com>
2025-05-16 12:26:15 -07:00
quinnrong94
2e4babdb0a [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109)
Co-authored-by: Yingyi <yingyihuang2000@outlook.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: kexueyu <kexueyu@tencent.com>
Co-authored-by: vincentmeng <vincentmeng@tencent.com>
Co-authored-by: pengmeng <pengmeng@tencent.com>
2025-05-15 00:48:09 -07:00
Brayden Zhong
9a91fa0ed1 docs: fix a bad redirect (#6300) 2025-05-14 10:27:19 -07:00
Mick
cd7c8a8de6 doc: update developer guide regarding mllms (#6138)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-14 23:13:13 +08:00
Yineng Zhang
16267d4fa7 chore: bump v0.4.6.post4 (#6245) 2025-05-13 01:57:51 -07:00
Kiv Chen
5380cd7ea3 model(vlm): pixtral (#5084) 2025-05-13 00:16:10 -07:00
Brayden Zhong
3c32895cbe [Llama4] Add docs note about enable multimodal (#6235) 2025-05-13 10:05:47 +08:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
Brayden Zhong
12319a6787 [Docs] Add docs for SGLANG_ and SGL_ environment variables (#6206) 2025-05-13 01:45:41 +08:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
Cheng Wan
25c83fff6a Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
2025-05-11 23:36:29 -07:00
Lianmin Zheng
01bdbf7f80 Improve structured outputs: fix race condition, server crash, metrics and style (#6188) 2025-05-11 08:36:16 -07:00
Adarsh Shirawalmath
94d42b6794 [Docs] minor Qwen3 and reasoning parser docs fix (#6032) 2025-05-11 08:22:46 -07:00
mlmz
69276f619a doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199) 2025-05-11 08:22:11 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Yineng Zhang
66fc63d6b1 Revert "feat: add thinking_budget (#6089)" (#6181) 2025-05-10 16:07:45 -07:00
Ximingwang-09
921e4a8185 [Docs]Delete duplicate content (#6146)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-05-10 15:02:15 -07:00
XinyuanTong
9d8ec2e67e Fix and Clean up chat-template requirement for VLM (#6114)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-11 00:14:09 +08:00
Yineng Zhang
678d8cc987 chore: bump v0.4.6.post3 (#6165) 2025-05-09 15:38:47 -07:00
thyecust
63484f9fd6 feat: add thinking_budget (#6089) 2025-05-09 08:22:09 -07:00
Zhu Chen
fa7d7fd9e5 [Feature] Add FlashAttention3 as a backend for VisionAttention (#5764)
Co-authored-by: othame <chenzhu_912@zju.edu.cn>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
2025-05-08 10:01:19 -07:00
Baizhou Zhang
8f508cc77f Update doc for MLA attention backends (#6034) 2025-05-07 18:51:05 -07:00
Baizhou Zhang
fee37d9e8d [Doc]Fix description for dp_size argument (#6063) 2025-05-08 00:04:22 +08:00
mlmz
a68ed76682 feat: append more comprehensive fields in messages instead of merely role and content (#5996) 2025-05-05 11:43:34 -07:00
Wenxuan Tan
22da3d978f Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555) 2025-05-05 10:32:17 -07:00
Lifu Huang
1232f7e8b7 Update dev container config to support live code sync and improve docker setup guide (#6018)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-04 22:33:46 -07:00
vzed
95c231e50d Tool Call: Add chat_template_kwargs documentation (#5679) 2025-05-04 13:12:40 -07:00
Chayenne
73dcf2b326 Remove token in token out in Native API (#5967) 2025-05-01 21:59:43 -07:00
Chang Su
170d1f218a feat: Refactor DeepSeekV3 function call (#5908) 2025-05-01 21:28:57 -07:00
江家瑋
ad506a4e6b docs: Fix Qwen model typo (#5944)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-05-01 10:23:00 -07:00
Ke Bao
ebaba85655 Update ci test and doc for MTP api change (#5952) 2025-05-01 09:30:27 -07:00
Yineng Zhang
9858113c33 chore: bump v0.4.6.post2 (#5939) 2025-04-30 22:04:40 -07:00
liwenju0
8fefdd32c7 [Feature] add support kimi vl model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Baizhou Zhang
799789afed Bump Flashinfer to 0.2.5 (#5870)
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
2025-04-29 19:50:57 -07:00
Chang Su
2b06484bd1 feat: support pythonic tool call and index in tool call streaming (#5725) 2025-04-29 17:30:44 -07:00
simveit
ae523675e5 [Doc] Tables instead of bulletpoints for sampling doc (#5841) 2025-04-29 13:49:39 -07:00
Adarsh Shirawalmath
5c08aa4958 [Docs] Update docs for Qwen3 and Qwen3MoE (#5836) 2025-04-29 13:48:30 -07:00
Qiaolin Yu
8c0cfca87d Feat: support cuda graph for LoRA (#4115)
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
2025-04-28 23:30:44 -07:00
Yineng Zhang
dcae1fb2cd chore: bump v0.4.6.post1 (#5845) 2025-04-28 12:57:08 -07:00
Lianmin Zheng
849c83a0c0 [CI] test chunked prefill more (#5798) 2025-04-28 10:57:17 -07:00
Baizhou Zhang
f48b007c1d [Doc] Recover history of server_arguments.md (#5851) 2025-04-28 10:48:21 -07:00
Michael Yao
966eb90865 [Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276) 2025-04-28 00:36:10 -07:00
Trevor Morris
84810da4ae Add Cutlass MLA attention backend (#5390) 2025-04-27 20:58:53 -07:00
Huapeng Zhou
86317c09e9 [Docs] update grafana setup guide in production metrics (#5643)
Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
2025-04-27 15:36:33 -07:00
Baizhou Zhang
84022c0e56 Release v0.4.6 (#5795) 2025-04-27 14:07:05 -07:00