| Author | Commit | Message | Date |
|---|---|---|---|
| Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00 |
| shangmingc | 6cb00c6398 | [PD] Optimize time out logic and add env var doc for mooncake (#6761); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-30 00:45:02 -07:00 |
| Baizhou Zhang | 791b3bfabb | [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479) | 2025-05-28 16:03:43 -07:00 |
| Trevor Morris | e806f708c9 | [PD] Make bootstrap code common between NIXL and Mooncake (#6473) | 2025-05-27 12:47:38 -07:00 |
| Vincent Zhong | 45a31a82e4 | docs: Update documentation to reflect xgrammar as default grammar backend (#6601); Co-authored-by: b8zhong <b8zhong@uwaterloo.ca> | 2025-05-27 13:29:13 +08:00 |
| Brayden Zhong | 1aa0fbf416 | Add note to add supported model to documentation (#6640) | 2025-05-27 13:18:46 +08:00 |
| linzhuo | 7a0bbe6a64 | update toc for doc and dockerfile code style format (#6450); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-05-27 13:05:11 +08:00 |
| fzyzcjy | 25be63d0b2 | Auto handle PD disaggregation in bench_serving (#6587); Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-25 22:41:27 -07:00 |
| simveit | e235be16fe | Fix some issues with current docs. (#6588) | 2025-05-26 01:04:34 +08:00 |
| Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00 |
| Chang Su | ed0c3035cd | feat(Tool Calling): Support required and specific function mode (#6550) | 2025-05-23 21:00:37 -07:00 |
| ryang | a6ae3af15e | Support XiaomiMiMo inference with mtp (#6059) | 2025-05-22 14:14:49 -07:00 |
| Byron Hsu | 7513558074 | [PD] Add doc and simplify sender.send (#6019) | 2025-05-21 21:22:21 -07:00 |
| Wenxuan Tan | 66324895c6 | [docs] Fix torch version (#6472) | 2025-05-20 10:53:14 -07:00 |
| fzyzcjy | f0653886a5 | Expert distribution recording without overhead for EPLB (#4957) | 2025-05-19 20:07:43 -07:00 |
| simveit | 506e5de8fe | Improve supported models doc (#6430) | 2025-05-20 01:43:35 +08:00 |
| applesaucethebun | 6dc6b30637 | Add missing model to doc (#6396); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-18 12:57:58 -07:00 |
| Vincent Zhong | e9ef39d2e9 | docs: Update the MD files (#6373); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-17 09:23:16 -07:00 |
| Kiv Chen | 64825b8395 | model(vlm): mistral 3.1 (#5099); Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com> | 2025-05-16 18:36:18 -07:00 |
| Yury Sulsky | f19a9204cd | Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136); Co-authored-by: Yury Sulsky <ysulsky@tesla.com> | 2025-05-16 12:26:15 -07:00 |
| quinnrong94 | 2e4babdb0a | [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109); Co-authored-by: Yingyi <yingyihuang2000@outlook.com>; Co-authored-by: neiltian <neiltian@tencent.com>; Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>; Co-authored-by: kexueyu <kexueyu@tencent.com>; Co-authored-by: vincentmeng <vincentmeng@tencent.com>; Co-authored-by: pengmeng <pengmeng@tencent.com> | 2025-05-15 00:48:09 -07:00 |
| Brayden Zhong | 9a91fa0ed1 | docs: fix a bad redirect (#6300) | 2025-05-14 10:27:19 -07:00 |
| Mick | cd7c8a8de6 | doc: update developer guide regarding mllms (#6138); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>; Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>; Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-14 23:13:13 +08:00 |
| Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00 |
| Kiv Chen | 5380cd7ea3 | model(vlm): pixtral (#5084) | 2025-05-13 00:16:10 -07:00 |
| Brayden Zhong | 3c32895cbe | [Llama4] Add docs note about enable multimodal (#6235) | 2025-05-13 10:05:47 +08:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| Brayden Zhong | 12319a6787 | [Docs] Add docs for SGLANG_ and SGL_ environment variables (#6206) | 2025-05-13 01:45:41 +08:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| Cheng Wan | 25c83fff6a | Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558); Co-authored-by: liusy58 <liusy58@linux.alibaba.com> | 2025-05-11 23:36:29 -07:00 |
| Lianmin Zheng | 01bdbf7f80 | Improve structured outputs: fix race condition, server crash, metrics and style (#6188) | 2025-05-11 08:36:16 -07:00 |
| Adarsh Shirawalmath | 94d42b6794 | [Docs] minor Qwen3 and reasoning parser docs fix (#6032) | 2025-05-11 08:22:46 -07:00 |
| mlmz | 69276f619a | doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199) | 2025-05-11 08:22:11 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| Yineng Zhang | 66fc63d6b1 | Revert "feat: add thinking_budget (#6089)" (#6181) | 2025-05-10 16:07:45 -07:00 |
| Ximingwang-09 | 921e4a8185 | [Docs]Delete duplicate content (#6146); Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com> | 2025-05-10 15:02:15 -07:00 |
| XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-11 00:14:09 +08:00 |
| Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00 |
| thyecust | 63484f9fd6 | feat: add thinking_budget (#6089) | 2025-05-09 08:22:09 -07:00 |
| Zhu Chen | fa7d7fd9e5 | [Feature] Add FlashAttention3 as a backend for VisionAttention (#5764); Co-authored-by: othame <chenzhu_912@zju.edu.cn>; Co-authored-by: Mick <mickjagger19@icloud.com>; Co-authored-by: Yi Zhang <1109276519@qq.com> | 2025-05-08 10:01:19 -07:00 |
| Baizhou Zhang | 8f508cc77f | Update doc for MLA attention backends (#6034) | 2025-05-07 18:51:05 -07:00 |
| Baizhou Zhang | fee37d9e8d | [Doc]Fix description for dp_size argument (#6063) | 2025-05-08 00:04:22 +08:00 |
| mlmz | a68ed76682 | feat: append more comprehensive fields in messages instead of merely role and content (#5996) | 2025-05-05 11:43:34 -07:00 |
| Wenxuan Tan | 22da3d978f | Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555) | 2025-05-05 10:32:17 -07:00 |
| Lifu Huang | 1232f7e8b7 | Update dev container config to support live code sync and improve docker setup guide (#6018); Signed-off-by: Lifu Huang <lifu.hlf@gmail.com> | 2025-05-04 22:33:46 -07:00 |
| vzed | 95c231e50d | Tool Call: Add chat_template_kwargs documentation (#5679) | 2025-05-04 13:12:40 -07:00 |
| Chayenne | 73dcf2b326 | Remove token in token out in Native API (#5967) | 2025-05-01 21:59:43 -07:00 |
| Chang Su | 170d1f218a | feat: Refactor DeepSeekV3 function call (#5908) | 2025-05-01 21:28:57 -07:00 |
| 江家瑋 | ad506a4e6b | docs: Fix Qwen model typo (#5944); Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com> | 2025-05-01 10:23:00 -07:00 |
| Ke Bao | ebaba85655 | Update ci test and doc for MTP api change (#5952) | 2025-05-01 09:30:27 -07:00 |