574 Commits

Author SHA1 Message Date
Qiaolin Yu
484d0e021d doc: add bench_one_batch_server in the benchmark doc (#8441) 2025-07-27 23:07:54 -07:00
Qiaolin Yu
2810338401 [feat] Support different attention backends for prefill and decode (#6338)
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-07-28 11:42:29 +08:00
Kevin Xiang Li
44d600cd67 Support precomputed_embeddings for Llama 4 (#8156)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-07-27 01:14:49 -07:00
Yineng Zhang
2272c2a5b5 chore: bump v0.4.9.post4 (#8305) 2025-07-25 17:12:47 -07:00
Chang Su
d8ee15643b [Feat] Add reasoning parser for Qwen/Qwen3-235B-A22B-Thinking-2507 (#8363) 2025-07-25 14:59:42 -07:00
Xiaoyu Zhang
9045cc1eb8 [torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering torch.compile in forward pass (#8353) 2025-07-25 21:17:47 +08:00
Zaili Wang
15d2759174 [CPU] Add tutorial docs for SGL on CPU (#8000) 2025-07-25 00:03:16 -07:00
Yineng Zhang
01c000043c chore: bump v0.4.9.post3 (#8265) 2025-07-22 15:55:48 -07:00
Xinyuan Tong
8430bfe3e9 [Refactor] simplify multimodal data processing (#8107)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-20 21:43:09 -07:00
Praneth Paruchuri
83c104b188 Feat: Support for Persimmon Model (#7983) 2025-07-19 23:07:47 -07:00
Lifu Huang
4e3defe5a7 Support start up LoRA server without initial adapters (#8019) 2025-07-19 15:38:09 -07:00
Lianmin Zheng
bb0e8a32b5 Clean up server args (#8161) 2025-07-19 11:32:52 -07:00
Binyao Jiang
b7e951a6db Feat: Support audio in Phi4-mm model (#8048) 2025-07-18 21:03:53 -07:00
Lianmin Zheng
9c7a46180c [Doc] Steps to add a new attention backend (#8155) 2025-07-18 16:38:26 -07:00
Minglei Zhu
8a32355704 Feat: Support Granite 3.0 MoE in SGLang (#7959) 2025-07-17 20:56:03 -07:00
Praneth Paruchuri
cb736df854 Support for Phi-1.5 & Phi-2 models (#7862) 2025-07-13 18:43:40 -07:00
Lifu Huang
e2ed9d049a Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844) 2025-07-13 18:36:01 -07:00
Yineng Zhang
22bd857cb5 docs: update README (#7985) 2025-07-12 13:31:11 -07:00
Yineng Zhang
eb118d88c4 chore: bump v0.4.9.post2 (#7963) 2025-07-11 21:11:20 -07:00
ronnie_zheng
86044712c6 [feature] kv transfer support of ascend npu (#7795)
Co-authored-by: liupeng <liupeng374@huawei.com>
2025-07-11 00:07:51 -07:00
Atream
615553079d Support Kimi K2 (#7940) 2025-07-11 00:02:21 -07:00
Binyao Jiang
2d54d4bb64 Feat: Support Phi-3.5-MoE in SGLang (#7907) 2025-07-09 23:51:33 -07:00
Yineng Zhang
066f4ec91f chore: bump v0.4.9.post1 (#7882) 2025-07-09 00:28:17 -07:00
Yikai Zhang
0870232195 Update native_api doc to match the change in the get_model_info endpoint (#7660)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-08 21:05:58 -07:00
Shangming Cai
64c5907e12 [PD] Add guidance for prefill bootstrap timeout (#7846)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-07-08 21:00:34 -07:00
Xinyuan Tong
43e20c0647 Support Mimo-VL (#7579)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-08 14:00:25 -07:00
Yineng Zhang
ec5f9c6269 chore: bump v0.4.9 (#7802) 2025-07-05 17:40:29 -07:00
Yuchen Cheng
1e3e3add3d fix(docs): fix the broken link in docs/references/production_metrics.md (#7741)
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-03 23:46:07 -07:00
Xinyuan Tong
43f93f632c fix CI: update native api ipynb (#7754)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-03 15:25:00 -07:00
ronnie_zheng
1e0e549766 Ascend attention backend(PA&MLA) (#7722)
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>
2025-07-03 09:23:19 -07:00
Yi Zhang
93b6785d78 add description for llama4 eagle3 (#7688) 2025-07-01 01:19:19 -07:00
ybyang
7349717e4b [doc] update lws doc for pd (#7318) 2025-07-01 10:39:04 +08:00
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00
tarinkk
eb6c2c1663 Hybrid kv cache for LLaMA4 (#6563)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-06-27 18:58:55 -07:00
Yineng Zhang
69183f8808 chore: bump v0.4.8.post1 (#7559) 2025-06-26 02:21:12 -07:00
Shangming Cai
5c2142579a [PD] Raise error for incompatible mooncake version and some minor fixes (#7527)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-25 18:55:24 -07:00
Yineng Zhang
7c3a12c000 chore: bump v0.4.8 (#7493) 2025-06-23 23:14:22 -07:00
Lianmin Zheng
30ceccc74a Update hyperparameter_tuning.md (#7454) 2025-06-22 22:42:55 -07:00
Chang Su
72676cd6c0 feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00
Jinn
ab74f8f09d Remove batches api in docs & example (#7400) 2025-06-20 19:46:31 -07:00
woodx
97011abc8a [Doc] add embedding rerank doc (#7364) 2025-06-19 21:53:54 -07:00
Yineng Zhang
fadf18fdd5 docs: update installation (#7366) 2025-06-19 12:00:19 -07:00
linzhuo
1de4db9bef update invalid link in doc (#7297) 2025-06-18 01:37:36 -07:00
Yijie Zhu
a39d928782 support qwen2 running on ascend npu device (#7022)
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
2025-06-17 11:24:10 -07:00
Yineng Zhang
f9dc9dd28b chore: bump v0.4.7.post1 (#7248) 2025-06-16 15:20:29 -07:00
Lianmin Zheng
21615cc3fe Minor style and doc fix (#7228) 2025-06-16 01:03:13 -07:00
Lifu Huang
98538822d5 Add Phi-4-mm to supported VLM supported model list. (#7178) 2025-06-13 23:17:40 -07:00
Povilas Kanapickas
bd7cfbd2f8 [Fix] Reduce busy polling when scheduler is idle (#6026) 2025-06-12 14:58:22 -07:00
Lianmin Zheng
dbdf76ca98 Clean up docs for server args and sampling parameters (generated by grok) (#7076) 2025-06-10 19:55:42 -07:00
Ximingwang-09
f2a75a66c4 update doc (#7046)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-06-11 10:02:01 +08:00