Qiaolin Yu
484d0e021d
doc: add bench_one_batch_server in the benchmark doc (#8441)
2025-07-27 23:07:54 -07:00

Qiaolin Yu
2810338401
[feat] Support different attention backends for prefill and decode (#6338)
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-07-28 11:42:29 +08:00

Kevin Xiang Li
44d600cd67
Support precomputed_embeddings for Llama 4 (#8156)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-07-27 01:14:49 -07:00

Yineng Zhang
2272c2a5b5
chore: bump v0.4.9.post4 (#8305)
2025-07-25 17:12:47 -07:00

Chang Su
d8ee15643b
[Feat] Add reasoning parser for Qwen/Qwen3-235B-A22B-Thinking-2507 (#8363)
2025-07-25 14:59:42 -07:00

Xiaoyu Zhang
9045cc1eb8
[torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering torch.compile in forward pass (#8353)
2025-07-25 21:17:47 +08:00

Zaili Wang
15d2759174
[CPU] Add tutorial docs for SGL on CPU (#8000)
2025-07-25 00:03:16 -07:00

Yineng Zhang
01c000043c
chore: bump v0.4.9.post3 (#8265)
2025-07-22 15:55:48 -07:00

Xinyuan Tong
8430bfe3e9
[Refactor] simplify multimodal data processing (#8107)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-20 21:43:09 -07:00

Praneth Paruchuri
83c104b188
Feat: Support for Persimmon Model (#7983)
2025-07-19 23:07:47 -07:00

Lifu Huang
4e3defe5a7
Support start up LoRA server without initial adapters (#8019)
2025-07-19 15:38:09 -07:00

Lianmin Zheng
bb0e8a32b5
Clean up server args (#8161)
2025-07-19 11:32:52 -07:00

Binyao Jiang
b7e951a6db
Feat: Support audio in Phi4-mm model (#8048)
2025-07-18 21:03:53 -07:00

Lianmin Zheng
9c7a46180c
[Doc] Steps to add a new attention backend (#8155)
2025-07-18 16:38:26 -07:00

Minglei Zhu
8a32355704
Feat: Support Granite 3.0 MoE in SGLang (#7959)
2025-07-17 20:56:03 -07:00

Praneth Paruchuri
cb736df854
Support for Phi-1.5 & Phi-2 models (#7862)
2025-07-13 18:43:40 -07:00

Lifu Huang
e2ed9d049a
Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844)
2025-07-13 18:36:01 -07:00

Yineng Zhang
22bd857cb5
docs: update README (#7985)
2025-07-12 13:31:11 -07:00

Yineng Zhang
eb118d88c4
chore: bump v0.4.9.post2 (#7963)
2025-07-11 21:11:20 -07:00

ronnie_zheng
86044712c6
[feature] kv transfer support of ascend npu (#7795)
Co-authored-by: liupeng <liupeng374@huawei.com>
2025-07-11 00:07:51 -07:00

Atream
615553079d
Support Kimi K2 (#7940)
2025-07-11 00:02:21 -07:00

Binyao Jiang
2d54d4bb64
Feat: Support Phi-3.5-MoE in SGLang (#7907)
2025-07-09 23:51:33 -07:00

Yineng Zhang
066f4ec91f
chore: bump v0.4.9.post1 (#7882)
2025-07-09 00:28:17 -07:00

Yikai Zhang
0870232195
Update native_api doc to match the change in the get_model_info endpoint (#7660)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-08 21:05:58 -07:00

Shangming Cai
64c5907e12
[PD] Add guidance for prefill bootstrap timeout (#7846)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-07-08 21:00:34 -07:00

Xinyuan Tong
43e20c0647
Support Mimo-VL (#7579)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-08 14:00:25 -07:00

Yineng Zhang
ec5f9c6269
chore: bump v0.4.9 (#7802)
2025-07-05 17:40:29 -07:00

Yuchen Cheng
1e3e3add3d
fix(docs): fix the broken link in docs/references/production_metrics.md (#7741)
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-03 23:46:07 -07:00

Xinyuan Tong
43f93f632c
fix CI: update native api ipynb (#7754)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-03 15:25:00 -07:00

ronnie_zheng
1e0e549766
Ascend attention backend(PA&MLA) (#7722)
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>
2025-07-03 09:23:19 -07:00

Yi Zhang
93b6785d78
add description for llama4 eagle3 (#7688)
2025-07-01 01:19:19 -07:00

ybyang
7349717e4b
[doc] update lws doc for pd (#7318)
2025-07-01 10:39:04 +08:00

Lianmin Zheng
22352d47a9
Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00

tarinkk
eb6c2c1663
Hybrid kv cache for LLaMA4 (#6563)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-06-27 18:58:55 -07:00

Yineng Zhang
69183f8808
chore: bump v0.4.8.post1 (#7559)
2025-06-26 02:21:12 -07:00

Shangming Cai
5c2142579a
[PD] Raise error for incompatible mooncake version and some minor fixes (#7527)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-25 18:55:24 -07:00

Yineng Zhang
7c3a12c000
chore: bump v0.4.8 (#7493)
2025-06-23 23:14:22 -07:00

Lianmin Zheng
30ceccc74a
Update hyperparameter_tuning.md (#7454)
2025-06-22 22:42:55 -07:00

Chang Su
72676cd6c0
feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00

Jinn
ab74f8f09d
Remove batches api in docs & example (#7400)
2025-06-20 19:46:31 -07:00

woodx
97011abc8a
[Doc] add embedding rerank doc (#7364)
2025-06-19 21:53:54 -07:00

Yineng Zhang
fadf18fdd5
docs: update installation (#7366)
2025-06-19 12:00:19 -07:00

linzhuo
1de4db9bef
update invalid link in doc (#7297)
2025-06-18 01:37:36 -07:00

Yijie Zhu
a39d928782
support qwen2 running on ascend npu device (#7022)
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
2025-06-17 11:24:10 -07:00

Yineng Zhang
f9dc9dd28b
chore: bump v0.4.7.post1 (#7248)
2025-06-16 15:20:29 -07:00

Lianmin Zheng
21615cc3fe
Minor style and doc fix (#7228)
2025-06-16 01:03:13 -07:00

Lifu Huang
98538822d5
Add Phi-4-mm to supported VLM supported model list. (#7178)
2025-06-13 23:17:40 -07:00

Povilas Kanapickas
bd7cfbd2f8
[Fix] Reduce busy polling when scheduler is idle (#6026)
2025-06-12 14:58:22 -07:00

Lianmin Zheng
dbdf76ca98
Clean up docs for server args and sampling parameters (generated by grok) (#7076)
2025-06-10 19:55:42 -07:00

Ximingwang-09
f2a75a66c4
update doc (#7046)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-06-11 10:02:01 +08:00