| Author | Commit | Message | Date |
|---|---|---|---|
| Yineng Zhang | 2272c2a5b5 | chore: bump v0.4.9.post4 (#8305) | 2025-07-25 17:12:47 -07:00 |
| Xiaoyu Zhang | 9045cc1eb8 | [torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering torch.compile in forward pass (#8353) | 2025-07-25 21:17:47 +08:00 |
| Zaili Wang | 15d2759174 | [CPU] Add tutorial docs for SGL on CPU (#8000) | 2025-07-25 00:03:16 -07:00 |
| Yineng Zhang | 01c000043c | chore: bump v0.4.9.post3 (#8265) | 2025-07-22 15:55:48 -07:00 |
| Yineng Zhang | 22bd857cb5 | docs: update README (#7985) | 2025-07-12 13:31:11 -07:00 |
| Yineng Zhang | eb118d88c4 | chore: bump v0.4.9.post2 (#7963) | 2025-07-11 21:11:20 -07:00 |
| Yineng Zhang | 066f4ec91f | chore: bump v0.4.9.post1 (#7882) | 2025-07-09 00:28:17 -07:00 |
| Yineng Zhang | ec5f9c6269 | chore: bump v0.4.9 (#7802) | 2025-07-05 17:40:29 -07:00 |
| Yuchen Cheng | 1e3e3add3d | fix(docs): fix the broken link in docs/references/production_metrics.md (#7741); Signed-off-by: rudeigerc <rudeigerc@gmail.com> | 2025-07-03 23:46:07 -07:00 |
| Yi Zhang | 93b6785d78 | add description for llama4 eagle3 (#7688) | 2025-07-01 01:19:19 -07:00 |
| ybyang | 7349717e4b | [doc] update lws doc for pd (#7318) | 2025-07-01 10:39:04 +08:00 |
| tarinkk | eb6c2c1663 | Hybrid kv cache for LLaMA4 (#6563); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>; Co-authored-by: tarinkk <rt572@physics.rutger.edu>; Co-authored-by: tarinkk <rt572@rutgers.physics.edu>; Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com> | 2025-06-27 18:58:55 -07:00 |
| Yineng Zhang | 69183f8808 | chore: bump v0.4.8.post1 (#7559) | 2025-06-26 02:21:12 -07:00 |
| Yineng Zhang | 7c3a12c000 | chore: bump v0.4.8 (#7493) | 2025-06-23 23:14:22 -07:00 |
| Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00 |
| Lianmin Zheng | 90bd3e32d6 | Improve perf tuning docs (#7071) | 2025-06-10 16:55:04 -07:00 |
| Yineng Zhang | 4f723edd3b | chore: bump v0.4.7 (#7038) | 2025-06-10 01:56:20 -07:00 |
| Yueyang Pan | 98c00a2df1 | Fix torch profiler bugs for bench_offline_throughput.py (#6557) | 2025-06-09 20:33:41 +08:00 |
| HAI | b819381fec | AITER backend extension and workload optimizations (#6838); Co-authored-by: wunhuang <wunhuang@amd.com>; Co-authored-by: Hubert Lu <Hubert.Lu@amd.com> | 2025-06-05 23:00:18 -07:00 |
| Baizhou Zhang | 791b3bfabb | [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479) | 2025-05-28 16:03:43 -07:00 |
| linzhuo | 7a0bbe6a64 | update toc for doc and dockerfile code style format (#6450); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-05-27 13:05:11 +08:00 |
| fzyzcjy | 25be63d0b2 | Auto handle PD disaggregation in bench_serving (#6587); Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-25 22:41:27 -07:00 |
| simveit | e235be16fe | Fix some issues with current docs. (#6588) | 2025-05-26 01:04:34 +08:00 |
| quinnrong94 | 2e4babdb0a | [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109); Co-authored-by: Yingyi <yingyihuang2000@outlook.com>; Co-authored-by: neiltian <neiltian@tencent.com>; Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>; Co-authored-by: kexueyu <kexueyu@tencent.com>; Co-authored-by: vincentmeng <vincentmeng@tencent.com>; Co-authored-by: pengmeng <pengmeng@tencent.com> | 2025-05-15 00:48:09 -07:00 |
| Brayden Zhong | 3c32895cbe | [Llama4] Add docs note about enable multimodal (#6235) | 2025-05-13 10:05:47 +08:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| Brayden Zhong | 12319a6787 | [Docs] Add docs for SGLANG_ and SGL_ environment variables (#6206) | 2025-05-13 01:45:41 +08:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| Baizhou Zhang | 8f508cc77f | Update doc for MLA attention backends (#6034) | 2025-05-07 18:51:05 -07:00 |
| Chang Su | 170d1f218a | feat: Refactor DeepSeekV3 function call (#5908) | 2025-05-01 21:28:57 -07:00 |
| Ke Bao | ebaba85655 | Update ci test and doc for MTP api change (#5952) | 2025-05-01 09:30:27 -07:00 |
| Huapeng Zhou | 86317c09e9 | [Docs] update grafana setup guide in production metrics (#5643); Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com> | 2025-04-27 15:36:33 -07:00 |
| Frankey_8080 | a21ef36352 | support for the DeepSeek model by enabling streaming response parsing (#5592) | 2025-04-26 18:59:31 -07:00 |
| Lianmin Zheng | 155890e4d1 | [Minor] fix documentations (#5756) | 2025-04-26 17:48:43 -07:00 |
| Baizhou Zhang | ce5412b62e | Turn on DeepGemm By Default and Update Doc (#5628) | 2025-04-22 16:10:08 -07:00 |
| Huapeng Zhou | 57131dd955 | [Feat.] Enable grafana to show metrics (#4718); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-04-21 00:43:42 -07:00 |
| Yi Zhou | fac17acf08 | add function call parser for DeepSeek V3 (#5224) | 2025-04-20 17:38:08 -07:00 |
| fzyzcjy | 9c43477710 | Super tiny fix typo (#5559) | 2025-04-20 14:21:18 -07:00 |
| Baizhou Zhang | b54b5a96e4 | [Doc]Add instruction for profiling with bench_one_batch (#5581) | 2025-04-20 14:05:36 -07:00 |
| Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00 |
| Baizhou Zhang | 4fb05583ef | Deprecate disable-mla (#5481) | 2025-04-17 01:43:14 -07:00 |
| Xiaoyu Zhang | 06a1656e02 | [doc] Update benchmark_and_profiling.md (#5449) | 2025-04-15 23:27:34 -07:00 |
| Baizhou Zhang | a42736bbb8 | Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) | 2025-04-15 22:01:22 -07:00 |
| Baizhou Zhang | f6772f1497 | [Fix] Turn off DeepGEMM by default (#5263) | 2025-04-14 17:45:44 -07:00 |
| Adarsh Shirawalmath | a0a9f6d64f | [Docs] Remove the older supported docs section (#5301) | 2025-04-11 11:30:18 -07:00 |
| Kay Yan | f2b70afde0 | docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110); Signed-off-by: Kay Yan <kay.yan@daocloud.io> | 2025-04-08 20:46:11 -07:00 |
| Ke Bao | ade714a67f | Add Llama4 user guide (#5133); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-04-07 19:09:34 -07:00 |
| Chang Su | f04c80dc42 | Add Llama4 support (#5092); Co-authored-by: Cheng Wan <cwan39@gatech.edu>; Co-authored-by: fzyzcjy <ch271828n@outlook.com>; Co-authored-by: ispobock <ispobaoke@163.com> | 2025-04-07 00:29:36 -07:00 |
| Baizhou Zhang | efbae697b3 | [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) | 2025-04-05 01:23:02 -07:00 |
| Lianmin Zheng | 74885a848b | Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048) | 2025-04-03 13:30:56 -07:00 |