178 Commits

Author SHA1 Message Date
Yineng Zhang
066f4ec91f chore: bump v0.4.9.post1 (#7882) 2025-07-09 00:28:17 -07:00
Yineng Zhang
ec5f9c6269 chore: bump v0.4.9 (#7802) 2025-07-05 17:40:29 -07:00
Yuchen Cheng
1e3e3add3d fix(docs): fix the broken link in docs/references/production_metrics.md (#7741)
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-03 23:46:07 -07:00
Yi Zhang
93b6785d78 add description for llama4 eagle3 (#7688) 2025-07-01 01:19:19 -07:00
ybyang
7349717e4b [doc] update lws doc for pd (#7318) 2025-07-01 10:39:04 +08:00
tarinkk
eb6c2c1663 Hybrid kv cache for LLaMA4 (#6563)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-06-27 18:58:55 -07:00
Yineng Zhang
69183f8808 chore: bump v0.4.8.post1 (#7559) 2025-06-26 02:21:12 -07:00
Yineng Zhang
7c3a12c000 chore: bump v0.4.8 (#7493) 2025-06-23 23:14:22 -07:00
Yineng Zhang
f9dc9dd28b chore: bump v0.4.7.post1 (#7248) 2025-06-16 15:20:29 -07:00
Lianmin Zheng
90bd3e32d6 Improve perf tuning docs (#7071) 2025-06-10 16:55:04 -07:00
Yineng Zhang
4f723edd3b chore: bump v0.4.7 (#7038) 2025-06-10 01:56:20 -07:00
Yueyang Pan
98c00a2df1 Fix torch profiler bugs for bench_offline_throughput.py (#6557) 2025-06-09 20:33:41 +08:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
Baizhou Zhang
791b3bfabb [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479) 2025-05-28 16:03:43 -07:00
linzhuo
7a0bbe6a64 update toc for doc and dockerfile code style format (#6450)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-27 13:05:11 +08:00
fzyzcjy
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-25 22:41:27 -07:00
simveit
e235be16fe Fix some issues with current docs. (#6588) 2025-05-26 01:04:34 +08:00
quinnrong94
2e4babdb0a [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109)
Co-authored-by: Yingyi <yingyihuang2000@outlook.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: kexueyu <kexueyu@tencent.com>
Co-authored-by: vincentmeng <vincentmeng@tencent.com>
Co-authored-by: pengmeng <pengmeng@tencent.com>
2025-05-15 00:48:09 -07:00
Brayden Zhong
3c32895cbe [Llama4] Add docs note about enable multimodal (#6235) 2025-05-13 10:05:47 +08:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
Brayden Zhong
12319a6787 [Docs] Add docs for SGLANG_ and SGL_ environment variables (#6206) 2025-05-13 01:45:41 +08:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
Baizhou Zhang
8f508cc77f Update doc for MLA attention backends (#6034) 2025-05-07 18:51:05 -07:00
Chang Su
170d1f218a feat: Refactor DeepSeekV3 function call (#5908) 2025-05-01 21:28:57 -07:00
Ke Bao
ebaba85655 Update ci test and doc for MTP api change (#5952) 2025-05-01 09:30:27 -07:00
Huapeng Zhou
86317c09e9 [Docs] update grafana setup guide in production metrics (#5643)
Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
2025-04-27 15:36:33 -07:00
Frankey_8080
a21ef36352 support for the DeepSeek model by enabling streaming response parsing (#5592) 2025-04-26 18:59:31 -07:00
Lianmin Zheng
155890e4d1 [Minor] fix documentations (#5756) 2025-04-26 17:48:43 -07:00
Baizhou Zhang
ce5412b62e Turn on DeepGemm By Default and Update Doc (#5628) 2025-04-22 16:10:08 -07:00
Huapeng Zhou
57131dd955 [Feat.] Enable grafana to show metrics (#4718)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-04-21 00:43:42 -07:00
Yi Zhou
fac17acf08 add function call parser for DeepSeek V3 (#5224) 2025-04-20 17:38:08 -07:00
fzyzcjy
9c43477710 Super tiny fix typo (#5559) 2025-04-20 14:21:18 -07:00
Baizhou Zhang
b54b5a96e4 [Doc]Add instruction for profiling with bench_one_batch (#5581) 2025-04-20 14:05:36 -07:00
Baizhou Zhang
6fb29ffd9e Deprecate enable-flashinfer-mla and enable-flashmla (#5480) 2025-04-17 01:43:33 -07:00
Baizhou Zhang
4fb05583ef Deprecate disable-mla (#5481) 2025-04-17 01:43:14 -07:00
Xiaoyu Zhang
06a1656e02 [doc] Update benchmark_and_profiling.md (#5449) 2025-04-15 23:27:34 -07:00
Baizhou Zhang
a42736bbb8 Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) 2025-04-15 22:01:22 -07:00
Baizhou Zhang
f6772f1497 [Fix] Turn off DeepGEMM by default (#5263) 2025-04-14 17:45:44 -07:00
Adarsh Shirawalmath
a0a9f6d64f [Docs] Remove the older supported docs section (#5301) 2025-04-11 11:30:18 -07:00
Kay Yan
f2b70afde0 docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-08 20:46:11 -07:00
Ke Bao
ade714a67f Add Llama4 user guide (#5133)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-04-07 19:09:34 -07:00
Chang Su
f04c80dc42 Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-07 00:29:36 -07:00
Baizhou Zhang
efbae697b3 [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) 2025-04-05 01:23:02 -07:00
Lianmin Zheng
74885a848b Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048) 2025-04-03 13:30:56 -07:00
Baizhou Zhang
e8999b13b7 Replace enable_flashinfer_mla argument with attention_backend (#5005) 2025-04-03 02:53:58 -07:00
fzyzcjy
736502d4fd Tiny fix doc error (#4795) 2025-03-29 08:22:17 -07:00
Ke Bao
b39532587b Update doc for DeepSeek-V3-0324 (#4825) 2025-03-27 13:30:40 -07:00
Pan Lyu
c913ed4046 support clip embedding model (#4506) 2025-03-27 00:18:15 -07:00
Didier Durand
44f47d3ee1 Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628) 2025-03-27 00:16:16 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00