Commit Graph

2185 Commits

Author SHA1 Message Date
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
Chayenne
6b859e7ddd Docs: add special warning to engine docs (#3979) 2025-02-28 21:59:20 -08:00
Chayenne
930da877c4 rename FunctionCallReqInput to ParseFunctionCallReq (#3976) 2025-02-28 18:46:25 -08:00
Chayenne
3f8a441437 Docs: Add redline to highlight main process (#3977) 2025-02-28 18:37:15 -08:00
Chayenne
aceb420179 Docs: add type hint to smapling parameters (#3975) 2025-02-28 18:21:20 -08:00
Baizhou Zhang
90a4b7d98a [Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-28 18:13:56 -08:00
Yineng Zhang
f3b99f73b3 update flashinfer-python version 2025-02-28 16:31:59 -08:00
Elfie Guo
9e74ee91da Update cutlass dependency (#3966) 2025-02-28 16:16:31 -08:00
Chaitanya Sri Krishna Lolla
77a6c9d229 Remove unused imports from rocm mla kernel. (#3963) 2025-02-28 10:01:08 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
mlmz
bac414ab53 [Feature] integrate Structural Tag in xgrammar backend for function calling (#3566)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-02-27 23:33:41 -08:00
Chang Su
eec3f6d1eb [Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678)
Co-authored-by: voidxb <unkown>
2025-02-27 22:59:43 -08:00
Chayenne
90bc26a813 set a strict sgl-kernel version (#3950) 2025-02-27 22:44:57 -08:00
Kebe
ec0a72c2d9 Fix bench_serving not recognizing OPENAI_API_KEY (#3870)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-02-27 20:18:53 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
KCFindstr
bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) 2025-02-27 19:42:15 -08:00
Qiaolin Yu
d38878523d Fix the doc link for sampling params (#3861) 2025-02-27 13:31:43 -08:00
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Yineng Zhang
5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) 2025-02-27 08:57:13 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
Baizhou Zhang
3e02526b1f [Doc] Add experimental tag for flashinfer mla (#3925) 2025-02-27 01:55:36 -08:00
Stefan He
d8a98a2cad [Docs] Improve DPSK docs in dark mode (#3914) 2025-02-27 00:13:04 -08:00
Qing
0519269d20 [Docs] Disable notebook CI when merge to main (#3905) 2025-02-26 22:13:33 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
Baizhou Zhang
71ed01833d [doc] Update document for flashinfer mla (#3907) 2025-02-26 20:40:45 -08:00
Tianxing Wu
8b681d7724 [Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898) 2025-02-26 17:05:30 -08:00
ybyang
194eea1774 [doc] update sponsorship (#3903) 2025-02-26 16:28:15 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 15:30:05 -08:00
Chayenne
7c1692aa90 Docs: Reorngaize dpsk links (#3900) 2025-02-26 15:16:31 -08:00
Chayenne
8f019c7d1a Docs: Move dpsk docs forward a step (#3894) 2025-02-26 11:43:20 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
simveit
44a2c4bd56 Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 10:29:25 -08:00
simveit
c9fc4a9d26 Docs: delete sgl-kernel install in docs (#3845) 2025-02-26 09:25:43 -08:00
lukec
21463e321a Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602)
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: laixin <q865809639@gmail.com>
2025-02-26 02:29:37 -08:00
Shenggui Li
3dc9ff3ce8 [doc] fixed dpsk quant faq (#3865) 2025-02-25 19:40:47 -08:00
Shenggui Li
06427dfab1 [doc] added quantization doc for dpsk (#3843) 2025-02-25 09:43:28 -08:00
Kebe
60524920ba [Bug]: Fix maximum recursion depth triggered on exception exit (#3519) 2025-02-25 09:39:38 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
who who who
4606e2a3fe Bug: fix capture_bs (#3857) 2025-02-25 08:40:35 -08:00
Nicolas Castet
127998cc41 Fix allgather ops inside cuda graphs (#3709) 2025-02-25 08:39:10 -08:00
Shenggui Li
c0bb9eb3b3 [improve] made timeout configurable (#3803) 2025-02-25 00:26:08 -08:00
Yueyang Pan
7036d6fc67 [Bug]: Add missing clamp to llavavid (#3787) 2025-02-24 19:10:15 -08:00
Chaitanya Sri Krishna Lolla
6ce9dbe828 [ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237)
Co-authored-by: HAI <hixiao@gmail.com>
2025-02-24 18:14:31 -08:00
Yuanheng Zhao
3758d209a0 [Doc] Fix typo in server-argument description (#3641) 2025-02-24 16:57:13 -08:00
Wilson Wu
faf29e0b23 Docs: fix doc site copyright to current year (#3741) 2025-02-24 16:56:04 -08:00
He1pa
b0743ea059 Docs: fix dead link in router.md (#3799) 2025-02-24 16:53:57 -08:00
Wang Ran (汪然)
60b771c815 Improve: fix typos (#3801)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-24 16:51:23 -08:00
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Lianmin Zheng
62bbd34393 Revert "Extract generation_manager from tokenizer_manager" (#3829) 2025-02-24 14:49:16 -08:00