Chayenne
|
18bb216c28
|
Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)
|
2025-02-28 23:57:17 -08:00 |
|
Chayenne
|
6b859e7ddd
|
Docs: add special warning to engine docs (#3979)
|
2025-02-28 21:59:20 -08:00 |
|
Chayenne
|
930da877c4
|
rename FunctionCallReqInput to ParseFunctionCallReq (#3976)
|
2025-02-28 18:46:25 -08:00 |
|
Chayenne
|
3f8a441437
|
Docs: Add redline to highlight main process (#3977)
|
2025-02-28 18:37:15 -08:00 |
|
Chayenne
|
aceb420179
|
Docs: add type hint to sampling parameters (#3975)
|
2025-02-28 18:21:20 -08:00 |
|
Baizhou Zhang
|
90a4b7d98a
|
[Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-28 18:13:56 -08:00 |
|
Yineng Zhang
|
f3b99f73b3
|
update flashinfer-python version
|
2025-02-28 16:31:59 -08:00 |
|
Elfie Guo
|
9e74ee91da
|
Update cutlass dependency (#3966)
|
2025-02-28 16:16:31 -08:00 |
|
Chaitanya Sri Krishna Lolla
|
77a6c9d229
|
Remove unused imports from rocm mla kernel. (#3963)
|
2025-02-28 10:01:08 -08:00 |
|
fzyzcjy
|
e3e0bc50a9
|
[Feature] SPMD for SGLang + Verl (#3852)
|
2025-02-28 09:53:10 -08:00 |
|
mlmz
|
bac414ab53
|
[Feature] integrate Structural Tag in xgrammar backend for function calling (#3566)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-02-27 23:33:41 -08:00 |
|
Chang Su
|
eec3f6d1eb
|
[Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678)
Co-authored-by: voidxb <unkown>
|
2025-02-27 22:59:43 -08:00 |
|
Chayenne
|
90bc26a813
|
set a strict sgl-kernel version (#3950)
|
2025-02-27 22:44:57 -08:00 |
|
Kebe
|
ec0a72c2d9
|
Fix bench_serving not recognizing OPENAI_API_KEY (#3870)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-02-27 20:18:53 -08:00 |
|
yiakwy-xpu-ml-framework-team
|
1c96fa86cf
|
[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)
|
2025-02-27 19:42:48 -08:00 |
|
KCFindstr
|
bc20e93f2d
|
[feat] Add Vertex AI compatible prediction route for /generate (#3866)
|
2025-02-27 19:42:15 -08:00 |
|
Qiaolin Yu
|
d38878523d
|
Fix the doc link for sampling params (#3861)
|
2025-02-27 13:31:43 -08:00 |
|
Yineng Zhang
|
564bdf29f7
|
upgrade flashinfer v0.2.2.post1 (#3934)
|
2025-02-27 09:53:48 -08:00 |
|
Yineng Zhang
|
5d86016855
|
revert "Docs: Reorganize dpsk links #3900" (#3933)
|
2025-02-27 08:57:13 -08:00 |
|
Enrique Shockwave
|
d281587989
|
Improve: Support xgrammar 0.1.14 (#3593)
|
2025-02-27 08:42:54 -08:00 |
|
laixin
|
b0df5d240b
|
Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
|
2025-02-27 10:59:46 +00:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Stefan He
|
d8a98a2cad
|
[Docs] Improve DPSK docs in dark mode (#3914)
|
2025-02-27 00:13:04 -08:00 |
|
Qing
|
0519269d20
|
[Docs] Disable notebook CI when merge to main (#3905)
|
2025-02-26 22:13:33 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
Tianxing Wu
|
8b681d7724
|
[Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898)
|
2025-02-26 17:05:30 -08:00 |
|
ybyang
|
194eea1774
|
[doc] update sponsorship (#3903)
|
2025-02-26 16:28:15 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
Chayenne
|
7c1692aa90
|
Docs: Reorganize dpsk links (#3900)
|
2025-02-26 15:16:31 -08:00 |
|
Chayenne
|
8f019c7d1a
|
Docs: Move dpsk docs forward a step (#3894)
|
2025-02-26 11:43:20 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
simveit
|
c9fc4a9d26
|
Docs: delete sgl-kernel install in docs (#3845)
|
2025-02-26 09:25:43 -08:00 |
|
lukec
|
21463e321a
|
Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602)
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: laixin <q865809639@gmail.com>
|
2025-02-26 02:29:37 -08:00 |
|
Shenggui Li
|
3dc9ff3ce8
|
[doc] fixed dpsk quant faq (#3865)
|
2025-02-25 19:40:47 -08:00 |
|
Shenggui Li
|
06427dfab1
|
[doc] added quantization doc for dpsk (#3843)
|
2025-02-25 09:43:28 -08:00 |
|
Kebe
|
60524920ba
|
[Bug]: Fix maximum recursion depth triggered on exception exit (#3519)
|
2025-02-25 09:39:38 -08:00 |
|
IAN
|
107710268a
|
[BugFix] Fix crash when receiving a req with structured output in DP attention mode. (#3841)
|
2025-02-25 09:32:05 -08:00 |
|
who who who
|
4606e2a3fe
|
Bug: fix capture_bs (#3857)
|
2025-02-25 08:40:35 -08:00 |
|
Nicolas Castet
|
127998cc41
|
Fix allgather ops inside cuda graphs (#3709)
|
2025-02-25 08:39:10 -08:00 |
|
Shenggui Li
|
c0bb9eb3b3
|
[improve] made timeout configurable (#3803)
|
2025-02-25 00:26:08 -08:00 |
|
Yueyang Pan
|
7036d6fc67
|
[Bug]: Add missing clamp to llavavid (#3787)
|
2025-02-24 19:10:15 -08:00 |
|
Chaitanya Sri Krishna Lolla
|
6ce9dbe828
|
[ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237)
Co-authored-by: HAI <hixiao@gmail.com>
|
2025-02-24 18:14:31 -08:00 |
|
Yuanheng Zhao
|
3758d209a0
|
[Doc] Fix typo in server-argument description (#3641)
|
2025-02-24 16:57:13 -08:00 |
|
Wilson Wu
|
faf29e0b23
|
Docs: fix doc site copyright to current year (#3741)
|
2025-02-24 16:56:04 -08:00 |
|
He1pa
|
b0743ea059
|
Docs: fix dead link in router.md (#3799)
|
2025-02-24 16:53:57 -08:00 |
|
Wang Ran (汪然)
|
60b771c815
|
Improve: fix typos (#3801)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-24 16:51:23 -08:00 |
|
Lianmin Zheng
|
d7934cde45
|
Fix CI and install docs (#3821)
|
2025-02-24 16:17:38 -08:00 |
|
Lianmin Zheng
|
62bbd34393
|
Revert "Extract generation_manager from tokenizer_manager" (#3829)
|
2025-02-24 14:49:16 -08:00 |
|