Commit Graph

2168 Commits

Author SHA1 Message Date
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Yineng Zhang
5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) 2025-02-27 08:57:13 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) 2025-02-27 10:59:46 +00:00
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Baizhou Zhang
3e02526b1f [Doc] Add experimental tag for flashinfer mla (#3925) 2025-02-27 01:55:36 -08:00
Stefan He
d8a98a2cad [Docs] Improve DPSK docs in dark mode (#3914) 2025-02-27 00:13:04 -08:00
Qing
0519269d20 [Docs] Disable notebook CI when merge to main (#3905) 2025-02-26 22:13:33 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897) 2025-02-26 22:06:54 -08:00
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Baizhou Zhang
71ed01833d [doc] Update document for flashinfer mla (#3907) 2025-02-26 20:40:45 -08:00
Tianxing Wu
8b681d7724 [Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898) 2025-02-26 17:05:30 -08:00
ybyang
194eea1774 [doc] update sponsorship (#3903) 2025-02-26 16:28:15 -08:00
simveit
acd1a15921 Docs: Implemented frontend docs (#3791) 2025-02-26 15:30:05 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Chayenne
7c1692aa90 Docs: Reorngaize dpsk links (#3900) 2025-02-26 15:16:31 -08:00
Chayenne
8f019c7d1a Docs: Move dpsk docs forward a step (#3894) 2025-02-26 11:43:20 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
simveit
44a2c4bd56 Docs: improve link to docs (#3860) 2025-02-26 10:29:25 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
simveit
c9fc4a9d26 Docs: delete sgl-kernel install in docs (#3845) 2025-02-26 09:25:43 -08:00
lukec
21463e321a Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) 2025-02-26 02:29:37 -08:00
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: laixin <q865809639@gmail.com>
Shenggui Li
3dc9ff3ce8 [doc] fixed dpsk quant faq (#3865) 2025-02-25 19:40:47 -08:00
Shenggui Li
06427dfab1 [doc] added quantization doc for dpsk (#3843) 2025-02-25 09:43:28 -08:00
Kebe
60524920ba [Bug]: Fix maximum recursion depth triggered on exception exit (#3519) 2025-02-25 09:39:38 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
who who who
4606e2a3fe Bug: fix capture_bs (#3857) 2025-02-25 08:40:35 -08:00
Nicolas Castet
127998cc41 Fix allgather ops inside cuda graphs (#3709) 2025-02-25 08:39:10 -08:00
Shenggui Li
c0bb9eb3b3 [improve] made timeout configurable (#3803) 2025-02-25 00:26:08 -08:00
Yueyang Pan
7036d6fc67 [Bug]: Add missing clamp to llavavid (#3787) 2025-02-24 19:10:15 -08:00
Chaitanya Sri Krishna Lolla
6ce9dbe828 [ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237) 2025-02-24 18:14:31 -08:00
Co-authored-by: HAI <hixiao@gmail.com>
Yuanheng Zhao
3758d209a0 [Doc] Fix typo in server-argument description (#3641) 2025-02-24 16:57:13 -08:00
Wilson Wu
faf29e0b23 Docs: fix doc site copyright to current year (#3741) 2025-02-24 16:56:04 -08:00
He1pa
b0743ea059 Docs: fix dead link in router.md (#3799) 2025-02-24 16:53:57 -08:00
Wang Ran (汪然)
60b771c815 Improve: fix typos (#3801) 2025-02-24 16:51:23 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Lianmin Zheng
62bbd34393 Revert "Extract generation_manager from tokenizer_manager" (#3829) 2025-02-24 14:49:16 -08:00
Lianmin Zheng
f2388f6b95 Revert "Rename TokenizerManager to StdOrchestrator" (#3828) 2025-02-24 14:47:59 -08:00
Lianmin Zheng
c9745ee082 Fix pandas dependency in CI (#3818) 2025-02-24 05:56:57 -08:00
laixin
1a6e97577a Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) 2025-02-24 05:43:35 -08:00
Co-authored-by: HandH1998 <1335248067@qq.com>
Baizhou Zhang
b110084654 Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785) 2025-02-24 04:07:25 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
Lianmin Zheng
c979580817 Update readme (#3809) 2025-02-24 00:31:08 -08:00
Zhiqiang Xie
6c7a152c5a Hierarchical Caching for SGLang (#2693) 2025-02-23 21:56:30 -08:00
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Baizhou Zhang
4d2a88bdff [Docs]Add instruction for manually stopping nsys profiler (#3795) 2025-02-23 13:21:48 -08:00
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
fzyzcjy
45360b2fa9 Improve: Rename TokenizerManager to StdOrchestrator (#3116) 2025-02-23 00:30:58 -08:00
fzyzcjy
3f41b18455 Improve: Extract generation_manager from tokenizer_manager (#3115) 2025-02-22 23:25:45 -08:00
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
fzyzcjy
9087694006 Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) 2025-02-21 19:50:46 -08:00
fzyzcjy
a3339d8cac Bug: Fix weight loader error when LM head weights are tied (#3766) 2025-02-21 17:53:12 -08:00
Chayenne
14d90617b0 Bug: fix lm head weights in Qwen models (#3777) 2025-02-21 16:49:31 -08:00
fzyzcjy
d37f95511d Improve: Tiny fix Olmo2 (#3348) 2025-02-21 16:09:35 -08:00
Zhiyu
c66b2c9cf1 Add support for nvidia modelopt fp8 kv cache (#3223) 2025-02-22 07:04:58 +08:00
simveit
20b765a26e Model: Support Qwen 72B RM model. (#3772) 2025-02-21 14:38:21 -08:00