Yineng Zhang
|
564bdf29f7
|
upgrade flashinfer v0.2.2.post1 (#3934)
|
2025-02-27 09:53:48 -08:00 |
|
Yineng Zhang
|
5d86016855
|
revert "Docs: Reorngaize dpsk links #3900" (#3933)
|
2025-02-27 08:57:13 -08:00 |
|
Enrique Shockwave
|
d281587989
|
Improve: Support xgrammar 0.1.14 (#3593)
|
2025-02-27 08:42:54 -08:00 |
|
laixin
|
b0df5d240b
|
Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
|
2025-02-27 10:59:46 +00:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Stefan He
|
d8a98a2cad
|
[Docs] Improve DPSK docs in dark mode (#3914)
|
2025-02-27 00:13:04 -08:00 |
|
Qing
|
0519269d20
|
[Docs] Disable notebook CI when merge to main (#3905)
|
2025-02-26 22:13:33 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
Tianxing Wu
|
8b681d7724
|
[Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898)
|
2025-02-26 17:05:30 -08:00 |
|
ybyang
|
194eea1774
|
[doc] update sponsorship (#3903)
|
2025-02-26 16:28:15 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
Chayenne
|
7c1692aa90
|
Docs: Reorngaize dpsk links (#3900)
|
2025-02-26 15:16:31 -08:00 |
|
Chayenne
|
8f019c7d1a
|
Docs: Move dpsk docs forward a step (#3894)
|
2025-02-26 11:43:20 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
simveit
|
c9fc4a9d26
|
Docs: delete sgl-kernel install in docs (#3845)
|
2025-02-26 09:25:43 -08:00 |
|
lukec
|
21463e321a
|
Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602)
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: laixin <q865809639@gmail.com>
|
2025-02-26 02:29:37 -08:00 |
|
Shenggui Li
|
3dc9ff3ce8
|
[doc] fixed dpsk quant faq (#3865)
|
2025-02-25 19:40:47 -08:00 |
|
Shenggui Li
|
06427dfab1
|
[doc] added quantization doc for dpsk (#3843)
|
2025-02-25 09:43:28 -08:00 |
|
Kebe
|
60524920ba
|
[Bug]: Fix maximum recursion depth triggered on exception exit (#3519)
|
2025-02-25 09:39:38 -08:00 |
|
IAN
|
107710268a
|
[BugFix] Fix crash when receiving a req with structured output in DP attention mode. (#3841)
|
2025-02-25 09:32:05 -08:00 |
|
who who who
|
4606e2a3fe
|
Bug: fix capture_bs (#3857)
|
2025-02-25 08:40:35 -08:00 |
|
Nicolas Castet
|
127998cc41
|
Fix allgather ops inside cuda graphs (#3709)
|
2025-02-25 08:39:10 -08:00 |
|
Shenggui Li
|
c0bb9eb3b3
|
[improve] made timeout configurable (#3803)
|
2025-02-25 00:26:08 -08:00 |
|
Yueyang Pan
|
7036d6fc67
|
[Bug]: Add missing clamp to llavavid (#3787)
|
2025-02-24 19:10:15 -08:00 |
|
Chaitanya Sri Krishna Lolla
|
6ce9dbe828
|
[ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237)
Co-authored-by: HAI <hixiao@gmail.com>
|
2025-02-24 18:14:31 -08:00 |
|
Yuanheng Zhao
|
3758d209a0
|
[Doc] Fix typo in server-argument description (#3641)
|
2025-02-24 16:57:13 -08:00 |
|
Wilson Wu
|
faf29e0b23
|
Docs: fix doc site copyright to current year (#3741)
|
2025-02-24 16:56:04 -08:00 |
|
He1pa
|
b0743ea059
|
Docs: fix dead link in router.md (#3799)
|
2025-02-24 16:53:57 -08:00 |
|
Wang Ran (汪然)
|
60b771c815
|
Improve: fix typos (#3801)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-24 16:51:23 -08:00 |
|
Lianmin Zheng
|
d7934cde45
|
Fix CI and install docs (#3821)
|
2025-02-24 16:17:38 -08:00 |
|
Lianmin Zheng
|
62bbd34393
|
Revert "Extract generation_manager from tokenizer_manager" (#3829)
|
2025-02-24 14:49:16 -08:00 |
|
Lianmin Zheng
|
f2388f6b95
|
Revert "Rename TokenizerManager to StdOrchestrator" (#3828)
|
2025-02-24 14:47:59 -08:00 |
|
Lianmin Zheng
|
c9745ee082
|
Fix pandas dependency in CI (#3818)
|
2025-02-24 05:56:57 -08:00 |
|
laixin
|
1a6e97577a
|
Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730)
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2025-02-24 05:43:35 -08:00 |
|
Baizhou Zhang
|
b110084654
|
Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785)
|
2025-02-24 04:07:25 -08:00 |
|
Lianmin Zheng
|
27a46317b6
|
Fix dependency (#3813)
|
2025-02-24 03:50:58 -08:00 |
|
Lianmin Zheng
|
c979580817
|
Update readme (#3809)
|
2025-02-24 00:31:08 -08:00 |
|
Zhiqiang Xie
|
6c7a152c5a
|
Hierarchical Caching for SGLang (#2693)
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-02-23 21:56:30 -08:00 |
|
Baizhou Zhang
|
4d2a88bdff
|
[Docs]Add instruction for manually stopping nsys profiler (#3795)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-23 13:21:48 -08:00 |
|
fzyzcjy
|
45360b2fa9
|
Improve: Rename TokenizerManager to StdOrchestrator (#3116)
|
2025-02-23 00:30:58 -08:00 |
|
fzyzcjy
|
3f41b18455
|
Improve: Extract generation_manager from tokenizer_manager (#3115)
|
2025-02-22 23:25:45 -08:00 |
|
Mick
|
45205d88a0
|
bench: Add MMMU benchmark for vLM (#3562)
|
2025-02-22 08:10:59 -08:00 |
|
fzyzcjy
|
9087694006
|
Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117)
|
2025-02-21 19:50:46 -08:00 |
|
fzyzcjy
|
a3339d8cac
|
Bug: Fix weight loader error when LM head weights are tied (#3766)
|
2025-02-21 17:53:12 -08:00 |
|
Chayenne
|
14d90617b0
|
Bug: fix lm head weights in Qwen models (#3777)
|
2025-02-21 16:49:31 -08:00 |
|
fzyzcjy
|
d37f95511d
|
Improve: Tiny fix Olmo2 (#3348)
|
2025-02-21 16:09:35 -08:00 |
|
Zhiyu
|
c66b2c9cf1
|
Add support for nvidia modelopt fp8 kv cache (#3223)
|
2025-02-22 07:04:58 +08:00 |
|
simveit
|
20b765a26e
|
Model: Support Qwen 72B RM model. (#3772)
|
2025-02-21 14:38:21 -08:00 |
|