Chayenne
|
6b859e7ddd
|
Docs: add special warning to engine docs (#3979)
|
2025-02-28 21:59:20 -08:00 |
|
Chayenne
|
3f8a441437
|
Docs: Add redline to highlight main process (#3977)
|
2025-02-28 18:37:15 -08:00 |
|
Chayenne
|
aceb420179
|
Docs: add type hint to smapling parameters (#3975)
|
2025-02-28 18:21:20 -08:00 |
|
Baizhou Zhang
|
90a4b7d98a
|
[Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-28 18:13:56 -08:00 |
|
Yineng Zhang
|
564bdf29f7
|
upgrade flashinfer v0.2.2.post1 (#3934)
|
2025-02-27 09:53:48 -08:00 |
|
Yineng Zhang
|
5d86016855
|
revert "Docs: Reorngaize dpsk links #3900" (#3933)
|
2025-02-27 08:57:13 -08:00 |
|
Baizhou Zhang
|
3e02526b1f
|
[Doc] Add experimental tag for flashinfer mla (#3925)
|
2025-02-27 01:55:36 -08:00 |
|
Stefan He
|
d8a98a2cad
|
[Docs] Improve DPSK docs in dark mode (#3914)
|
2025-02-27 00:13:04 -08:00 |
|
Qiaolin Yu
|
d6898dd253
|
Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 22:06:54 -08:00 |
|
Baizhou Zhang
|
71ed01833d
|
[doc] Update document for flashinfer mla (#3907)
|
2025-02-26 20:40:45 -08:00 |
|
simveit
|
acd1a15921
|
Docs: Implemented frontend docs (#3791)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 15:30:05 -08:00 |
|
Chayenne
|
7c1692aa90
|
Docs: Reorngaize dpsk links (#3900)
|
2025-02-26 15:16:31 -08:00 |
|
Chayenne
|
8f019c7d1a
|
Docs: Move dpsk docs forward a step (#3894)
|
2025-02-26 11:43:20 -08:00 |
|
JC1DA
|
7551498a69
|
[Feature] Support llguidance for constrained decoding (#3298)
|
2025-02-26 10:41:49 -08:00 |
|
simveit
|
44a2c4bd56
|
Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-26 10:29:25 -08:00 |
|
simveit
|
c9fc4a9d26
|
Docs: delete sgl-kernel install in docs (#3845)
|
2025-02-26 09:25:43 -08:00 |
|
Shenggui Li
|
3dc9ff3ce8
|
[doc] fixed dpsk quant faq (#3865)
|
2025-02-25 19:40:47 -08:00 |
|
Shenggui Li
|
06427dfab1
|
[doc] added quantization doc for dpsk (#3843)
|
2025-02-25 09:43:28 -08:00 |
|
Shenggui Li
|
c0bb9eb3b3
|
[improve] made timeout configurable (#3803)
|
2025-02-25 00:26:08 -08:00 |
|
Yuanheng Zhao
|
3758d209a0
|
[Doc] Fix typo in server-argument description (#3641)
|
2025-02-24 16:57:13 -08:00 |
|
Wilson Wu
|
faf29e0b23
|
Docs: fix doc site copyright to current year (#3741)
|
2025-02-24 16:56:04 -08:00 |
|
He1pa
|
b0743ea059
|
Docs: fix dead link in router.md (#3799)
|
2025-02-24 16:53:57 -08:00 |
|
Lianmin Zheng
|
d7934cde45
|
Fix CI and install docs (#3821)
|
2025-02-24 16:17:38 -08:00 |
|
Lianmin Zheng
|
f2388f6b95
|
Revert "Rename TokenizerManager to StdOrchestrator" (#3828)
|
2025-02-24 14:47:59 -08:00 |
|
Baizhou Zhang
|
4d2a88bdff
|
[Docs]Add instruction for manually stopping nsys profiler (#3795)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-23 13:21:48 -08:00 |
|
fzyzcjy
|
45360b2fa9
|
Improve: Rename TokenizerManager to StdOrchestrator (#3116)
|
2025-02-23 00:30:58 -08:00 |
|
Mick
|
45205d88a0
|
bench: Add MMMU benchmark for vLM (#3562)
|
2025-02-22 08:10:59 -08:00 |
|
simveit
|
20b765a26e
|
Model: Support Qwen 72B RM model. (#3772)
|
2025-02-21 14:38:21 -08:00 |
|
Chayenne
|
e310722266
|
Docs: Update offline_engine_api and add links (#3773)
|
2025-02-21 14:15:52 -08:00 |
|
Shi Shuai
|
e074e76b31
|
docs: Add offline engine launch example and documentation (#3771)
|
2025-02-21 11:25:52 -08:00 |
|
simveit
|
4592afc27d
|
Docs: Fix layout to docs (#3733)
|
2025-02-21 11:24:13 -08:00 |
|
Shakhizat Nurgaliyev
|
d8d75d256a
|
Change description of nvidia jetson docs (#3761)
|
2025-02-21 20:44:22 +08:00 |
|
Shenggui Li
|
c6a4852136
|
[docs] added torch.compile cache to dpsk manual (#3737)
|
2025-02-21 00:11:40 -08:00 |
|
Baizhou Zhang
|
ac05310098
|
[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)
|
2025-02-21 03:37:56 +08:00 |
|
Chayenne
|
3c7bfd7eab
|
Docs: Fix layout with sub-section (#3710)
|
2025-02-19 15:44:30 -08:00 |
|
Shi Shuai
|
55de40f782
|
[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
|
2025-02-19 11:15:44 -08:00 |
|
Baizhou Zhang
|
67fc595bb8
|
[Feature] Apply Cublas Grouped Gemm kernel (#3629)
|
2025-02-18 15:18:31 +08:00 |
|
ybyang
|
c51dc2cc8d
|
Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624)
|
2025-02-17 18:14:20 -08:00 |
|
Yineng Zhang
|
a5375adc3a
|
chore: bump v0.4.3.post2 (#3645)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-18 02:48:30 +08:00 |
|
Yineng Zhang
|
75d171a9c5
|
chore: update flashinfer v0.2.1.post2 (#3644)
|
2025-02-18 02:47:42 +08:00 |
|
Yineng Zhang
|
e782eb7e6a
|
chore: bump v0.4.3.post1 (#3638)
|
2025-02-17 21:58:19 +08:00 |
|
Shenggui Li
|
c9565e49e7
|
[docker] added rdma support (#3619)
|
2025-02-17 15:36:16 +08:00 |
|
Shi Shuai
|
d03c4c25a7
|
[docs] Update sampling_params.md (#3617)
|
2025-02-16 18:52:30 -08:00 |
|
simveit
|
8f13377dea
|
Draft of updated doc for sampling params. (#3260)
Co-authored-by: shuaills <shishuaicareer@gmail.com>
|
2025-02-16 14:28:22 -08:00 |
|
Mick
|
bcc213df61
|
Model: Support Qwen 2.5 vl (#3258)
|
2025-02-16 00:58:53 -08:00 |
|
Shenggui Li
|
231c40d859
|
[docs] added favicon to sphinx html (#3564)
|
2025-02-15 10:21:21 -08:00 |
|
Mick
|
7711ac6ed0
|
doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-15 00:10:32 -08:00 |
|
Shi Shuai
|
7443197a63
|
[CI] Improve Docs CI Efficiency (#3587)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-14 19:57:00 -08:00 |
|
Yineng Zhang
|
4e23c961e8
|
docs: update install (#3581)
|
2025-02-14 18:54:50 +08:00 |
|
Yineng Zhang
|
31eec35ba8
|
fix doc (#3558)
|
2025-02-14 10:11:31 +08:00 |
|