Commit Graph

  • 87f671ab58 Fix debug_tensor_dump_output_folder optional key missing (#4046) Qubitium-ModelCloud 2025-03-04 19:42:48 +08:00
  • 51d25405a7 ROCm: update aiter and its usage to fused moe (bfloat16, fp8, fp8 block-quant) (#4053) HAI 2025-03-04 03:00:46 -08:00
  • e0a2c96308 Fix breakage problem when using custom_ar (#4052) kk 2025-03-04 18:59:03 +08:00
  • 12f2e6c3f1 Fix: #3988 using blockwise_int8 (#4023) Xihuai Wang 2025-03-04 15:49:58 +08:00
  • 95575aa76a Reasoning parser (#4000) Xihuai Wang 2025-03-04 13:16:36 +08:00
  • 11eea69e70 Fix assert options.num_stages != 0 error in the latest ROCm build image (#4049) kk 2025-03-04 12:37:03 +08:00
  • 1baa9e6cf9 docs: update README (#4044) Yineng Zhang 2025-03-03 17:09:18 -08:00
  • 911fcd0910 Update README.md (#4043) Lianmin Zheng 2025-03-03 16:29:46 -08:00
  • 9fafa62db7 Share target model embed and head weights for nextn (#4033) Ke Bao 2025-03-04 05:30:04 +08:00
  • 146ac8df07 Add examples in sampling parameters (#4039) Chayenne 2025-03-03 13:04:32 -08:00
  • 57a404fd55 Remove outdated test utils and fix links for the doc of sampling params (#3999) Qiaolin Yu 2025-03-03 12:41:38 -05:00
  • 2796fbb53d Docs: Fix sampling parameter (#4034) Chayenne 2025-03-03 09:32:36 -08:00
  • 935cda944b Misc clean up; Remove the support of jump forward (#4032) Lianmin Zheng 2025-03-03 07:02:14 -08:00
  • 110e006673 Reorganize python source files in sgl-kernel with multiple files (#4027) Lianmin Zheng 2025-03-03 06:36:40 -08:00
  • 6b45a21d16 Reorganize c++ source files in sgl-kernel with multiple folders (#4025) Lianmin Zheng 2025-03-03 05:32:30 -08:00
  • a7000a7650 Update metrics documentation (#3264) Yudi Xue 2025-03-03 05:03:58 -08:00
  • 1a8f995c46 remove cache configs in model definitions (#4031) Lianmin Zheng 2025-03-03 05:00:50 -08:00
  • a3ab768a2b Clean up custom allreduce (#4029) Lianmin Zheng 2025-03-03 04:59:53 -08:00
  • 66301e124f Improve code styles (#4021) Lianmin Zheng 2025-03-03 03:20:23 -08:00
  • ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Lianmin Zheng 2025-03-03 00:12:04 -08:00
  • 0194948fd9 Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014) Stefan He 2025-03-02 23:29:55 -08:00
  • b4d34cd35d Fix nightly-test CI (#3826) yinfan98 2025-03-03 15:14:45 +08:00
  • 728e175fc4 Add examples to token-in-token-out for LLM (#4010) Chayenne 2025-03-02 21:03:49 -08:00
  • 9e1014cf99 Revert "Add fast decode plan for flashinfer mla" (#4008) Lianmin Zheng 2025-03-02 19:29:10 -08:00
  • fa56106731 Add fast decode plan for flashinfer mla (#3987) Baizhou Zhang 2025-03-02 19:16:37 -08:00
  • 7fbab730bd [feat] add small vocab table for eagle's draft model[1]. (#3822) Zhousx 2025-03-03 10:58:45 +08:00
  • b7e274f2d9 Add Benchmark for DeepGEMM Group GEMM (#3993) Stefan He 2025-03-02 17:47:21 -08:00
  • 9cf4077294 Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406) Hubert Lu 2025-03-02 15:19:06 -08:00
  • d3fe9bae56 Add accuracy test for TP torch compile (#3994) Ke Bao 2025-03-03 05:18:18 +08:00
  • 00ce7e311c Fix all gather torch compile (#3992) Ke Bao 2025-03-02 16:41:38 +08:00
  • 50f28f65a0 fix typo in deep gemm benchmarking (#3991) Xiaoyu Zhang 2025-03-02 16:34:00 +08:00
  • 90a55e2566 add deepgemm and sglang fp8 block-wise gemm benchmark (#3893) Xiaoyu Zhang 2025-03-02 15:01:58 +08:00
  • 407e2b923d Update CODEOWNERS (#3989) Lianmin Zheng 2025-03-01 21:47:30 -08:00
  • 40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985) Qiaolin Yu 2025-03-01 20:51:29 -05:00
  • 18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) Chayenne 2025-02-28 23:57:17 -08:00
  • 6b859e7ddd Docs: add special warning to engine docs (#3979) Chayenne 2025-02-28 21:59:20 -08:00
  • 930da877c4 rename FunctionCallReqInput to ParseFunctionCallReq (#3976) Chayenne 2025-02-28 18:46:25 -08:00
  • 3f8a441437 Docs: Add redline to highlight main process (#3977) Chayenne 2025-02-28 18:37:15 -08:00
  • aceb420179 Docs: add type hint to sampling parameters (#3975) Chayenne 2025-02-28 18:21:20 -08:00
  • 90a4b7d98a [Feature]Support ragged prefill in flashinfer mla backend (#3967) Baizhou Zhang 2025-02-28 18:13:56 -08:00
  • f3b99f73b3 update flashinfer-python version Yineng Zhang 2025-02-28 16:31:59 -08:00
  • 9e74ee91da Update cutlass dependency (#3966) Elfie Guo 2025-02-28 16:16:31 -08:00
  • 77a6c9d229 Remove unused imports from rocm mla kernel. (#3963) Chaitanya Sri Krishna Lolla 2025-02-28 23:31:08 +05:30
  • e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) fzyzcjy 2025-03-01 01:53:10 +08:00
  • bac414ab53 [Feature] integrate Structural Tag in xgrammar backend for function calling (#3566) mlmz 2025-02-28 15:33:41 +08:00
  • eec3f6d1eb [Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678) Chang Su 2025-02-27 22:59:43 -08:00
  • 90bc26a813 set a strict sgl-kernel version (#3950) Chayenne 2025-02-27 22:44:57 -08:00
  • ec0a72c2d9 Fix bench_serving not recognizing OPENAI_API_KEY (#3870) Kebe 2025-02-28 12:18:53 +08:00
  • 1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) yiakwy-xpu-ml-framework-team 2025-02-28 11:42:48 +08:00
  • bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) KCFindstr 2025-02-27 19:42:15 -08:00
  • d38878523d Fix the doc link for sampling params (#3861) Qiaolin Yu 2025-02-27 16:31:43 -05:00
  • 564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) Yineng Zhang 2025-02-27 09:53:48 -08:00
  • 5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) Yineng Zhang 2025-02-27 08:57:13 -08:00
  • d281587989 Improve: Support xgrammar 0.1.14 (#3593) Enrique Shockwave 2025-02-27 16:42:54 +00:00
  • b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) laixin 2025-02-27 18:59:46 +08:00
  • 3e02526b1f [Doc] Add experimental tag for flashinfer mla (#3925) Baizhou Zhang 2025-02-27 01:55:36 -08:00
  • d8a98a2cad [Docs] Improve DPSK docs in dark mode (#3914) Stefan He 2025-02-27 00:13:04 -08:00
  • 0519269d20 [Docs] Disable notebook CI when merge to main (#3905) Qing 2025-02-26 22:13:33 -08:00
  • d6898dd253 Add return hidden state in the native API (#3897) Qiaolin Yu 2025-02-27 01:06:54 -05:00
  • 71ed01833d [doc] Update document for flashinfer mla (#3907) Baizhou Zhang 2025-02-26 20:40:45 -08:00
  • 8b681d7724 [Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898) Tianxing Wu 2025-02-27 03:05:30 +02:00
  • 194eea1774 [doc] update sponsorship (#3903) ybyang 2025-02-27 08:28:15 +08:00
  • acd1a15921 Docs: Implemented frontend docs (#3791) simveit 2025-02-27 00:30:05 +01:00
  • 7c1692aa90 Docs: Reorngaize dpsk links (#3900) Chayenne 2025-02-26 15:16:31 -08:00
  • 8f019c7d1a Docs: Move dpsk docs forward a step (#3894) Chayenne 2025-02-26 11:43:20 -08:00
  • 7551498a69 [Feature] Support llguidance for constrained decoding (#3298) JC1DA 2025-02-26 10:41:49 -08:00
  • 44a2c4bd56 Docs: improve link to docs (#3860) simveit 2025-02-26 19:29:25 +01:00
  • c9fc4a9d26 Docs: delete sgl-kernel install in docs (#3845) simveit 2025-02-26 18:25:43 +01:00
  • 21463e321a Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) lukec 2025-02-26 18:29:37 +08:00
  • 3dc9ff3ce8 [doc] fixed dpsk quant faq (#3865) Shenggui Li 2025-02-26 11:40:47 +08:00
  • 06427dfab1 [doc] added quantization doc for dpsk (#3843) Shenggui Li 2025-02-26 01:43:28 +08:00
  • 60524920ba [Bug]: Fix maximum recursion depth triggered on exception exit (#3519) Kebe 2025-02-26 01:39:38 +08:00
  • 107710268a [BugFix] Fix crash when receiving a req with structured output in DP attention mode. (#3841) IAN 2025-02-26 01:32:05 +08:00
  • 4606e2a3fe Bug: fix capture_bs (#3857) who who who 2025-02-26 00:40:35 +08:00
  • 127998cc41 Fix allgather ops inside cuda graphs (#3709) Nicolas Castet 2025-02-25 10:39:10 -06:00
  • c0bb9eb3b3 [improve] made timeout configurable (#3803) Shenggui Li 2025-02-25 16:26:08 +08:00
  • 7036d6fc67 [Bug]: Add missing clamp to llavavid (#3787) Yueyang Pan 2025-02-25 04:10:15 +01:00
  • 6ce9dbe828 [ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237) Chaitanya Sri Krishna Lolla 2025-02-25 07:44:31 +05:30
  • 3758d209a0 [Doc] Fix typo in server-argument description (#3641) Yuanheng Zhao 2025-02-25 08:57:13 +08:00
  • faf29e0b23 Docs: fix doc site copyright to current year (#3741) Wilson Wu 2025-02-25 08:56:04 +08:00
  • b0743ea059 Docs: fix dead link in router.md (#3799) He1pa 2025-02-25 08:53:57 +08:00
  • 60b771c815 Improve: fix typos (#3801) Wang Ran (汪然) 2025-02-25 08:51:23 +08:00
  • d7934cde45 Fix CI and install docs (#3821) Lianmin Zheng 2025-02-24 16:17:38 -08:00
  • 62bbd34393 Revert "Extract generation_manager from tokenizer_manager" (#3829) Lianmin Zheng 2025-02-24 14:49:16 -08:00
  • f2388f6b95 Revert "Rename TokenizerManager to StdOrchestrator" (#3828) Lianmin Zheng 2025-02-24 14:47:59 -08:00
  • c9745ee082 Fix pandas dependency in CI (#3818) Lianmin Zheng 2025-02-24 05:56:57 -08:00
  • 1a6e97577a Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) laixin 2025-02-24 21:43:35 +08:00
  • b110084654 Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785) Baizhou Zhang 2025-02-24 04:07:25 -08:00
  • 27a46317b6 Fix dependency (#3813) Lianmin Zheng 2025-02-24 03:50:58 -08:00
  • c979580817 Update readme (#3809) Lianmin Zheng 2025-02-24 00:31:08 -08:00
  • 6c7a152c5a Hierarchical Caching for SGLang (#2693) Zhiqiang Xie 2025-02-23 21:56:30 -08:00
  • 4d2a88bdff [Docs]Add instruction for manually stopping nsys profiler (#3795) Baizhou Zhang 2025-02-23 13:21:48 -08:00
  • 45360b2fa9 Improve: Rename TokenizerManager to StdOrchestrator (#3116) fzyzcjy 2025-02-23 16:30:58 +08:00
  • 3f41b18455 Improve: Extract generation_manager from tokenizer_manager (#3115) fzyzcjy 2025-02-23 15:25:45 +08:00
  • 45205d88a0 bench: Add MMMU benchmark for vLM (#3562) Mick 2025-02-23 00:10:59 +08:00
  • 9087694006 Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) fzyzcjy 2025-02-22 11:50:46 +08:00
  • a3339d8cac Bug: Fix weight loader error when LM head weights are tied (#3766) fzyzcjy 2025-02-22 09:53:12 +08:00
  • 14d90617b0 Bug: fix lm head weights in Qwen models (#3777) Chayenne 2025-02-21 16:49:31 -08:00
  • d37f95511d Improve: Tiny fix Olmo2 (#3348) fzyzcjy 2025-02-22 08:09:35 +08:00
  • c66b2c9cf1 Add support for nvidia modelopt fp8 kv cache (#3223) Zhiyu 2025-02-21 15:04:58 -08:00