1580 Commits

Author SHA1 Message Date
Adarsh Shirawalmath
fd34f2da35 [Docs] Add EBNF to sampling params docs (#2609) 2024-12-29 00:05:00 -08:00
Tanjiro
8ee9a8501a [Feature] Function Calling (#2544)
Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>
2024-12-28 21:58:52 -08:00
fzyzcjy
fd28640dc5 Add update_weights_from_tensor (#2631) 2024-12-28 13:30:27 -08:00
Yineng Zhang
7863e4368a add configs for block fp8 related kernels (#2628)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-28 23:12:04 +08:00
Shi Shuai
333e3bfde5 [docs]Refactor constrained decoding tutorial (#2633) 2024-12-28 07:00:38 -08:00
Shi Shuai
239c9d4d3a Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-27 23:54:28 -08:00
Lianmin Zheng
855d0ba381 [CI] Fix nightly test and raise better error message (#2626)
Co-authored-by: Sangbin <rkooo567@gmail.com>
2024-12-27 22:16:39 -08:00
Xiaoyu Zhang
9254a33ad4 avoid fused_moe_triton padding circular import (#2624) 2024-12-28 14:01:35 +08:00
Ke Bao
8a2681e26a Update readme (#2625) 2024-12-28 13:39:56 +08:00
Lianmin Zheng
5276a675f5 Add more supporting organizations (#2623) 2024-12-27 13:41:41 -08:00
Lianmin Zheng
751e5ca273 [minor] clean up docs and eos id (#2622) 2024-12-27 11:23:46 -08:00
Yang Zheng
7a7ac6bea1 [FIX] Update EOS from config (#2475) 2024-12-27 10:59:56 -08:00
Yineng Zhang
d9e6ee382b docs: update README (#2618) 2024-12-28 00:21:53 +08:00
Yineng Zhang
ef5b0ff90b chore: bump v0.4.1.post1 (#2616) 2024-12-28 00:11:06 +08:00
HandH1998
6e5305158c update sgl_moe_align_block_size usage (#2617) 2024-12-28 00:01:13 +08:00
HandH1998
77d1210b36 fix moe_align_block_size (#2615) 2024-12-27 23:32:53 +08:00
kk
70dc2fbe2d Change extend attention kernel launch parameter for ROCm platform to … (#2610)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
2024-12-27 00:32:17 -08:00
kk
b438a2e512 Fix triton kernel performance regression (#2611)
Co-authored-by: wunhuang <wunhuang@amd.com>
2024-12-27 15:54:38 +08:00
kk
7ca751ff7d Fused moe triton cfg opt for rocm (#2612)
Co-authored-by: wunhuang <wunhuang@amd.com>
2024-12-26 23:38:22 -08:00
Lianmin Zheng
c75adfec59 Update CODEOWNERS (#2608) 2024-12-26 20:58:08 -08:00
HAI
7722c11c1d Regression fix to AMD/ROCm from recent change (#2606) 2024-12-26 20:22:14 -08:00
fzyzcjy
b2ed5c8ea7 Tiny code cleanup in tokenizer_manager.py (#2586) 2024-12-26 17:53:09 -08:00
Lianmin Zheng
f46f394f4d Update README.md (#2605) 2024-12-26 10:58:49 -08:00
Lianmin Zheng
2125898af5 Update contributor_guide.md (#2603) 2024-12-26 08:36:13 -08:00
fzyzcjy
44f011d224 Super tiny typo fix (#2564) 2024-12-26 08:28:01 -08:00
kzhou003
ed91e003bb [UTILS] improve makefile a bit by adding help info (#2570)
Co-authored-by: Hongpeng Guo <hpguo@anyscale.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: yigex <yigex@amd.com>
2024-12-26 08:24:18 -08:00
yudian0504
531d6ea968 fix: package data missing (#2521) 2024-12-26 08:16:48 -08:00
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Zhizhou Sha
a74d194146 [unittest] add unit test to test quant args of srt engine (#2574) 2024-12-26 06:54:43 -08:00
fzyzcjy
3169e66c23 Fix duplicated handling of GetWeightsByNameReqInput (#2565) 2024-12-26 06:49:32 -08:00
Lianmin Zheng
773951548d Fix logprob_start_len for multi modal models (#2597)
Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
2024-12-26 06:27:45 -08:00
fsygd
637de9e8ce update readme of DeepSeek V3 (#2596) 2024-12-26 21:31:56 +08:00
Adarsh Shirawalmath
acb340728c [Feature] Support new parameter - EBNF in xgrammar (#2526) 2024-12-26 05:12:41 -08:00
Sangchun Ha (Patrick)
08effbff35 Error occurs when loading the gemma model in bitsandbytes format. (#2557) 2024-12-26 05:10:37 -08:00
Lianmin Zheng
60bd32723a Update README.md (#2594) 2024-12-26 03:31:50 -08:00
Liangsheng Yin
e7ebecf82e Fix cache hit rate when chunked prefill (#2555) 2024-12-26 03:14:28 -08:00
Xiaoyu Zhang
9a23c48456 h100 tuning fused_moe_triton for qwen2 moe (#2560) 2024-12-26 03:13:31 -08:00
Yineng Zhang
635a042623 docs: update deepseek v3 example (#2592) 2024-12-26 17:43:37 +08:00
Yineng Zhang
2dccecf432 fix: only enable moe_align_block_size for now (#2590) 2024-12-26 16:56:59 +08:00
Yineng Zhang
75ad0a143f docs: add deepseek v3 launch instructions (#2589) 2024-12-25 23:26:54 -08:00
Yineng Zhang
efc52f85e2 chore: bump v0.4.1 (#2582) 2024-12-26 07:14:51 +08:00
Yineng Zhang
60e2fdcf4f use sgl-kernel moe_align_block_size (#2581)
Co-authored-by: ispobock <ispobaoke@163.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-26 06:29:08 +08:00
Yineng Zhang
d7c0e872b0 chore: bump 0.0.2.post8 for sgl-kernel (#2580) 2024-12-26 06:11:39 +08:00
Yineng Zhang
31548116a8 fix moe_align_block_size_kernel for shared memory issue (#2579)
Co-authored-by: ispobock <ispobaoke@163.com>
2024-12-26 05:31:04 +08:00
HandH1998
53aed988cb Refactor MoE (#2575)
Co-authored-by: zhyncs <me@zhyncs.com>
2024-12-26 00:02:14 +08:00
Ying Sheng
8a56b43175 [Bench] Flush cache before benchmarking (#2566) 2024-12-24 11:21:21 +08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
Lianmin Zheng
23e5e50fd5 Fix gemlite import (#2553) 2024-12-22 20:21:17 -08:00
Shi Shuai
25e5d589e3 Doc: Update Grammar Backend (#2545)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-22 17:14:40 -08:00
Lianmin Zheng
41b1db69b8 A better aio rwlock that guarantees the order (#2547) 2024-12-22 15:44:32 -08:00