Commit Graph

338 Commits

Author SHA1 Message Date
libra
bdb3929dbb Refactor SchedulePolicy to improve code organization (#2571) 2025-01-04 00:05:16 +08:00
Lianmin Zheng
0f9cc6d8d3 Fix package loss for small models (#2717)
Co-authored-by: sdli1995 <mmlmonkey@163.com>
2025-01-02 18:25:26 -08:00
Shi Shuai
dd2e2d275f Docs: Update documentation workflow and contribution guide (#2704)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-02 09:18:31 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
fzyzcjy
9183c23eca Speed up update_weights_from_tensor (#2695) 2025-01-02 02:05:19 -08:00
Xiaotong Jiang
a4d6d6f1dd [feat]: Add math eval to CI nightly run (#2663)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-01 15:29:35 -08:00
Lianmin Zheng
21ec66e59e Minor follow-up fixes for the logprob refactor (#2670) 2024-12-30 05:42:08 -08:00
Lianmin Zheng
9c6ba2484f Refactor logprob computation to return the real logprob used in sampling (#2664) 2024-12-30 04:51:38 -08:00
Lianmin Zheng
3231817861 Revert "[feat] Add math eval to CI" (#2656) 2024-12-30 15:05:50 +08:00
Xiaotong Jiang
a11f8d5f6a [feat] Add math eval to CI (#2652) 2024-12-30 14:49:41 +08:00
Chayenne
1703d766d8 CI: skip special token for engine token ids unit test (#2648) 2024-12-29 13:52:50 -08:00
Shi Shuai
fad29f7f52 CI: Fix unittest for engine input token ids and output token ids (#2646) 2024-12-29 13:28:59 -08:00
Shi Shuai
35bdb48557 [Feature] Get Token IDs with Engine.generate() (#2636)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-29 12:28:27 -08:00
Ying Sheng
e0e09fceeb [Session] Update session control interface (#2635) 2024-12-29 02:10:27 -08:00
Tanjiro
8ee9a8501a [Feature] Function Calling (#2544)
Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>
2024-12-28 21:58:52 -08:00
fzyzcjy
fd28640dc5 Add update_weights_from_tensor (#2631) 2024-12-28 13:30:27 -08:00
Lianmin Zheng
855d0ba381 [CI] Fix nightly test and raise better error message (#2626)
Co-authored-by: Sangbin <rkooo567@gmail.com>
2024-12-27 22:16:39 -08:00
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Zhizhou Sha
a74d194146 [unittest] add unit test to test quant args of srt engine (#2574) 2024-12-26 06:54:43 -08:00
Adarsh Shirawalmath
acb340728c [Feature] Support new parameter - EBNF in xgrammar (#2526) 2024-12-26 05:12:41 -08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
Lianmin Zheng
8496701934 [Misc] Fix metrics, weight update lock, request logging (#2543) 2024-12-22 06:27:22 -08:00
Lianmin Zheng
9cd9dc83b3 Temporarily disable unit test of torch native attention backend (#2492) 2024-12-16 14:17:27 -08:00
Ke Bao
2f9bd0fafd Fix correctness issue for triton decoding kernel (#2479) 2024-12-14 16:50:54 +08:00
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Ying Sheng
8586b72da0 [feat] Enable chunked prefill for llava-onevision (#2412) 2024-12-09 09:52:38 -08:00
Lianmin Zheng
641b7d0ae0 [Minor] Improve code style (#2422) 2024-12-09 06:30:35 -08:00
Xiaoyu Zhang
3844feb9bb Add a unittest for fused_moe (#2416) 2024-12-08 22:46:10 -08:00
Lianmin Zheng
a6ca736c8e Simplify stream_output (#2398) 2024-12-08 12:27:13 -08:00
Yineng Zhang
f62055b528 minor: add random flashinfer vs triton use case (#2409) 2024-12-09 04:15:21 +08:00
Yineng Zhang
74bc9184c3 minor: add random use case (#2408) 2024-12-09 03:21:35 +08:00
Yineng Zhang
0f8eb15323 feat: support custom task runner (#2407) 2024-12-09 02:29:55 +08:00
Yineng Zhang
67470bbb28 minor: update correct measurement unit (#2406) 2024-12-08 20:55:04 +08:00
Ke Bao
61dec545b0 Remove unused vars in the triton backend (#2401) 2024-12-08 03:37:03 -08:00
Ke Bao
7dc66fcb40 Optimize Triton decoding kernel for long context (#2394) 2024-12-08 01:17:37 -08:00
Lianmin Zheng
0e7409adb6 Fix the overlap for xgrammar (#2377) 2024-12-06 05:49:29 -08:00
xiaobochen
3d32e4a32c Resubmit MoE-EP (#2371) 2024-12-06 15:05:21 +08:00
Lianmin Zheng
07ec07ad1f Improve torch compile for fused moe (#2327) 2024-12-03 01:58:25 -08:00
Ying Sheng
aa47f64223 Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) 2024-12-02 23:11:13 -08:00
Ying Sheng
480e38a733 [feat] Enable chunked prefill for llava-onevision (#2281) 2024-12-02 20:19:02 -08:00
Lianmin Zheng
18108abe5d [Minor] Fix code style (#2311) 2024-12-02 02:27:36 -08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Qun Yang
62c516ac45 Add a simple torch native attention backend (#2241) 2024-12-01 03:01:25 -08:00
Lianmin Zheng
9449a95431 [CI] Balance CI tests (#2293) 2024-12-01 01:47:30 -08:00
Lianmin Zheng
0303ca918f [CI] Fix missing files in run_suite.py (#2288) 2024-11-30 23:53:34 -08:00
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
1bfa511b95 [CI] Fix ci tests (#2284) 2024-11-30 21:12:03 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
2024-11-30 00:44:48 -08:00
Lianmin Zheng
ccaf1f997c [CI] Print summary on github actions (#2274) 2024-11-29 23:48:54 -08:00