Byron Hsu
|
20453cef62
|
[test] Lower number of top logprobs to get rid of -inf (#3212)
|
2025-01-30 18:01:23 +08:00 |
|
Mick
|
9f635ea50d
|
[Fix] Address remaining issues of supporting MiniCPMV (#2977)
|
2025-01-28 00:22:13 -08:00 |
|
Byron Hsu
|
988d0a4bfc
|
[kernel] Use sgl_kernel rope (#3169)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-01-28 14:33:11 +08:00 |
|
Byron Hsu
|
27aeb4b7d8
|
[test] deduplicate test_session_control (#3183)
|
2025-01-28 13:17:06 +08:00 |
|
Lianmin Zheng
|
f8ca66fb49
|
Update thresholds in test_nightly_gsm8k_eval.py (#3176)
|
2025-01-27 03:02:09 -08:00 |
|
Lianmin Zheng
|
52c03f16b9
|
Add activation parameters to fused_moe (#3170)
|
2025-01-27 00:23:37 -08:00 |
|
yizhang2077
|
1e3e521544
|
add unit test for block wise fp8 (#3156)
|
2025-01-27 15:32:04 +08:00 |
|
Lianmin Zheng
|
af02f99b7c
|
Add more logprob tests (#3162)
|
2025-01-26 22:24:55 -08:00 |
|
YAMY
|
b045841bae
|
Feature/function calling update (#2700)
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-01-26 09:57:51 -08:00 |
|
Lianmin Zheng
|
f4a92f4b56
|
Temporarily skip the openai frontend tests (#3151)
|
2025-01-26 04:17:35 -08:00 |
|
Lianmin Zheng
|
d1a0863251
|
Add a test case for cached_tokens (#3145)
|
2025-01-26 01:39:28 -08:00 |
|
Lianmin Zheng
|
da6f8081f6
|
Fix CI tests (#3132)
|
2025-01-25 17:43:39 -08:00 |
|
Lianmin Zheng
|
a4331cd260
|
Add accuracy and latency tests of eagle into CI (#3027)
|
2025-01-21 02:55:14 -08:00 |
|
Lianmin Zheng
|
287d07a669
|
Misc fixes for eagle (flush_cache, CPU overhead) (#3014)
|
2025-01-20 20:27:38 -08:00 |
|
Hongpeng Guo
|
583697cd71
|
[Enhancement] Custom Logit Processor Improvement (#2998)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
|
2025-01-20 02:00:35 -08:00 |
|
Lianmin Zheng
|
51e87f6f21
|
Skip flaky custom_logit_processor tests (#3004)
|
2025-01-20 00:28:47 -08:00 |
|
Lianmin Zheng
|
03464890e0
|
Separate two entry points: Engine and HTTP server (#2996)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-01-19 22:09:24 -08:00 |
|
Lianmin Zheng
|
cd493b5afc
|
Improve metrics, logging, and importing orders (#2992)
|
2025-01-19 18:36:59 -08:00 |
|
Lianmin Zheng
|
61f42b5732
|
Move sgl.Runtime under sglang/lang (#2990)
|
2025-01-19 17:10:29 -08:00 |
|
Hongpeng Guo
|
e403d23757
|
[Feature] Add sampler custom logits processor (#2396)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
|
2025-01-19 14:46:53 -08:00 |
|
Enrique Shockwave
|
3bcf5ecea7
|
support regex in xgrammar backend (#2983)
|
2025-01-20 04:34:41 +08:00 |
|
Chang Su
|
4d4cdb3fe7
|
Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956)
|
2025-01-18 19:37:30 -08:00 |
|
Mick
|
3d93f84a00
|
[Feature] Support minicpmv v2.6 (#2785)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-01-18 14:14:19 -08:00 |
|
bjmsong
|
d3024f4fc8
|
support e4m3 kvcache in qwen2 & add kv scaling factor json (#2894)
Co-authored-by: bjmsong <bjmsong@126.com>
|
2025-01-18 11:43:22 +08:00 |
|
Ke Bao
|
d47c5101f1
|
Add ut for qwen model (#2947)
|
2025-01-18 00:03:54 +08:00 |
|
Chang Su
|
a8ccacc8b8
|
[Frontend] Fix request length check and add option to disallow auto truncation in scheduler (#2876)
|
2025-01-16 14:51:19 -08:00 |
|
Lianmin Zheng
|
8f2c522aba
|
Improve benchmark scripts and error message printing (#2922)
|
2025-01-16 06:24:31 -08:00 |
|
yizhang2077
|
767c9dec03
|
adapt custom allreduce for tensorrt llm (#2511)
|
2025-01-16 04:57:35 +08:00 |
|
Ke Bao
|
bfbda62c8b
|
Add ut for w8a8 int8 quantization (#2897)
|
2025-01-15 18:29:14 +08:00 |
|
fzyzcjy
|
923f518337
|
CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630)
|
2025-01-13 11:38:51 -08:00 |
|
Lianmin Zheng
|
6249e4a19e
|
Revert "Integration of TurboMind AWQ" (#2866)
|
2025-01-13 04:44:39 -08:00 |
|
bjmsong
|
17de02f98d
|
Integration of TurboMind AWQ (#2828)
Co-authored-by: root <bjmsong@126.com>
|
2025-01-13 20:14:16 +08:00 |
|
Lianmin Zheng
|
51ab3ccf47
|
Collect more metrics: num_requests_total (#2859)
|
2025-01-13 03:57:39 -08:00 |
|
Lianmin Zheng
|
67008f4b32
|
Use only one GPU for MLA CI tests (#2858)
|
2025-01-13 03:55:33 -08:00 |
|
Lianmin Zheng
|
72c7776355
|
Fix linear.py and improve weight loading (#2851)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-01-13 01:39:14 -08:00 |
|
bjmsong
|
0bb0f76311
|
Support FP8 E4M3 KV Cache (#2786)
Co-authored-by: root <bjmsong@126.com>
|
2025-01-12 21:17:11 -08:00 |
|
Shi Shuai
|
c4f9707e16
|
Improve: Token-In Token-Out Usage for RLHF (#2843)
|
2025-01-11 15:14:26 -08:00 |
|
Lianmin Zheng
|
f1769586d6
|
Update threshold in test_nightly_gsm8k_eval.py (#2836)
|
2025-01-10 20:37:34 -08:00 |
|
justdoit
|
a47bf39123
|
[Eagle2] Fix multiple concurrent request crashes (#2730)
|
2025-01-10 14:00:43 -08:00 |
|
Chang Su
|
f290bd4332
|
[Bugfix] Fix embedding model hangs with --enable-metrics (#2822)
|
2025-01-10 13:14:51 -08:00 |
|
JJJJOHNSON
|
694e41925e
|
[eagle2] fix end check when target model verify (#2723)
|
2025-01-07 21:46:02 -08:00 |
|
Lianmin Zheng
|
b22f3f6475
|
Fix nightly accuracy tests (#2780)
|
2025-01-07 21:02:35 -08:00 |
|
Lianmin Zheng
|
6fb5768372
|
Disable math eval on nightly CI temporarily (#2779)
|
2025-01-07 18:17:34 -08:00 |
|
libra
|
bdb3929dbb
|
Refactor SchedulePolicy to improve code organization (#2571)
|
2025-01-04 00:05:16 +08:00 |
|
Lianmin Zheng
|
0f9cc6d8d3
|
Fix package loss for small models (#2717)
Co-authored-by: sdli1995 <mmlmonkey@163.com>
|
2025-01-02 18:25:26 -08:00 |
|
Shi Shuai
|
dd2e2d275f
|
Docs: Update documentation workflow and contribution guide (#2704)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-02 09:18:31 -08:00 |
|
yukavio
|
815dce0554
|
Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-01-02 03:22:34 -08:00 |
|
fzyzcjy
|
9183c23eca
|
Speed up update_weights_from_tensor (#2695)
|
2025-01-02 02:05:19 -08:00 |
|
Xiaotong Jiang
|
a4d6d6f1dd
|
[feat]: Add math eval to CI nightly run (#2663)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-01 15:29:35 -08:00 |
|
Lianmin Zheng
|
21ec66e59e
|
Minor follow-up fixes for the logprob refactor (#2670)
|
2024-12-30 05:42:08 -08:00 |
|