fzyzcjy
|
15ddd84322
|
Add retry for flaky tests in CI (#4755)
|
2025-03-25 16:53:12 -07:00 |
|
Lianmin Zheng
|
bc1534ff32
|
Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134)
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
|
2025-03-06 06:13:59 -08:00 |
|
Lianmin Zheng
|
286e6540a6
|
Remove prefill-only-one-req (#4117)
|
2025-03-05 20:58:48 -08:00 |
|
Lianmin Zheng
|
77a3954bf7
|
Simplify eagle tests and TP sync in grammar backend (#4066)
|
2025-03-04 13:40:40 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Yineng Zhang
|
6718b10996
|
fix eagle unit test (#3591)
|
2025-02-15 23:10:48 +08:00 |
|
Lianmin Zheng
|
da6f8081f6
|
Fix CI tests (#3132)
|
2025-01-25 17:43:39 -08:00 |
|
Lianmin Zheng
|
a4331cd260
|
Add accuracy and latency tests of eagle into CI (#3027)
|
2025-01-21 02:55:14 -08:00 |
|
Yineng Zhang
|
67470bbb28
|
minor: update correct measurement unit (#2406)
|
2024-12-08 20:55:04 +08:00 |
|
Lianmin Zheng
|
ccaf1f997c
|
[CI] Print summary on github actions (#2274)
|
2024-11-29 23:48:54 -08:00 |
|
Lianmin Zheng
|
3c5538f781
|
Update CI threshold (#2186)
|
2024-11-25 15:24:17 -08:00 |
|
Lianmin Zheng
|
5652c56535
|
Update CI threshold & Improve code style (#2159)
|
2024-11-24 06:29:38 -08:00 |
|
Lianmin Zheng
|
7d671e4ad2
|
Enable overlap by default (#2067)
|
2024-11-19 22:07:58 -08:00 |
|
Yineng Zhang
|
766192610e
|
feat: update torch 2.5.1 (#2069)
|
2024-11-18 21:29:13 +08:00 |
|
Lianmin Zheng
|
38625e2139
|
Remove monkey_patch_vllm_dummy_weight_loader (#2064)
|
2024-11-17 15:48:12 -08:00 |
|
Lianmin Zheng
|
c1f401fc58
|
Revert "chore: update torch v2.5.1" (#2063)
|
2024-11-17 15:29:38 -08:00 |
|
Yineng Zhang
|
3b878863f7
|
chore: update torch v2.5.1 (#1849)
|
2024-11-18 00:06:00 +08:00 |
|
Ying Sheng
|
c98e84c21e
|
[Minor, Performance] Use torch.argmax for greedy sampling (#1589)
|
2024-10-06 13:15:05 -07:00 |
|
Ying Sheng
|
04b262cd91
|
[Fix] Fix major performance bug in certain cases (#1563)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
|
2024-10-04 08:51:11 +00:00 |
|
Lianmin Zheng
|
1acccb364a
|
Fix oom issues with fp8 for llama (#1454)
|
2024-09-18 03:45:19 -07:00 |
|
Lianmin Zheng
|
9463bc1385
|
Enable torch.compile for triton backend (#1422)
|
2024-09-14 15:38:37 -07:00 |
|
Lianmin Zheng
|
ad0ff62a4c
|
Balance test in CI (#1411)
|
2024-09-12 23:29:44 -07:00 |
|
Lianmin Zheng
|
68be2f6d3b
|
[CI] Include triton backend and online serving benchmark into CI (#1408)
|
2024-09-12 21:36:41 -07:00 |
|