Author | Commit | Message | Date
------ | ------ | ------- | ----
Lianmin Zheng | 2558d6a675 | Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042) | 2024-11-15 05:02:44 -08:00
zolinthecow | f6dd648620 | Offline LLM Engine Benchmark Throughput (#1968) (Co-authored-by: ByronHsu <byronhsu1230@gmail.com>) | 2024-11-14 21:59:33 -08:00
Lianmin Zheng | c3eac1b010 | Fix torch.compile for MoE (#2033) | 2024-11-14 01:30:24 -08:00
Lianmin Zheng | f407fcf9ef | Release v0.3.5.post1 (#2022) | 2024-11-13 10:27:12 -08:00
Lianmin Zheng | ba069a24d3 | Fix grammar backend (#2018) | 2024-11-12 21:17:38 -08:00
DarkSharpness | 125b1199c5 | support parallel grammar preprocessing (#1996) (Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>) | 2024-11-12 08:45:28 -08:00
Xiaoyu Zhang | eff468dd5a | fix test_embedding_models prompt length too long's bug (#2015) | 2024-11-12 23:21:16 +08:00
Xiaoyu Zhang | 027e65248f | support echo=true and logprobs in openai api when logprobs=1 in lm-evaluation-harness (#1998) | 2024-11-11 23:21:20 -08:00
James Xu | ddeb9d42de | Add engine encode (#1995) (Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>) | 2024-11-11 11:48:17 -08:00
Lianmin Zheng | 1929c06762 | Simplify prometheus metrics (#1981) (Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>) | 2024-11-10 04:39:32 -08:00
Lianmin Zheng | 520f0094e4 | [CI] balance unit tests (#1977) | 2024-11-09 16:46:14 -08:00
Lianmin Zheng | 9c939a3d8b | Clean up metrics code (#1972) | 2024-11-09 15:43:20 -08:00
Lianmin Zheng | 549e8b8366 | [Minor] Fix a typo in test_torchao.py (#1976) | 2024-11-09 15:07:27 -08:00
Yudi Xue | 95a4ed129a | Fix metrics (#1963) | 2024-11-08 23:21:11 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Chayenne | 704f8e8ed1 | Add Reward API Docs etc (#1910) (Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>) | 2024-11-03 22:33:03 -08:00
Lianmin Zheng | 2ce32db6fb | Let reward model take text inputs instead of message lists (#1907) (Co-authored-by: Kyle Corbitt <kyle@corbt.com>) | 2024-11-03 13:27:12 -08:00
Lianmin Zheng | 0abbf289a8 | Unify the model type checking (#1905) | 2024-11-03 12:25:39 -08:00
Lianmin Zheng | c17c578108 | Simplify tokenizer manager (#1904) | 2024-11-03 08:38:26 -08:00
Lianmin Zheng | d1b31b0684 | Improve docs and fix the broken links (#1875) | 2024-11-01 17:47:44 -07:00
Yineng Zhang | 104bf2609b | minor: update nightly eval (#1867) | 2024-11-01 21:38:29 +08:00
Yineng Zhang | d86a2d6562 | minor: add human eval (#1754) | 2024-11-01 14:29:20 +08:00
Liangsheng Yin | b9fd178f1b | Fix retraction + overlap (#1860) (Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>) | 2024-10-31 18:27:42 -07:00
Lianmin Zheng | a2e0424abf | Fix memory leak for chunked prefill 2 (#1858) (Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>) | 2024-10-31 14:51:51 -07:00
Lianmin Zheng | f7102fbd2b | Fix mixed chunked prefill (#1850) | 2024-10-30 21:20:41 -07:00
DanielC12321 | 5e00ddebc0 | Add new model: Gpt2 (#1833) | 2024-10-29 17:52:33 -07:00
Byron Hsu | 680cad2023 | fix get_memory_pool_size deadlock for DP (#1830) | 2024-10-28 23:07:14 -07:00
Byron Hsu | 0a24eb850a | Fix update_weights deadlock for DP (#1825) | 2024-10-28 12:02:23 -07:00
Byron Hsu | 6fcd6d7d6d | Support token ids in engine.generate (#1820) | 2024-10-27 14:02:34 -07:00
Ke Bao | c77762d57f | Fix Triton decode kernel & ut (#1819) | 2024-10-27 10:54:38 -07:00
Lianmin Zheng | 86fc0d79d0 | Add a watch dog thread (#1816) | 2024-10-27 02:00:50 -07:00
Lianmin Zheng | 2b80978859 | Provide an argument to set the maximum batch size for cuda graph (#1809) | 2024-10-26 15:09:33 -07:00
Lianmin Zheng | 6aa94b967c | Update ci workflows (#1804) | 2024-10-26 04:32:36 -07:00
Lianmin Zheng | fb99aaa527 | [Fix] Fix --skip-tokenizer-init (#1798) | 2024-10-25 18:51:59 -07:00
Lianmin Zheng | e646c5901e | Fix logprob in the overlapped mode (#1795) | 2024-10-25 11:06:57 -07:00
Lianmin Zheng | c555ce2ca2 | Revert "Fix memory leak when doing chunked prefill" (#1797) | 2024-10-25 10:24:44 -07:00
Lianmin Zheng | 40900baea7 | [Fix] Fix the log parsing in chunked prefill uni tests (#1794) | 2024-10-25 08:31:08 -07:00
Liangsheng Yin | a2f5e7555f | Fix memory leak when doing chunked prefill (#1787) | 2024-10-25 08:01:17 -07:00
Lianmin Zheng | 1701b0db31 | Enhance the test case for chunked prefill (#1785) | 2024-10-24 21:23:09 -07:00
Lianmin Zheng | 05b3bf5e8e | Crash the server on warnings in CI (#1772) | 2024-10-23 16:27:13 -07:00
Ying Sheng | 2fce449b1c | [API] add get memory pool size (#1760) (Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>) | 2024-10-23 07:02:29 +00:00
Lianmin Zheng | ad4125d1a9 | Fuse more ops & Simplify token mapping (#1758) | 2024-10-22 23:20:43 -07:00
Liangsheng Yin | 94cde10920 | Llama3.2 vision model support (#1551) | 2024-10-21 15:01:21 -07:00
Lianmin Zheng | 00611286a1 | Fix sliding window attention and gemma-2 unit tests in CI (#1746) | 2024-10-21 13:47:12 -07:00
Lianmin Zheng | cf470fea32 | Make token mapping non-blocking in the overlapped mode (#1740) | 2024-10-20 23:25:14 -07:00
sixgod | 45d5af2416 | Add GLM-4 TextGeneration Model support for SGLang (#1736) | 2024-10-21 04:08:30 +00:00
yizhang2077 | 554fbf93cd | [Bugfix] qwen2vl forward_extend (#1727) | 2024-10-20 02:38:35 -07:00
Lianmin Zheng | b48edff67f | Split the overlapped version of TpModelWorkerClient into a separate file (#1726) | 2024-10-20 00:29:29 -07:00
Lianmin Zheng | 593b19f29d | Temporarily skip this test_mixed_batch for QWen2VL (#1725) | 2024-10-20 00:05:45 -07:00