| Lianmin Zheng | 761b2cebd6 | [CI] merge all ci tests into one file (#1289) | 2024-09-01 02:36:56 -07:00 |
| Yineng Zhang | 54772f784a | feat: fix fp8 for MLA and support bmm fp8 for DeepSeek V2 (#1285); Co-authored-by: ispobock <ispobaoke@163.com> | 2024-09-01 17:28:06 +10:00 |
| Lianmin Zheng | 1b5d56f7f8 | [CI] Add more multi-gpu tests (#1280) | 2024-09-01 00:27:25 -07:00 |
| xiaobochen | d134c139a1 | Optimize the update flashinfer indices (#1262) | 2024-08-31 23:40:28 -07:00 |
| Byron Hsu | 6cc9c52521 | [doc] fix quick start link (#1282) | 2024-08-31 22:54:34 -07:00 |
| Yineng Zhang | 52cefdbf57 | fix: resolve the fp8 bug introduced by vLLM 0.5.5 (#1276) | 2024-09-01 00:44:29 +10:00 |
| Christopher Chou | 51c554d812 | Allow more flexible assistant and system response (#1256) | 2024-08-30 11:51:44 -07:00 |
| Lianmin Zheng | 79ece2c51f | Report median instead of mean in bench_latency.py (#1269) | 2024-08-30 06:05:01 -07:00 |
| 김종곤 | 55f5976b42 | Update README.md - Supported Models add Exaone 3.0 (#1267); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-30 18:49:07 +10:00 |
| 김종곤 | b7f8341014 | EXAONE 3.0 Model Support (#1258); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-30 08:08:28 +00:00 |
| Ke Bao | f414352ae6 | Transpose mla weight offline (#1261); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-30 16:45:40 +10:00 |
| lxww302 | a362340b33 | fix: multimodal_config in monkey_patch_vllm_dummy_weight_loader (#1260) | 2024-08-30 16:43:41 +10:00 |
| Liangsheng Yin | 381dd57bd6 | Sampler cudagraph (#1253) | 2024-08-28 18:58:52 -07:00 |
| Zhiqiang Xie | 8153168c96 | fix data racing due to mutable reference using deepcopy (#1255) | 2024-08-28 18:57:54 -07:00 |
| Enrique Shockwave | 6c34d6339c | make json_schema usable from gen (#1254) | 2024-08-28 18:57:10 -07:00 |
| Yineng Zhang | 13ac95b894 | chore: bump v0.2.14.post2 (#1250) | 2024-08-28 18:46:33 +00:00 |
| Yineng Zhang | 492143bf32 | fix: resolve qwen2 moe weight loader (#1252) | 2024-08-28 11:25:46 -07:00 |
| Lianmin Zheng | 0a97d7962d | [Fix] Fix OOM in llava base class (#1249) | 2024-08-28 08:45:49 -07:00 |
| Yineng Zhang | c411f32e1c | feat: replace GeluAndMul (#1234) | 2024-08-28 14:07:02 +00:00 |
| Lianmin Zheng | bf53bf5142 | [Fix] Fix llava on multi images (#1247) | 2024-08-28 06:33:05 -07:00 |
| Yineng Zhang | b1a540ec42 | feat: update GemmaRMSNorm (#1232) | 2024-08-28 22:47:34 +10:00 |
| Yineng Zhang | 66975360e7 | fix: increase max_new_tokens when testing generation models (#1244) | 2024-08-28 22:12:36 +10:00 |
| Lianmin Zheng | 6c49831394 | Add sglang.bench_latency to CI (#1243) | 2024-08-28 21:20:54 +10:00 |
| Yineng Zhang | f25f4dfde5 | hotfix: revert sampler CUDA Graph (#1242) | 2024-08-28 21:16:47 +10:00 |
| Lianmin Zheng | 184ae1c683 | Update README.md (#1239) | 2024-08-28 02:15:52 -07:00 |
| Yineng Zhang | 198974cd1a | feat: support sm75 with FlashInfer v0.1.6 (#1233) | 2024-08-28 18:39:12 +10:00 |
| Lianmin Zheng | 6cc38b2bf3 | [Minor] Add more type annotations (#1237) | 2024-08-28 00:54:26 -07:00 |
| Liangsheng Yin | 1ece2cda3d | Fix bench latency benchmark (#1225) | 2024-08-28 00:37:32 -07:00 |
| Dr. Artificial曾小健 | c8a9e79186 | Fix readme (#1236) | 2024-08-27 23:51:41 -07:00 |
| Yineng Zhang | 3602692c7c | feat: replace get_act_fn for gpt_bigcode (#1231) | 2024-08-27 21:15:31 +10:00 |
| havetc | 909f34363b | [FIX] Wrong logger (#1230) | 2024-08-27 20:10:46 +10:00 |
| yichuan~ | 5ff25cdf5b | [Minor] add delete test and delete tmp file on ci server (#1227) | 2024-08-26 22:04:52 -07:00 |
| caiyueliang | 2f1d92834f | [FEAT] Support batches cancel (#1222); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-26 23:28:26 +00:00 |
| Liangsheng Yin | c61a1b6f97 | Torch compile CI throughput test (#1223) | 2024-08-26 13:52:58 -07:00 |
| havetc | 9935f97b3e | [FEAT] JSON constrained support (#1125); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-26 09:37:26 -07:00 |
| Yineng Zhang | c5fe11a8e1 | chore: bump v0.2.14 (#1155) | 2024-08-27 00:28:24 +10:00 |
| Liangsheng Yin | 75ce37f401 | Move sampler into CUDA graph (#1201); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-26 07:02:50 -07:00 |
| Mingyi | 97589a60a2 | [CI] Parallelize unit tests in CI (#1219) | 2024-08-26 04:54:02 +00:00 |
| Liangsheng Yin | 632d506d0b | minor: improve CI and dependencies (#1212) | 2024-08-26 04:26:31 +00:00 |
| Kaichen Zhang - NTU | 3579162ab1 | [Fix] Multi-images loading error (#1218) | 2024-08-26 03:58:51 +00:00 |
| Mingyi | 7514b9f8d3 | [CI] Fix CI (#1217) | 2024-08-26 02:56:42 +00:00 |
| Mingyi | 158e8f1e2d | improve the threshold and ports in tests (#1215) | 2024-08-25 19:02:08 -07:00 |
| Lianmin Zheng | d3efcb3930 | Update workflow files (#1214) | 2024-08-25 17:45:35 -07:00 |
| Ke Bao | 2c615d120f | [Feature] Support fp8 e5m2 kv cache with flashinfer (#1204); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-25 17:38:11 -07:00 |
| Lianmin Zheng | 61bb223e0f | Update CI runner docs (#1213) | 2024-08-25 17:31:52 -07:00 |
| Lianmin Zheng | 15f1a49d2d | Update CI workflows (#1210) | 2024-08-25 16:43:07 -07:00 |
| Ying Sheng | 308d024092 | [CI] Fix the issue of unit test hanging (#1211) | 2024-08-25 16:21:37 -07:00 |
| Ying Sheng | ab4990e4bf | [Minor] Temporarily skip flaky test (#1209) | 2024-08-25 14:49:23 -07:00 |
| Lianmin Zheng | 902278008a | [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) | 2024-08-25 14:46:34 -07:00 |
| Chayenne | 30b4f771b0 | Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-08-25 10:29:12 -07:00 |
|