151 Commits

Author SHA1 Message Date
Lianmin Zheng
e79f6cd73d Release v0.3.1 (#1430) 2024-09-15 23:03:16 +09:00
Lianmin Zheng
9463bc1385 Enable torch.compile for triton backend (#1422) 2024-09-14 15:38:37 -07:00
hxer7963
c33d82a211 Add Support for XVERSE Models (Dense and MoE) to sglang (#1397) 2024-09-12 01:47:52 -07:00
  Co-authored-by: will he <hexin@xverse.cn>
  Co-authored-by: root <root@localhost.localdomain>
  Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
William
2a71be5e25 Fix README format (#1399) 2024-09-11 23:46:51 -07:00
Vectory
224200e3c2 BaiChuan2 Model (#1367) 2024-09-11 03:55:24 -07:00
  Co-authored-by: wanpenghan <wanpenghan@sohu-inc.com>
Lianmin Zheng
46094e0c1b Deprecate --disable-flashinfer and introduce --attention-backend (#1380) 2024-09-10 17:11:16 -07:00
William
e72275cf7f Support MiniCPM3 (#1371) 2024-09-10 19:57:52 +10:00
Lianmin Zheng
8d1095dbf0 [Docs] Improve documentations (#1368) 2024-09-09 20:48:28 -07:00
Yineng Zhang
5ab9418f5b [Doc] update news (#1327) 2024-09-04 04:21:21 -07:00
Yineng Zhang
a63c8275c6 chore: bump v0.3.0 (#1320) 2024-09-04 04:32:18 +08:00
Lianmin Zheng
c500f96bb1 Update README.md for llava-onevision instructions (#1313) 2024-09-03 01:43:08 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Lianmin Zheng
9999442756 Release v0.2.15 (#1295) 2024-09-01 22:22:38 -07:00
Byron Hsu
4a9f8ea43b [doc] Fix more broken links (#1294) 2024-09-01 14:46:36 -07:00
Byron Hsu
6cc9c52521 [doc] fix quick start link (#1282) 2024-08-31 22:54:34 -07:00
Lianmin Zheng
79ece2c51f Report median instead of mean in bench_latency.py (#1269) 2024-08-30 06:05:01 -07:00
김종곤
55f5976b42 Update README.md - Supported Models add Exaone 3.0 (#1267) 2024-08-30 18:49:07 +10:00
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang
13ac95b894 chore: bump v0.2.14.post2 (#1250) 2024-08-28 18:46:33 +00:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Yineng Zhang
f25f4dfde5 hotfix: revert sampler CUDA Graph (#1242) 2024-08-28 21:16:47 +10:00
Lianmin Zheng
184ae1c683 Update README.md (#1239) 2024-08-28 02:15:52 -07:00
Yineng Zhang
198974cd1a feat: support sm75 with FlashInfer v0.1.6 (#1233) 2024-08-28 18:39:12 +10:00
Dr. Artificial曾小健
c8a9e79186 Fix readme (#1236) 2024-08-27 23:51:41 -07:00
Yineng Zhang
c5fe11a8e1 chore: bump v0.2.14 (#1155) 2024-08-27 00:28:24 +10:00
Chayenne
30b4f771b0 Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) 2024-08-25 10:29:12 -07:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Lianmin Zheng
b20daf982a Update README.md (#1198) 2024-08-24 14:50:05 -07:00
Lianmin Zheng
f6af3a6561 Cleanup readme, llava examples, usage examples and nccl init (#1194) 2024-08-24 08:02:23 -07:00
Kaichen Zhang - NTU
a5b14ad043 [Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123) 2024-08-23 14:11:16 -07:00
  Co-authored-by: Bo Li <drluodian@gmail.com>
Zhanghao Wu
ac1b74fa85 [Docs] Fix rendering of details in README (#1179) 2024-08-22 07:05:33 +08:00
Yineng Zhang
350a81609b fix: resolve README render (#1166) 2024-08-21 03:23:52 +10:00
Lianmin Zheng
a8ae640328 Improve docs and warnings (#1164) 2024-08-20 08:31:29 -07:00
Zhanghao Wu
d8627ed16d [Docs] Add instruction for running on clouds and kubernetes with SkyPilot (#1144) 2024-08-19 14:01:55 +08:00
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Yineng Zhang
5bd953749b chore: bump v0.2.13 (#1111) 2024-08-16 03:50:43 +10:00
Yineng Zhang
fe5024325b docs: update README (#1098) 2024-08-14 04:40:05 -07:00
Lucien
312e849255 Example file for docker compose and k8s (#1006) 2024-08-13 15:07:57 -07:00
Yineng Zhang
b0ad0c1bc8 chore: bump v0.2.12 (#1048) 2024-08-12 20:59:38 +10:00
Lianmin Zheng
41598e0d8e Add longer accuracy test on CI (#1049) 2024-08-12 09:21:38 +00:00
Lianmin Zheng
a97df79124 Clean up readme and arguments of chunked prefill (#1022) 2024-08-11 01:18:52 -07:00
Lianmin Zheng
54fb1c80c0 Clean up unit tests (#1020) 2024-08-10 15:09:03 -07:00
liuyhwangyh
b91a4cb1b1 support models from www.modelscope.cn (#994) 2024-08-09 02:52:14 -07:00
  Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Yineng Zhang
dc9d06d886 chore: bump v0.2.11 (#970) 2024-08-07 20:47:53 +08:00
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00
Yineng Zhang
fde8340550 docs: update README (#935) 2024-08-05 20:06:06 +10:00
Ying Sheng
399cad91f3 Update README.md (#927) 2024-08-04 23:01:35 -07:00
Ying Sheng
3bc99e6fe4 Test openai vision api (#925) 2024-08-05 13:51:55 +10:00
Ying Sheng
141e8c71a3 Bump version to 0.2.10 (#923) 2024-08-04 16:52:51 -07:00
Ying Sheng
8c5382e62c Update README.md 2024-08-03 12:58:41 -07:00
Ying Sheng
b906c01592 Bump version to 0.2.9.post1 (#899) 2024-08-02 12:08:00 -07:00
Ying Sheng
30a9b2ef20 Bump version to v0.2.9 (#890) 2024-08-02 01:45:48 -07:00
Ying Sheng
e4d3333c6c bump to 0.2.8 (#877) 2024-08-01 14:18:26 -07:00