871 Commits

Author SHA1 Message Date
Ying Sheng
6f3cf1297e [CI, AMD] Add AMD tests to CI (#1491) 2024-09-22 04:45:10 -07:00
Lianmin Zheng
13f1357ef0 Add a unit test for data parallelism (#1489) 2024-09-22 02:21:05 -07:00
wellhowtosay
2a99993cd9 Pr fix max workers (#1456) 2024-09-22 02:20:26 -07:00
Co-authored-by: baolujia <baolujia@shizhuang-inc.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Lianmin Zheng
167591e864 Better unit tests for adding a new model (#1488) 2024-09-22 01:50:37 -07:00
Yineng Zhang
441c22db8c doc: update backend (#1486) 2024-09-21 22:05:12 +08:00
Ran Chen
ce636ac441 fix incorrect links in documentation (#1481) 2024-09-21 20:36:23 +08:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang
82136eb0b5 chore: bump v0.3.1.post3 (#1483) 2024-09-21 11:17:45 +08:00
Ke Bao
b8ccaf4d73 Add MLA gsm8k eval (#1484) 2024-09-21 11:16:13 +08:00
Ke Bao
a68cb201dd Fix triton head num (#1482) 2024-09-21 10:25:20 +08:00
Niklas Muennighoff
014982b5e0 Add OLMoE (#1476) 2024-09-20 10:32:49 +08:00
Yineng Zhang
a6db88626e minor: add quant eval compared with base (#1475) 2024-09-20 01:57:19 +08:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Lianmin Zheng
2cd7e181dd Fix env vars in bench_latency (#1472) 2024-09-19 03:19:26 -07:00
Lianmin Zheng
5ce55aee15 Release v0.3.1.post2 (#1470) 2024-09-19 02:03:38 -07:00
Lianmin Zheng
2d346a57c2 Fix padding in the cuda graph (#1469) 2024-09-19 01:52:15 -07:00
Li Bo
446ea33277 fix: creat new dict everytime for putting new frame (#1464) 2024-09-19 01:31:48 -07:00
Ying Sheng
8f527e2940 [Event] Add public meeting invite to README (#1458) 2024-09-18 23:53:22 +08:00
Lianmin Zheng
7f24ea95c3 Fuse top_k and top_k in the sampler (#1457) 2024-09-18 04:35:35 -07:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
HAI
aa2750beb3 [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453) 2024-09-18 02:01:35 -07:00
Lianmin Zheng
5e62a6b706 Add bench_server_latency.py (#1452) 2024-09-18 00:56:06 -07:00
Xiao Yu
5752f25eef Fixed n>1 causing list index out of range with VLM (#1449) 2024-09-18 00:46:32 -07:00
Liangsheng Yin
7c162fa9c5 Fix schedule bug (#1451) 2024-09-17 22:59:32 -07:00
Liangsheng Yin
36078fb247 fix schedule bug (#1450) 2024-09-17 16:33:53 -07:00
Ke Bao
b3710d2c93 Fix attention backend (#1448) 2024-09-17 14:07:53 +00:00
Ke Bao
c6b6d2e71b Enable MLA by default (#1447) 2024-09-17 11:42:48 +00:00
Lianmin Zheng
90a26be31c Release 0.3.1.post1 (#1445) 2024-09-17 01:47:31 -07:00
Jani Monoses
1f4b5f770d Add OLMoE model (#1444) 2024-09-17 01:14:53 -07:00
Ke Bao
76524b70d1 Fix torch compile for deepseek-v2 (#1442) 2024-09-17 00:52:08 -07:00
HAI
3a6e04185b [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) 2024-09-17 07:43:52 +00:00
Lianmin Zheng
2fa5cec775 Simplify sampler and its error handling (#1441) 2024-09-16 21:23:31 -07:00
Lianmin Zheng
27b557aea7 Clean up model loader (#1440) 2024-09-16 18:16:27 -07:00
zifeitong
93dffd699b Add constrained_json_whitespace_pattern to ServerArgs (#1438) 2024-09-16 13:29:18 -07:00
Ying Sheng
2abe4f1cb6 Revert "[Minor] Raise exception for wrong import (#1409)" (#1432) 2024-09-15 15:22:32 -07:00
Ying Sheng
37963394aa [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) 2024-09-15 12:46:04 -07:00
Lianmin Zheng
899cf5c438 Remove deprecated configs (#1431) 2024-09-15 08:52:18 -07:00
Lianmin Zheng
e79f6cd73d Release v0.3.1 (#1430) 2024-09-15 23:03:16 +09:00
Lianmin Zheng
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) 2024-09-15 06:36:06 -07:00
Lianmin Zheng
282681b8a1 Update backend.md (#1429) 2024-09-15 02:55:34 -07:00
William Arnold
58cafe23a7 Add libibverbs-dev to Dockerfile (#1427) 2024-09-15 15:40:31 +09:00
Lianmin Zheng
9463bc1385 Enable torch.compile for triton backend (#1422) 2024-09-14 15:38:37 -07:00
Yineng Zhang
e3fc4658f4 fix: resolve nightly eval (#1426) 2024-09-15 02:07:52 +10:00
Ke Bao
33b54e7c40 Add pytorch sampling backend ut (#1425) 2024-09-15 01:15:30 +10:00
Jerry Zhang
30b404ce72 Add torchao quant for mixtral and qwen_moe (#1418) 2024-09-14 06:46:55 +00:00
Liangsheng Yin
70b6802982 Optimize conflicts between CUDA graph and vocab mask tensors (#1392) 2024-09-13 20:27:53 -07:00
Yineng Zhang
f3d32f888a ci: fix finish (#1414) 2024-09-14 01:01:30 +10:00
Lianmin Zheng
8779da95d6 Update pr-test.yml (#1412) 2024-09-13 00:37:13 -07:00
Lianmin Zheng
ad0ff62a4c Balance test in CI (#1411) 2024-09-12 23:29:44 -07:00
Ying Sheng
9a903a8784 [Minor] Raise exception for wrong import (#1409) 2024-09-12 23:02:36 -07:00
Lianmin Zheng
68be2f6d3b [CI] Include triton backend and online serving benchmark into CI (#1408) 2024-09-12 21:36:41 -07:00