sglang

EngineX-Hygon/sglang

Commit Graph

0.5.3rc0

v0.5.2

v0.5.2rc1

v0.5.3_dev

v0.5.4

v0.5.4_dev

v0.5.4_dev_liucong

v0.5.4_dev_maxiao

2432ad40c6 [Minifix] Remove extra space in cot example (#1569) FredericOdermatt 2024-10-04 10:16:53 +02:00
45473d4b2b Make input_ids a torch.Tensor (#1568) Lianmin Zheng 2024-10-04 01:09:59 -07:00
114bbc8651 Use ipc instead of tcp in zmq (#1566) Lianmin Zheng 2024-10-04 00:45:52 -07:00
32eb6e96f2 Organize sampling batch info better (#1562) Lianmin Zheng 2024-10-03 18:29:49 -07:00
e0b5dbcec1 [FP8 KV Cache] Avoid KeyError at loading pre-quantized FP8 model with kv_scale (#1559) HAI 2024-10-03 01:52:26 -07:00
e6852b0dd2 [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) Minsang Song 2024-10-03 12:41:15 +09:00
4ae0969c0a Move status check in the memory pool to CPU (#1557) Lianmin Zheng 2024-10-02 18:23:35 -07:00
317631cada [Fix] Move ScheduleBatch out of SamplingInfo (#1556) Lianmin Zheng 2024-10-02 17:18:04 -07:00
b564835364 [Fix] do not maintain regex_fsm in SamplingBatchInfo (#1555) Lianmin Zheng 2024-10-02 13:19:44 -07:00
2c7d0a5b8b [Fix] Fix all the Huggingface paths (#1553) Theresa Barton 2024-10-02 10:12:07 -07:00
8cdc76f6d4 [Performance, Hardware] MoE tuning on AMD MI300x GPUs (#1554) kk 2024-10-03 00:52:46 +08:00
f202ed9712 [Refactor] Simplify io_struct and tokenizer_manager (#1549) Ying Sheng 2024-10-01 10:25:32 -07:00
100f5b8bc9 Simplify flashinfer dispatch (#1552) Liangsheng Yin 2024-10-01 00:28:42 -07:00
619bb6ddda Dispatch flashinfer wrappers (#1550) Liangsheng Yin 2024-09-30 23:12:36 -07:00
b88ea90d4a Fix bugs of logprobs_nums (#1548) Liangsheng Yin 2024-09-30 17:09:54 -07:00
99ec439da4 Organize Attention Backends (#1547) Liangsheng Yin 2024-09-30 15:54:18 -07:00
0f4fb19bc8 [Fix, LoRA] fix LoRA with updates in main (#1545) Ying Sheng 2024-09-30 10:06:08 -07:00
63ba2f8d7b Clean up batch data structures: Introducing ModelWorkerBatch (#1544) Lianmin Zheng 2024-09-30 06:41:49 -07:00
36d5acfca5 Rename InputMetadata -> ForwardBatch (#1543) Lianmin Zheng 2024-09-30 02:41:11 -07:00
3f0fe08d37 Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) Lianmin Zheng 2024-09-29 20:28:45 -07:00
55b974f96f Process image in parallel (#1539) Liangsheng Yin 2024-09-29 18:52:43 -07:00
f86c1e611f Move scheduler code from tp_worker.py to scheduler.py (#1538) Lianmin Zheng 2024-09-29 17:42:45 -07:00
acaffd233f [Fix] fix ipv6 url when warm up model (#1537) Xinyu Yang 2024-09-30 02:02:40 +08:00
048685430d Improve process creation (#1534) Lianmin Zheng 2024-09-29 02:36:12 -07:00
fd9ad817ec Organize image inputs (#1531) Liangsheng Yin 2024-09-28 23:28:55 -07:00
e165a9fc1b Make detokenizer_manager.py not asyncio (#1532) Lianmin Zheng 2024-09-28 19:33:09 -07:00
4e4459b91f Multiple minor fixes (#1530) Lianmin Zheng 2024-09-28 14:43:35 -07:00
065bb94753 Fix RuntimeEndpoint.select method (#1495) Jeffrey Fong 2024-09-29 05:04:06 +08:00
f42e9bfb52 [bugfix] Add modelscope package to avoid docker image without modelscope (#1520) Kylin 2024-09-29 03:43:22 +08:00
840c5dbcb3 [FIX] Catch syntax error of Regex Guide to avoid crash (#1521) Ninglin Du 2024-09-29 03:42:06 +08:00
63e845d0bb Add float8 dynamic quant to torchao_utils (#1528) Jerry Zhang 2024-09-28 12:27:54 -07:00
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) Ying Sheng 2024-09-27 23:32:11 -07:00
b1e330bcb0 [Event] Update meeting link (#1529) Ying Sheng 2024-09-27 13:30:04 -07:00
4353acb469 minor: fix config (#1524) Liangsheng Yin 2024-09-27 01:49:16 -07:00
9ae1db0bdc [Fix] Ignore import error (#1513) Lianmin Zheng 2024-09-25 11:32:21 -07:00
37c5899fc2 Release v0.3.2 (#1512) Ying Sheng 2024-09-24 23:17:09 -07:00
f39a0197fd Revert "kernel: use tensor cores for flashinfer gqa kernels" (#1511) Ying Sheng 2024-09-24 22:50:31 -07:00
3c93187caf Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) TianyiQ 2024-09-24 21:50:20 -07:00
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) Lianmin Zheng 2024-09-24 21:37:33 -07:00
067d8e16fc Simplify bench_latency.py (#1503) Lianmin Zheng 2024-09-24 17:42:07 -07:00
e6692bf4a5 debug radixcache stack_overflow (#1499) luzengxiangcn 2024-09-24 19:58:01 +08:00
28b4d8e144 Update test_srt_backend.py (#1502) Lianmin Zheng 2024-09-24 03:17:10 -07:00
bc068e9618 [CI] Move AMD test to a separate file (#1500) Lianmin Zheng 2024-09-24 02:06:28 -07:00
8d4ed42ad5 MoE torch compile (#1497) Ke Bao 2024-09-24 16:46:59 +08:00
2854a5ea9f Fix the overhead due to penalizer in bench_latency (#1496) Lianmin Zheng 2024-09-23 07:38:14 -07:00
42a2d82ba7 minor: add mla fp8 test (#1494) Yineng Zhang 2024-09-23 20:40:17 +08:00
e4780cf839 [API, Feature] Support response prefill for openai API (#1490) Ying Sheng 2024-09-22 06:46:17 -07:00
39bb49d156 Update dockerfile to include datamodel_code_generator (#1492) Lianmin Zheng 2024-09-22 04:49:16 -07:00
6f3cf1297e [CI, AMD] Add AMD tests to CI (#1491) Ying Sheng 2024-09-22 04:45:10 -07:00
13f1357ef0 Add a unit test for data parallelism (#1489) Lianmin Zheng 2024-09-22 02:21:05 -07:00
2a99993cd9 Pr fix max workers (#1456) wellhowtosay 2024-09-22 17:20:26 +08:00
167591e864 Better unit tests for adding a new model (#1488) Lianmin Zheng 2024-09-22 01:50:37 -07:00
441c22db8c doc: update backend (#1486) Yineng Zhang 2024-09-21 22:05:12 +08:00
ce636ac441 fix incorrect links in documentation (#1481) Ran Chen 2024-09-21 05:36:23 -07:00
82136eb0b5 chore: bump v0.3.1.post3 (#1483) Yineng Zhang 2024-09-21 11:17:45 +08:00
b8ccaf4d73 Add MLA gsm8k eval (#1484) Ke Bao 2024-09-21 11:16:13 +08:00
a68cb201dd Fix triton head num (#1482) Ke Bao 2024-09-21 10:25:20 +08:00
014982b5e0 Add OLMoE (#1476) Niklas Muennighoff 2024-09-19 19:32:49 -07:00
a6db88626e minor: add quant eval compared with base (#1475) Yineng Zhang 2024-09-20 01:57:19 +08:00
b4408b0d16 feat: update linear deps 1/N (#1305) Yineng Zhang 2024-09-19 20:53:11 +08:00
2cd7e181dd Fix env vars in bench_latency (#1472) Lianmin Zheng 2024-09-19 03:19:26 -07:00
5ce55aee15 Release v0.3.1.post2 (#1470) Lianmin Zheng 2024-09-19 02:03:38 -07:00
2d346a57c2 Fix padding in the cuda graph (#1469) Lianmin Zheng 2024-09-19 01:52:15 -07:00
446ea33277 fix: creat new dict everytime for putting new frame (#1464) Li Bo 2024-09-19 16:31:48 +08:00
8f527e2940 [Event] Add public meeting invite to README (#1458) Ying Sheng 2024-09-18 08:53:22 -07:00
7f24ea95c3 Fuse top_k and top_k in the sampler (#1457) Lianmin Zheng 2024-09-18 04:35:35 -07:00
1acccb364a Fix oom issues with fp8 for llama (#1454) Lianmin Zheng 2024-09-18 03:45:19 -07:00
aa2750beb3 [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453) HAI 2024-09-18 02:01:35 -07:00
5e62a6b706 Add bench_server_latency.py (#1452) Lianmin Zheng 2024-09-18 00:56:06 -07:00
5752f25eef Fixed n>1 causing list index out of range with VLM (#1449) Xiao Yu 2024-09-18 03:46:32 -04:00
7c162fa9c5 Fix schedule bug (#1451) Liangsheng Yin 2024-09-17 22:59:32 -07:00
36078fb247 fix schedule bug (#1450) Liangsheng Yin 2024-09-17 16:33:53 -07:00
b3710d2c93 Fix attention backend (#1448) Ke Bao 2024-09-17 22:07:53 +08:00
c6b6d2e71b Enable MLA by default (#1447) Ke Bao 2024-09-17 19:42:48 +08:00
90a26be31c Release 0.3.1.post1 (#1445) Lianmin Zheng 2024-09-17 01:47:31 -07:00
1f4b5f770d Add OLMoE model (#1444) Jani Monoses 2024-09-17 11:14:53 +03:00
76524b70d1 Fix torch compile for deepseek-v2 (#1442) Ke Bao 2024-09-17 15:52:08 +08:00
3a6e04185b [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) HAI 2024-09-17 00:43:52 -07:00
2fa5cec775 Simplify sampler and its error handling (#1441) Lianmin Zheng 2024-09-16 21:23:31 -07:00
27b557aea7 Clean up model loader (#1440) Lianmin Zheng 2024-09-16 18:16:27 -07:00
93dffd699b Add constrained_json_whitespace_pattern to ServerArgs (#1438) zifeitong 2024-09-16 13:29:18 -07:00
2abe4f1cb6 Revert "[Minor] Raise exception for wrong import (#1409)" (#1432) Ying Sheng 2024-09-15 15:22:32 -07:00
37963394aa [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) Ying Sheng 2024-09-15 12:46:04 -07:00
899cf5c438 Remove deprecated configs (#1431) Lianmin Zheng 2024-09-15 08:52:18 -07:00
e79f6cd73d Release v0.3.1 (#1430) Lianmin Zheng 2024-09-15 07:03:16 -07:00
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) Lianmin Zheng 2024-09-15 06:36:06 -07:00
282681b8a1 Update backend.md (#1429) Lianmin Zheng 2024-09-15 02:55:34 -07:00
58cafe23a7 Add libibverbs-dev to Dockerfile (#1427) William Arnold 2024-09-15 15:40:31 +09:00
9463bc1385 Enable torch.compile for triton backend (#1422) Lianmin Zheng 2024-09-14 15:38:37 -07:00
e3fc4658f4 fix: resolve nightly eval (#1426) Yineng Zhang 2024-09-15 01:07:52 +09:00
33b54e7c40 Add pytorch sampling backend ut (#1425) Ke Bao 2024-09-14 23:15:30 +08:00
30b404ce72 Add torchao quant for mixtral and qwen_moe (#1418) Jerry Zhang 2024-09-13 23:46:55 -07:00
70b6802982 Optimize conflicts between CUDA graph and vocab mask tensors (#1392) Liangsheng Yin 2024-09-13 20:27:53 -07:00
f3d32f888a ci: fix finish (#1414) Yineng Zhang 2024-09-14 00:01:30 +09:00
8779da95d6 Update pr-test.yml (#1412) Lianmin Zheng 2024-09-13 00:37:13 -07:00
ad0ff62a4c Balance test in CI (#1411) Lianmin Zheng 2024-09-12 23:29:44 -07:00
9a903a8784 [Minor] Raise exception for wrong import (#1409) Ying Sheng 2024-09-12 23:02:36 -07:00
68be2f6d3b [CI] Include triton backend and online serving benchmark into CI (#1408) Lianmin Zheng 2024-09-12 21:36:41 -07:00
b912de11b0 Make stop reason a dict instead of str (#1407) Lianmin Zheng 2024-09-12 20:47:31 -07:00
eb02c1618a [Minor, CI] remove lora test from minimal suite (#1406) Ying Sheng 2024-09-12 16:49:50 -07:00
