sglang

EngineX-Hygon/sglang

Commit Graph

0.5.3rc0

v0.5.2

v0.5.2rc1

v0.5.3_dev

v0.5.4

v0.5.4_dev

v0.5.4_dev_liucong

v0.5.4_dev_maxiao

0a765bbccc Docs: Refactor Contribution Guide (#2690) Shi Shuai 2024-12-31 22:11:00 +00:00
286cad3ee3 h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689) Xiaoyu Zhang 2024-12-31 23:17:36 +08:00
dc7eb01f19 [Fix] fix openai adapter (#2685) Ying Sheng 2024-12-31 02:48:19 -08:00
b0524c3789 Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) Lianmin Zheng 2024-12-31 02:25:05 -08:00
6c42fa229d Update README.md (#2683) Lianmin Zheng 2024-12-31 00:13:10 -08:00
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) Yineng Zhang 2024-12-31 15:52:09 +08:00
bedc4c7a50 misc: update CODEOWNERS (#2680) Yineng Zhang 2024-12-31 15:04:50 +08:00
f44d143949 Support target model verification in the attention backend (#2678) Lianmin Zheng 2024-12-30 22:58:55 -08:00
b6b57fc200 minor: cleanup sgl-kernel (#2679) Yineng Zhang 2024-12-31 14:52:00 +08:00
b4403985d0 Add cutlass submodule for sgl-kernel (#2676) Ke Bao 2024-12-31 14:28:29 +08:00
339c69a243 Improve the computation for time_per_output_token Prometheus metrics (#2674) Lianmin Zheng 2024-12-30 21:40:14 -08:00
f707470019 CI: Update scripts to fail fast (#2672) fzyzcjy 2024-12-31 11:04:01 +08:00
21ec66e59e Minor follow-up fixes for the logprob refactor (#2670) Lianmin Zheng 2024-12-30 05:42:08 -08:00
c5210dfa38 AMD DeepSeek_V3 FP8 Numerical fix (#2667) HAI 2024-12-30 05:31:12 -08:00
a29dd9501d Add GemLite caching after each capture (#2669) mobicham 2024-12-30 14:27:29 +01:00
9c6ba2484f Refactor logprob computation to return the real logprob used in sampling (#2664) Lianmin Zheng 2024-12-30 04:51:38 -08:00
b02da24a5b Refactor sgl-kernel build (#2642) Ke Bao 2024-12-30 18:07:01 +08:00
bdd2827a80 Update structured_outputs.ipynb (#2666) Lianmin Zheng 2024-12-30 00:46:41 -08:00
8c3b420eec [Docs] clean up structured outputs docs (#2654) Lianmin Zheng 2024-12-29 23:57:16 -08:00
e6f523b5f2 fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) HAI 2024-12-29 23:45:02 -08:00
3231817861 Revert "[feat] Add math eval to CI" (#2656) Lianmin Zheng 2024-12-29 23:05:50 -08:00
a11f8d5f6a [feat] Add math eval to CI (#2652) Xiaotong Jiang 2024-12-29 22:49:41 -08:00
098d659c0e docs: update README (#2651) Yineng Zhang 2024-12-30 13:33:29 +08:00
76d14f8cb9 add 2*h20 node serving example for deepseek v3 (#2650) Lzhang-hub 2024-12-30 13:04:38 +08:00
b08c308ebc Update the timeout in nightly-test.yml (#2649) Lianmin Zheng 2024-12-29 14:51:07 -08:00
03d5fbfd44 Release 0.4.1.post3 - upload the config.json to PyPI (#2647) Lianmin Zheng 2024-12-29 14:25:53 -08:00
1703d766d8 CI: skip special token for engine token ids unit test (#2648) Chayenne 2024-12-29 13:52:50 -08:00
09e6e2aa33 Merge branch 'main' of github.com:sgl-project/sglang zhaochenyang20 2024-12-29 21:48:21 +00:00
fad29f7f52 CI: Fix unittest for engine input token ids and output token ids (#2646) Shi Shuai 2024-12-29 21:28:59 +00:00
35bdb48557 [Feature] Get Token IDs with Engine.generate() (#2636) Shi Shuai 2024-12-29 20:28:27 +00:00
b085e06b01 docs: add development guide using docker (#2645) Yineng Zhang 2024-12-30 02:22:54 +08:00
763dd55d17 docs: update README (#2644) Yineng Zhang 2024-12-30 01:24:06 +08:00
3ccf566b0d chore: bump v0.4.1.post2 (#2643) Yineng Zhang 2024-12-30 00:11:46 +08:00
afa0341e57 Update Triton configs for block fp8 kernels (#2641) HandH1998 2024-12-29 22:53:47 +08:00
30828e7192 AMD: set weights and scaling numbers properly for block FP8 (#2637) HAI 2024-12-29 03:23:39 -08:00
e0e09fceeb [Session] Update session control interface (#2635) Ying Sheng 2024-12-29 02:10:27 -08:00
9c05c6898e Add llama_eagle.py (#2640) Lianmin Zheng 2024-12-29 01:45:35 -08:00
3464e57b62 minor: add nsys cli for docker dev (#2639) Yineng Zhang 2024-12-29 17:28:11 +08:00
3815b23ccb Clean up wrapper in flashinfer backend (#2638) Lianmin Zheng 2024-12-29 00:45:57 -08:00
fd34f2da35 [Docs] Add EBNF to sampling params docs (#2609) Adarsh Shirawalmath 2024-12-29 13:35:00 +05:30
8ee9a8501a [Feature] Function Calling (#2544) Tanjiro 2024-12-29 11:28:52 +05:30
fd28640dc5 Add update_weights_from_tensor (#2631) fzyzcjy 2024-12-29 05:30:27 +08:00
7863e4368a add configs for block fp8 related kernels (#2628) Yineng Zhang 2024-12-28 23:12:04 +08:00
333e3bfde5 [docs]Refactor constrained decoding tutorial (#2633) Shi Shuai 2024-12-28 15:00:38 +00:00
239c9d4d3a Docs: Add constrained decoding tutorial (#2614) Shi Shuai 2024-12-28 07:54:28 +00:00
855d0ba381 [CI] Fix nightly test and raise better error message (#2626) Lianmin Zheng 2024-12-27 22:16:39 -08:00
9254a33ad4 avoid fused_moe_triton padding circular import (#2624) Xiaoyu Zhang 2024-12-28 14:01:35 +08:00
8a2681e26a Update readme (#2625) Ke Bao 2024-12-28 13:39:56 +08:00
5276a675f5 Add more supporting organizations (#2623) Lianmin Zheng 2024-12-27 13:41:41 -08:00
751e5ca273 [minor] clean up docs and eos id (#2622) Lianmin Zheng 2024-12-27 11:23:46 -08:00
7a7ac6bea1 [FIX] Update EOS from config (#2475) Yang Zheng 2024-12-28 02:59:56 +08:00
d9e6ee382b docs: update README (#2618) Yineng Zhang 2024-12-28 00:21:53 +08:00
ef5b0ff90b chore: bump v0.4.1.post1 (#2616) Yineng Zhang 2024-12-28 00:11:06 +08:00
6e5305158c update sgl_moe_align_block_size usage (#2617) HandH1998 2024-12-28 00:01:13 +08:00
77d1210b36 fix moe_align_block_size (#2615) HandH1998 2024-12-27 23:32:53 +08:00
70dc2fbe2d Change extend attention kernel launch parameter for ROCm platform to … (#2610) kk 2024-12-27 16:32:17 +08:00
b438a2e512 Fix triton kernel performance regression (#2611) kk 2024-12-27 15:54:38 +08:00
7ca751ff7d Fused moe triton cfg opt for rocm (#2612) kk 2024-12-27 15:38:22 +08:00
c75adfec59 Update CODEOWNERS (#2608) Lianmin Zheng 2024-12-26 20:58:08 -08:00
7722c11c1d Regression fix to AMD/ROCm from recent change (#2606) HAI 2024-12-26 20:22:14 -08:00
b2ed5c8ea7 Tiny code cleanup in tokenizer_manager.py (#2586) fzyzcjy 2024-12-27 09:53:09 +08:00
f46f394f4d Update README.md (#2605) Lianmin Zheng 2024-12-26 10:58:49 -08:00
2125898af5 Update contributor_guide.md (#2603) Lianmin Zheng 2024-12-26 08:36:13 -08:00
44f011d224 Super tiny typo fix (#2564) fzyzcjy 2024-12-27 00:28:01 +08:00
ed91e003bb [UTILS] improve makefile a bit by adding help info (#2570) kzhou003 2024-12-26 08:24:18 -08:00
531d6ea968 fix: package data missing (#2521) yudian0504 2024-12-27 00:16:48 +08:00
dc3bee4815 Fix test and benchmark scripts (#2598) Lianmin Zheng 2024-12-26 07:56:26 -08:00
a74d194146 [unittest] add unit test to test quant args of srt engine (#2574) Zhizhou Sha 2024-12-26 06:54:43 -08:00
3169e66c23 Fix duplicated handling of GetWeightsByNameReqInput (#2565) fzyzcjy 2024-12-26 22:49:32 +08:00
773951548d Fix logprob_start_len for multi modal models (#2597) Lianmin Zheng 2024-12-26 06:27:45 -08:00
637de9e8ce update readme of DeepSeek V3 (#2596) fsygd 2024-12-26 21:31:56 +08:00
acb340728c [Feature] Support new parameter - EBNF in xgrammar (#2526) Adarsh Shirawalmath 2024-12-26 18:42:41 +05:30
08effbff35 Error occurs when loading the gemma model in bitsandbytes format. (#2557) Sangchun Ha (Patrick) 2024-12-26 22:10:37 +09:00
60bd32723a Update README.md (#2594) Lianmin Zheng 2024-12-26 03:31:50 -08:00
e7ebecf82e Fix cache hit rate when chunked prefill (#2555) Liangsheng Yin 2024-12-26 03:14:28 -08:00
9a23c48456 h100 tuning fused_moe_triton for qwen2 moe (#2560) Xiaoyu Zhang 2024-12-26 19:13:31 +08:00
635a042623 docs: update deepseek v3 example (#2592) Yineng Zhang 2024-12-26 17:43:37 +08:00
2dccecf432 fix: only enable moe_align_block_size for now (#2590) Yineng Zhang 2024-12-26 16:56:59 +08:00
75ad0a143f docs: add deepseek v3 launch instructions (#2589) Yineng Zhang 2024-12-26 15:26:54 +08:00
efc52f85e2 chore: bump v0.4.1 (#2582) Yineng Zhang 2024-12-26 07:14:51 +08:00
60e2fdcf4f use sgl-kernel moe_align_block_size (#2581) Yineng Zhang 2024-12-26 06:29:08 +08:00
d7c0e872b0 chore: bump 0.0.2.post8 for sgl-kernel (#2580) Yineng Zhang 2024-12-26 06:11:39 +08:00
31548116a8 fix moe_align_block_size_kernel for shared memory issue (#2579) Yineng Zhang 2024-12-26 05:31:04 +08:00
53aed988cb Refactor MoE (#2575) HandH1998 2024-12-26 00:02:14 +08:00
8a56b43175 [Bench] Flush cache before benchmarking (#2566) Ying Sheng 2024-12-23 19:21:21 -08:00
e835a50021 Reorg moe code (#2563) Ke Bao 2024-12-24 01:10:22 +08:00
23e5e50fd5 Fix gemlite import (#2553) Lianmin Zheng 2024-12-22 20:21:17 -08:00
25e5d589e3 Doc: Update Grammar Backend (#2545) Shi Shuai 2024-12-23 01:14:40 +00:00
41b1db69b8 A better aio rwlock that guarantees the order (#2547) Lianmin Zheng 2024-12-22 15:44:32 -08:00
8496701934 [Misc] Fix metrics, weight update lock, request logging (#2543) Lianmin Zheng 2024-12-22 06:25:57 -08:00
7d672d277b [kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel (#2509) Xiaoyu Zhang 2024-12-22 18:31:02 +08:00
d4b174817d docs: update sponsorship (DataCrunch) (#2523) Yineng Zhang 2024-12-22 18:29:04 +08:00
19ba2b0ea9 Add lora_paths to v1_chat_generate_request (#2529) Lei 2024-12-22 02:23:33 -08:00
4e1e3cff20 fix #2528 (#2541) Yineng Zhang 2024-12-22 00:14:41 +08:00
8f4d04e540 chore: bump v0.4.0.post2 (#2525) Yineng Zhang 2024-12-21 21:16:34 +08:00
feb2b768ba Add integration with gemlite weight only quant (#2528) Jerry Zhang 2024-12-20 08:25:25 -08:00
d95a5f5bf5 fix followup #2517 (#2524) Yineng Zhang 2024-12-19 23:24:30 +08:00
4b83db24f1 fix: continue to use flashinfer 0.1.6 temporarily (#2517) Yineng Zhang 2024-12-19 14:03:24 +08:00
64456cf023 docs: update README (#2516) Yineng Zhang 2024-12-19 13:44:02 +08:00
bb4a922023 feat: add llama3 eval (#2515) Yineng Zhang 2024-12-19 13:37:09 +08:00
