sglang

EngineX-Hygon/sglang

Commit Graph

0.5.3rc0

v0.5.2

v0.5.2rc1

v0.5.3_dev

v0.5.4

v0.5.4_dev

v0.5.4_dev_liucong

v0.5.4_dev_maxiao

e73167ade3 Fix maximum recursion depth triggered on exception exit (#4438) Lianmin Zheng 2025-03-14 15:12:26 -07:00
862fe52241 bump v0.0.5.post1 (#4437) Yineng Zhang 2025-03-14 15:00:26 -07:00
61e4433caf Add moe topk softmax templated from vllm (#4302) Qingquan Song 2025-03-14 12:03:33 -07:00
660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) Zhan Lu 2025-03-14 18:30:55 +00:00
642ab418f3 [bug] fix duplicate variable MAX_PIXELS in qwen_vl.py (#4419) Baoyuan Qi 2025-03-14 16:28:25 +08:00
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964) wangyu 2025-03-14 15:40:44 +08:00
977d7cd26a cleanup deps 1/n (#4400) Yineng Zhang 2025-03-14 00:00:33 -07:00
0e0ec70200 Hierarchical Caching supports MLA (#4009) Lu Changqi 2025-03-14 11:42:14 +08:00
bb37855653 Update CODEOWNERS (#4403) Lianmin Zheng 2025-03-13 17:54:40 -07:00
ba80c102f9 bump v0.4.4.post1 (#4402) Yineng Zhang 2025-03-13 17:53:46 -07:00
fbdb50501f Hot fix for hicache with new page aligned radixtree (#4397) Zhiqiang Xie 2025-03-13 15:50:49 -07:00
f0afaf5289 Add a dummy grok test case (#4399) Lianmin Zheng 2025-03-13 15:29:48 -07:00
85d2365d33 Fix the output of hidden states after HTTP requests (#4269) Qiaolin Yu 2025-03-13 17:54:06 -04:00
5fe79605a8 Fix Llama3.3 tool call support (#4320) Chang Su 2025-03-13 14:01:41 -07:00
c6d7f8d370 Add some fused elementwise kernels for grok-1 (#4398) Lianmin Zheng 2025-03-13 13:39:10 -07:00
a5a892ffd3 Fix auto merge & add back get_flat_data_by_layer (#4393) Lianmin Zheng 2025-03-13 08:46:25 -07:00
8e66fbecee Improve DP attention (#4390) Lianmin Zheng 2025-03-13 08:23:56 -07:00
f141298a3c Update ci_install_dependency.sh to use accelerate 1.4.0 (#4392) Lianmin Zheng 2025-03-13 07:16:11 -07:00
4fea040ca1 Fix a regression introduced by overlapping KV cache writing (#4375) Lianmin Zheng 2025-03-13 03:49:05 -07:00
6aaeb84872 chore: bump v0.4.4 (#4041) Yineng Zhang 2025-03-13 02:49:58 -07:00
3623b6a7f5 upgrade sgl-kernel 0.0.5 (#4381) Yineng Zhang 2025-03-13 02:37:56 -07:00
4ff1264201 Update pyproject.toml Yineng Zhang 2025-03-13 02:16:51 -07:00
2a4cbad8e9 bump 0.0.5 sgl-kernel (#4377) Yineng Zhang 2025-03-13 02:08:35 -07:00
2937387a50 fix accuracy issue (#4376) Yineng Zhang 2025-03-13 02:06:22 -07:00
cf721fdece Update grafana.json (#4374) yuhui 2025-03-13 16:31:33 +08:00
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) Lianmin Zheng 2025-03-12 23:45:52 -07:00
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086) Meng, Hengyu 2025-03-13 13:26:29 +08:00
c76040e31b Support page size > 1 (#4356) Lianmin Zheng 2025-03-12 22:22:39 -07:00
2f6bacee03 [moe] fix: correct the cache size in the last chunk (#3679) Cheng Wan 2025-03-13 01:22:13 -04:00
4014804157 Ensure Usage Data in Streaming Responses Aligns with vLLM’s Implementation (#3814) Wen Sun 2025-03-13 13:12:55 +08:00
ad46550d25 [Doc] Fix typo in backend/sampling_params (#3835) yang_zcybb 2025-03-13 13:12:14 +08:00
14344caa38 [docs] Update outdated description about torch.compile (#3844) Jun Liu 2025-03-13 14:09:38 +09:00
f7f88b706c HotFix: json serialization error when using OAI v1/batches endpoint with logprobs (#3896) David Carreto Fidalgo 2025-03-13 06:04:29 +01:00
18c27131f5 [tools] add fp8 max/min constant in utils (#3959) yiakwy-xpu-ml-framework-team 2025-03-13 12:44:55 +08:00
ccdd10c84b Move aiohttp into public dependencies (#3980) YR Chen 2025-03-13 12:42:57 +08:00
76f6c0ebf9 Add device detection and count functions to utils. (#3962) vikram singh shekhawat 2025-03-13 10:11:50 +05:30
959a3143fc example: add async offline inference demo (#3961) Chitsing KUI 2025-03-13 12:41:21 +08:00
6412c5e493 Avoid duplicated request ids in batch APIs (#4026) Conghui Tan 2025-03-13 12:38:17 +08:00
0c02086015 add INT8 example into dsv3 README (#4079) laixin 2025-03-13 12:37:30 +08:00
85ef7f64e4 [FIX] fix incorrect output when enable both deepgemm and torch compile (#4359) AniZpZ 2025-03-13 12:34:09 +08:00
f1cf6eefbe [Fix] Check the device backend before calling empty_cache function (#4212) Chen Shengzhi 2025-03-13 12:28:48 +08:00
0a59a4657a Fix the doc of FR-Spec (#4295) William 2025-03-13 12:22:50 +08:00
aff79f101f simple bugfix (#4342) Wang Ran (汪然) 2025-03-13 12:20:18 +08:00
016033188c docs: add parameter --log-requests-level (#4335) Peter Pan 2025-03-13 12:19:37 +08:00
56c39a05a2 Remove the choices in --speculative-eagle-topk argument (#4329) William 2025-03-13 12:19:16 +08:00
4068e01292 Fix per token fp8 quant precision (#4362) Qingquan Song 2025-03-12 21:19:05 -07:00
817d43705c feat: support ep size < 32 for sgl kernel (#4348) Shi Shuai 2025-03-13 11:50:46 +08:00
c550e52f8b Fix scheduler proctitle suffix is None (#4326) 文峰 2025-03-13 10:29:35 +08:00
e35a93fa8a Move output processing logic from scheduler.py into a separate file (#4354) Lianmin Zheng 2025-03-12 16:21:49 -07:00
2c3656f276 [Fix Doc.] Enable internal forwarding when starting the router (#4355) shizhediao 2025-03-12 15:53:26 -07:00
d40ee62b5d Update nightly tests (#4352) Lianmin Zheng 2025-03-12 15:36:13 -07:00
91b19949d7 typo: Update http_server.py (#4350) Wang Ran (汪然) 2025-03-13 06:05:30 +08:00
7c86671131 Support Blackwell Block Scale FP8 Gemm (#4278) Elfie Guo 2025-03-12 14:17:11 -07:00
10b544ae9b Hierarchical Caching Refactoring and Fixing TP issue (#4082) Zhiqiang Xie 2025-03-12 11:22:35 -07:00
01090e8ac3 model: Support Janus-pro (#3203) Mick 2025-03-13 02:02:11 +08:00
6f43a9b9f4 remove the unused readline dependency from the Qwen2 model implementa… (#4340) yych0745 2025-03-12 17:47:27 +08:00
0540fef7a1 [Fix] fix _yarn_linear_ramp_mask with device parameter (#4337) JieXin Liang 2025-03-12 17:28:19 +08:00
481f608b8e Add INT8 support MTP NextN function (#3911) lambert0312 2025-03-12 16:37:16 +08:00
ed91561f79 upgrade sgl-kernel 0.0.4.post3 (#4334) Yineng Zhang 2025-03-12 01:36:41 -07:00
6e7239f912 release 0.0.4.post3 sgl-kernel (#4331) Yineng Zhang 2025-03-12 01:05:16 -07:00
0a3960f21f fix awq_dequantize (#4333) Yineng Zhang 2025-03-12 01:04:38 -07:00
07f944631e Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) Rex 2025-03-12 00:10:02 -07:00
e0917e6bd0 Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) Stefan He 2025-03-12 00:08:03 -07:00
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) Xiaoyu Zhang 2025-03-12 13:48:38 +08:00
8f1f614ee2 [Docs] Clean up benchmark_and_profiling.md (#4297) Michael Yao 2025-03-12 12:48:21 +08:00
7140ba3573 Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323) lambert0312 2025-03-12 09:25:56 +08:00
d1da58e275 unify is_cuda and is_hip (#4321) Yineng Zhang 2025-03-11 18:12:56 -07:00
1cf63485c1 upgrade flashinfer 0.2.3 (#4317) Yineng Zhang 2025-03-11 15:37:17 -07:00
ff2ce0b86f refactor: move image processors to separate files (#4229) Mick 2025-03-12 03:35:35 +08:00
0f2a2e3c19 Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) Ximingwang-09 2025-03-12 03:32:33 +08:00
690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311) yigex 2025-03-12 01:35:28 +08:00
00f42707ea update doc (#4299) Yineng Zhang 2025-03-11 01:14:16 -07:00
6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) yych0745 2025-03-11 15:49:06 +08:00
3a08f54638 Update MTP doc (#4290) Ke Bao 2025-03-11 15:46:55 +08:00
dce303e279 linear support deepgemm (#4199) lukec 2025-03-11 15:38:37 +08:00
4d27eb9ad1 update sgl-kernel 0.0.4.post2 (#4291) Yineng Zhang 2025-03-11 00:34:33 -07:00
d3ecd63204 Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136) lambert0312 2025-03-11 15:32:25 +08:00
cd90945518 bump sgl-kernel 0.0.4.post2 (#4288) Yineng Zhang 2025-03-11 00:09:47 -07:00
bde24ab31f update deepgemm (#4284) Yineng Zhang 2025-03-10 23:39:57 -07:00
bf2eefc0c7 Uupdate cutalss dependency for its bug fix (#4277) Elfie Guo 2025-03-10 17:00:05 -07:00
5524e7d057 Fix nightly eval for neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 (#4279) Lianmin Zheng 2025-03-10 16:50:28 -07:00
e187a3d595 upgrade xgrammar 0.1.15 (#4275) Yineng Zhang 2025-03-10 14:53:24 -07:00
3dd4feae63 add THIRDPARTYNOTICES for DeepGEMM (#4272) Yineng Zhang 2025-03-10 11:10:57 -07:00
2ac189edc8 Amd test fp8 (#4261) HandH1998 2025-03-11 01:12:09 +08:00
5a6400eec5 Test no vllm custom allreduce (#4256) Lianmin Zheng 2025-03-10 10:08:25 -07:00
cf0ccd406e Optimize rope in sgl kernel (#4267) Lianmin Zheng 2025-03-10 10:07:45 -07:00
3d56585a97 increase the timeout of nightly-test.yml (#4262) Lianmin Zheng 2025-03-10 05:07:03 -07:00
00d25a7f5e Fix quantization and nightly tests (#4258) Lianmin Zheng 2025-03-10 03:06:21 -07:00
1a5023e05d Release sgl-kernel v0.0.4.post1 (#4255) Lianmin Zheng 2025-03-10 02:39:50 -07:00
23308a9032 fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) Xiaoyu Zhang 2025-03-10 16:42:58 +08:00
ac69885056 fix the input_ids is None error (#4144) shimin 2025-03-10 16:38:37 +08:00
aa957102a9 Simplify tests & Fix trtllm custom allreduce registration (#4252) Lianmin Zheng 2025-03-10 01:24:22 -07:00
007f8b3dc2 Added example for multimodal embedding (#4206) simveit 2025-03-10 08:53:56 +01:00
4455b26e76 [Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958) DavidChan 2025-03-10 15:50:34 +08:00
c553e1604c DeepGemm integrate to sgl-kernel (#4165) laixin 2025-03-10 15:35:07 +08:00
7c0541b385 Move activation.cu to sgl-kernel/elementwise (#4250) Lianmin Zheng 2025-03-09 22:41:13 -07:00
e8a69e4d0c Clean up fp8 support (#4230) Lianmin Zheng 2025-03-09 21:46:35 -07:00
fbd560028a Auto balance CI tests (#4238) Lianmin Zheng 2025-03-09 21:05:55 -07:00
730d084f2a Minor style fix for sgl-kernel (#4243) Lianmin Zheng 2025-03-09 20:15:13 -07:00
4a05bdfa86 Revert "Check eagle server args" (#4242) Lianmin Zheng 2025-03-09 18:53:33 -07:00