Commit Graph

  • e73167ade3 Fix maximum recursion depth triggered on exception exit (#4438) Lianmin Zheng 2025-03-14 15:12:26 -07:00
  • 862fe52241 bump v0.0.5.post1 (#4437) Yineng Zhang 2025-03-14 15:00:26 -07:00
  • 61e4433caf Add moe topk softmax templated from vllm (#4302) Qingquan Song 2025-03-14 12:03:33 -07:00
  • 660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) Zhan Lu 2025-03-14 18:30:55 +00:00
  • 642ab418f3 [bug] fix duplicate variable MAX_PIXELS in qwen_vl.py (#4419) Baoyuan Qi 2025-03-14 16:28:25 +08:00
  • 1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964) wangyu 2025-03-14 15:40:44 +08:00
  • 977d7cd26a cleanup deps 1/n (#4400) Yineng Zhang 2025-03-14 00:00:33 -07:00
  • 0e0ec70200 Hierarchical Caching supports MLA (#4009) Lu Changqi 2025-03-14 11:42:14 +08:00
  • bb37855653 Update CODEOWNERS (#4403) Lianmin Zheng 2025-03-13 17:54:40 -07:00
  • ba80c102f9 bump v0.4.4.post1 (#4402) Yineng Zhang 2025-03-13 17:53:46 -07:00
  • fbdb50501f Hot fix for hicache with new page aligned radixtree (#4397) Zhiqiang Xie 2025-03-13 15:50:49 -07:00
  • f0afaf5289 Add a dummy grok test case (#4399) Lianmin Zheng 2025-03-13 15:29:48 -07:00
  • 85d2365d33 Fix the output of hidden states after HTTP requests (#4269) Qiaolin Yu 2025-03-13 17:54:06 -04:00
  • 5fe79605a8 Fix Llama3.3 tool call support (#4320) Chang Su 2025-03-13 14:01:41 -07:00
  • c6d7f8d370 Add some fused elementwise kernels for grok-1 (#4398) Lianmin Zheng 2025-03-13 13:39:10 -07:00
  • a5a892ffd3 Fix auto merge & add back get_flat_data_by_layer (#4393) Lianmin Zheng 2025-03-13 08:46:25 -07:00
  • 8e66fbecee Improve DP attention (#4390) Lianmin Zheng 2025-03-13 08:23:56 -07:00
  • f141298a3c Update ci_install_dependency.sh to use accelerate 1.4.0 (#4392) Lianmin Zheng 2025-03-13 07:16:11 -07:00
  • 4fea040ca1 Fix a regression introduced by overlapping KV cache writing (#4375) Lianmin Zheng 2025-03-13 03:49:05 -07:00
  • 6aaeb84872 chore: bump v0.4.4 (#4041) Yineng Zhang 2025-03-13 02:49:58 -07:00
  • 3623b6a7f5 upgrade sgl-kernel 0.0.5 (#4381) Yineng Zhang 2025-03-13 02:37:56 -07:00
  • 4ff1264201 Update pyproject.toml Yineng Zhang 2025-03-13 02:16:51 -07:00
  • 2a4cbad8e9 bump 0.0.5 sgl-kernel (#4377) Yineng Zhang 2025-03-13 02:08:35 -07:00
  • 2937387a50 fix accuracy issue (#4376) Yineng Zhang 2025-03-13 02:06:22 -07:00
  • cf721fdece Update grafana.json (#4374) yuhui 2025-03-13 16:31:33 +08:00
  • 45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) Lianmin Zheng 2025-03-12 23:45:52 -07:00
  • 71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086) Meng, Hengyu 2025-03-13 13:26:29 +08:00
  • c76040e31b Support page size > 1 (#4356) Lianmin Zheng 2025-03-12 22:22:39 -07:00
  • 2f6bacee03 [moe] fix: correct the cache size in the last chunk (#3679) Cheng Wan 2025-03-13 01:22:13 -04:00
  • 4014804157 Ensure Usage Data in Streaming Responses Aligns with vLLM’s Implementation (#3814) Wen Sun 2025-03-13 13:12:55 +08:00
  • ad46550d25 [Doc] Fix typo in backend/sampling_params (#3835) yang_zcybb 2025-03-13 13:12:14 +08:00
  • 14344caa38 [docs] Update outdated description about torch.compile (#3844) Jun Liu 2025-03-13 14:09:38 +09:00
  • f7f88b706c HotFix: json serialization error when using OAI v1/batches endpoint with logprobs (#3896) David Carreto Fidalgo 2025-03-13 06:04:29 +01:00
  • 18c27131f5 [tools] add fp8 max/min constant in utils (#3959) yiakwy-xpu-ml-framework-team 2025-03-13 12:44:55 +08:00
  • ccdd10c84b Move aiohttp into public dependencies (#3980) YR Chen 2025-03-13 12:42:57 +08:00
  • 76f6c0ebf9 Add device detection and count functions to utils. (#3962) vikram singh shekhawat 2025-03-13 10:11:50 +05:30
  • 959a3143fc example: add async offline inference demo (#3961) Chitsing KUI 2025-03-13 12:41:21 +08:00
  • 6412c5e493 Avoid duplicated request ids in batch APIs (#4026) Conghui Tan 2025-03-13 12:38:17 +08:00
  • 0c02086015 add INT8 example into dsv3 README (#4079) laixin 2025-03-13 12:37:30 +08:00
  • 85ef7f64e4 [FIX] fix incorrect output when enable both deepgemm and torch compile (#4359) AniZpZ 2025-03-13 12:34:09 +08:00
  • f1cf6eefbe [Fix] Check the device backend before calling empty_cache function (#4212) Chen Shengzhi 2025-03-13 12:28:48 +08:00
  • 0a59a4657a Fix the doc of FR-Spec (#4295) William 2025-03-13 12:22:50 +08:00
  • aff79f101f simple bugfix (#4342) Wang Ran (汪然) 2025-03-13 12:20:18 +08:00
  • 016033188c docs: add parameter --log-requests-level (#4335) Peter Pan 2025-03-13 12:19:37 +08:00
  • 56c39a05a2 Remove the choices in --speculative-eagle-topk argument (#4329) William 2025-03-13 12:19:16 +08:00
  • 4068e01292 Fix per token fp8 quant precision (#4362) Qingquan Song 2025-03-12 21:19:05 -07:00
  • 817d43705c feat: support ep size < 32 for sgl kernel (#4348) Shi Shuai 2025-03-13 11:50:46 +08:00
  • c550e52f8b Fix scheduler proctitle suffix is None (#4326) 文峰 2025-03-13 10:29:35 +08:00
  • e35a93fa8a Move output processing logic from scheduler.py into a separate file (#4354) Lianmin Zheng 2025-03-12 16:21:49 -07:00
  • 2c3656f276 [Fix Doc.] Enable internal forwarding when starting the router (#4355) shizhediao 2025-03-12 15:53:26 -07:00
  • d40ee62b5d Update nightly tests (#4352) Lianmin Zheng 2025-03-12 15:36:13 -07:00
  • 91b19949d7 typo: Update http_server.py (#4350) Wang Ran (汪然) 2025-03-13 06:05:30 +08:00
  • 7c86671131 Support Blackwell Block Scale FP8 Gemm (#4278) Elfie Guo 2025-03-12 14:17:11 -07:00
  • 10b544ae9b Hierarchical Caching Refactoring and Fixing TP issue (#4082) Zhiqiang Xie 2025-03-12 11:22:35 -07:00
  • 01090e8ac3 model: Support Janus-pro (#3203) Mick 2025-03-13 02:02:11 +08:00
  • 6f43a9b9f4 remove the unused readline dependency from the Qwen2 model implementa… (#4340) yych0745 2025-03-12 17:47:27 +08:00
  • 0540fef7a1 [Fix] fix _yarn_linear_ramp_mask with device parameter (#4337) JieXin Liang 2025-03-12 17:28:19 +08:00
  • 481f608b8e Add INT8 support MTP NextN function (#3911) lambert0312 2025-03-12 16:37:16 +08:00
  • ed91561f79 upgrade sgl-kernel 0.0.4.post3 (#4334) Yineng Zhang 2025-03-12 01:36:41 -07:00
  • 6e7239f912 release 0.0.4.post3 sgl-kernel (#4331) Yineng Zhang 2025-03-12 01:05:16 -07:00
  • 0a3960f21f fix awq_dequantize (#4333) Yineng Zhang 2025-03-12 01:04:38 -07:00
  • 07f944631e Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) Rex 2025-03-12 00:10:02 -07:00
  • e0917e6bd0 Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) Stefan He 2025-03-12 00:08:03 -07:00
  • 7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) Xiaoyu Zhang 2025-03-12 13:48:38 +08:00
  • 8f1f614ee2 [Docs] Clean up benchmark_and_profiling.md (#4297) Michael Yao 2025-03-12 12:48:21 +08:00
  • 7140ba3573 Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323) lambert0312 2025-03-12 09:25:56 +08:00
  • d1da58e275 unify is_cuda and is_hip (#4321) Yineng Zhang 2025-03-11 18:12:56 -07:00
  • 1cf63485c1 upgrade flashinfer 0.2.3 (#4317) Yineng Zhang 2025-03-11 15:37:17 -07:00
  • ff2ce0b86f refactor: move image processors to separate files (#4229) Mick 2025-03-12 03:35:35 +08:00
  • 0f2a2e3c19 Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) Ximingwang-09 2025-03-12 03:32:33 +08:00
  • 690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311) yigex 2025-03-12 01:35:28 +08:00
  • 00f42707ea update doc (#4299) Yineng Zhang 2025-03-11 01:14:16 -07:00
  • 6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) yych0745 2025-03-11 15:49:06 +08:00
  • 3a08f54638 Update MTP doc (#4290) Ke Bao 2025-03-11 15:46:55 +08:00
  • dce303e279 linear support deepgemm (#4199) lukec 2025-03-11 15:38:37 +08:00
  • 4d27eb9ad1 update sgl-kernel 0.0.4.post2 (#4291) Yineng Zhang 2025-03-11 00:34:33 -07:00
  • d3ecd63204 Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136) lambert0312 2025-03-11 15:32:25 +08:00
  • cd90945518 bump sgl-kernel 0.0.4.post2 (#4288) Yineng Zhang 2025-03-11 00:09:47 -07:00
  • bde24ab31f update deepgemm (#4284) Yineng Zhang 2025-03-10 23:39:57 -07:00
  • bf2eefc0c7 Uupdate cutalss dependency for its bug fix (#4277) Elfie Guo 2025-03-10 17:00:05 -07:00
  • 5524e7d057 Fix nightly eval for neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 (#4279) Lianmin Zheng 2025-03-10 16:50:28 -07:00
  • e187a3d595 upgrade xgrammar 0.1.15 (#4275) Yineng Zhang 2025-03-10 14:53:24 -07:00
  • 3dd4feae63 add THIRDPARTYNOTICES for DeepGEMM (#4272) Yineng Zhang 2025-03-10 11:10:57 -07:00
  • 2ac189edc8 Amd test fp8 (#4261) HandH1998 2025-03-11 01:12:09 +08:00
  • 5a6400eec5 Test no vllm custom allreduce (#4256) Lianmin Zheng 2025-03-10 10:08:25 -07:00
  • cf0ccd406e Optimize rope in sgl kernel (#4267) Lianmin Zheng 2025-03-10 10:07:45 -07:00
  • 3d56585a97 increase the timeout of nightly-test.yml (#4262) Lianmin Zheng 2025-03-10 05:07:03 -07:00
  • 00d25a7f5e Fix quantization and nightly tests (#4258) Lianmin Zheng 2025-03-10 03:06:21 -07:00
  • 1a5023e05d Release sgl-kernel v0.0.4.post1 (#4255) Lianmin Zheng 2025-03-10 02:39:50 -07:00
  • 23308a9032 fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) Xiaoyu Zhang 2025-03-10 16:42:58 +08:00
  • ac69885056 fix the input_ids is None error (#4144) shimin 2025-03-10 16:38:37 +08:00
  • aa957102a9 Simplify tests & Fix trtllm custom allreduce registration (#4252) Lianmin Zheng 2025-03-10 01:24:22 -07:00
  • 007f8b3dc2 Added example for multimodal embedding (#4206) simveit 2025-03-10 08:53:56 +01:00
  • 4455b26e76 [Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958) DavidChan 2025-03-10 15:50:34 +08:00
  • c553e1604c DeepGemm integrate to sgl-kernel (#4165) laixin 2025-03-10 15:35:07 +08:00
  • 7c0541b385 Move activation.cu to sgl-kernel/elementwise (#4250) Lianmin Zheng 2025-03-09 22:41:13 -07:00
  • e8a69e4d0c Clean up fp8 support (#4230) Lianmin Zheng 2025-03-09 21:46:35 -07:00
  • fbd560028a Auto balance CI tests (#4238) Lianmin Zheng 2025-03-09 21:05:55 -07:00
  • 730d084f2a Minor style fix for sgl-kernel (#4243) Lianmin Zheng 2025-03-09 20:15:13 -07:00
  • 4a05bdfa86 Revert "Check eagle server args" (#4242) Lianmin Zheng 2025-03-09 18:53:33 -07:00