1839 Commits

Author SHA1 Message Date
Yineng Zhang
3e032c07cc use v0.6.4.post1 for sgl-kernel ci (#3071) 2025-01-23 14:19:38 +08:00
Yineng Zhang
44e12ce463 docs: update developer guide for sgl-kernel (#3069) 2025-01-23 14:08:25 +08:00
Yineng Zhang
a547aad61f docs: add developer guide for sgl-kernel (#3068) 2025-01-23 13:47:53 +08:00
Lianmin Zheng
ea535dc574 Revert "disable custom allreduce on HIP" (#3067) 2025-01-22 21:33:35 -08:00
Ke Wen
862bcff833 Support loading of larger models with on-the-fly quantization (#3061) 2025-01-22 21:33:17 -08:00
Lianmin Zheng
8b84e69f25 Fix tp token sync for dp attention (#3062) 2025-01-22 18:51:40 -08:00
Byron Hsu
5de50653cd [router] make error actionable (#3063) 2025-01-22 17:56:21 -08:00
Byron Hsu
c0bf9bf15c [devcontainer] add non-root user (#2989) 2025-01-22 17:47:54 -08:00
Lianmin Zheng
022614d26e Add some flags to allow sync token ids across TP ranks (#3060) 2025-01-22 15:05:51 -08:00
lukec
b8ab989ff4 Fix the FP8 E4M3 parsing offline scales failure bug (#3045) 2025-01-22 14:19:33 -08:00
Baizhou Zhang
b3393e941f [Doc] Update doc of profiling with PyTorch Profiler (#3038) 2025-01-22 14:17:26 -08:00
Hui Liu
ddc2001fb0 disable custom allreduce on HIP (#3058) 2025-01-22 13:57:22 -08:00
Yineng Zhang
806a3002c1 add notice about flashinfer in sgl-kernel (#3057) 2025-01-23 02:47:36 +08:00
nstream-ai-devx
0d2148efaa fix rotary_embedding rope_scaling for phi (#3055) 2025-01-23 02:15:32 +08:00
Yineng Zhang
bf669606eb feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) 2025-01-23 00:39:38 +08:00
Yineng Zhang
b2bd8f444c minor: update header and use pytest (#3054) 2025-01-22 23:45:18 +08:00
Yineng Zhang
9d9b482a39 feat: integrate activation kernels into sgl-kernel (#3053) 2025-01-22 23:25:45 +08:00
Yineng Zhang
7353fb9b97 feat: integrate norm kernels into sgl-kernel (#3052) 2025-01-22 21:32:48 +08:00
Yineng Zhang
bcda0c9ee6 sync the upstream updates of flashinfer (#3051) 2025-01-22 20:33:13 +08:00
Yineng Zhang
9f8f2c7f74 update norm cu (#3048) 2025-01-22 18:58:44 +08:00
Ke Bao
6fc37bd8ee Fix sgl-kernel compile for sm80 (#3046) 2025-01-22 16:49:08 +08:00
Lianmin Zheng
3d8f1c9bcf Use int64 as indices for set_kv_buffer (#3039) 2025-01-21 19:46:09 -08:00
Yineng Zhang
a42213dbd4 fix pr-test-sgl-kernel (#3036) 2025-01-22 00:56:42 +08:00
Ke Bao
0ac019f171 Support sm90 Int8 gemm (#3035) 2025-01-21 22:21:54 +08:00
Yineng Zhang
5a0d680a14 feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) 2025-01-21 20:44:49 +08:00
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
Yineng Zhang
ec1c21cdc4 upgrade torch version for sgl-kernel (#3026) 2025-01-21 14:32:08 +08:00
Yineng Zhang
6c856b4f3a minor: update Makefile for sgl-kernel (#3025) 2025-01-21 13:08:15 +08:00
Lianmin Zheng
287d07a669 Misc fixes for eagle (flush_cache, CPU overhead) (#3014) 2025-01-20 20:27:38 -08:00
Hui Liu
d2571dd5c7 Enable Cohere2 Models (#3018) 2025-01-20 19:21:41 -08:00
996_icu
b730aa6b9e [EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939) 2025-01-20 17:46:43 -08:00
Co-authored-by: josephyou <josephyou@tencent.com>
Lianmin Zheng
60b2a44a80 Fix flaky tests in test_programs.py (#3022) 2025-01-20 16:50:39 -08:00
Hongpeng Guo
949b3fbfce [Doc] Update doc of custom logit processor (#3021) 2025-01-20 16:50:25 -08:00
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hui Liu
da4e8b3892 enable kv_scale remap (#3017) 2025-01-20 14:40:45 -08:00
Enrique Shockwave
af6c5357d5 deepseek v3 and r1 chat template (#3015) 2025-01-20 14:40:12 -08:00
Byron Hsu
3ad4cd4915 bump router to 0.1.3 (#3020) 2025-01-20 14:38:06 -08:00
Byron Hsu
3a8428ecaa [router] Expose worker startup interval (#3019) 2025-01-20 14:36:54 -08:00
Byron Hsu
0311ce8e1c [router] Expose worker startup secs & Return error instead of panic for router init (#3016) 2025-01-20 12:45:13 -08:00
Ke Bao
5dfcacfcb1 Add compile flags for cutlass 3.x (#3013) 2025-01-21 00:04:12 +08:00
Co-authored-by: HandH1998 <1335248067@qq.com>
Ke Bao
41a0ccd4f1 Add clang-format check to sgl-kernel ci (#3012) 2025-01-20 23:22:19 +08:00
Yineng Zhang
e94fb7cb10 chore: bump v0.4.1.post7 (#3009) 2025-01-20 21:50:55 +08:00
Byron Hsu
b5caa22dfb [kernel] port rope cuda kernel to sgl-kernel (#2993) 2025-01-20 20:58:51 +08:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
73401fd016 Sync distributed package from vllm 0.6.4.post1 (#3010) 2025-01-20 04:57:14 -08:00
Lianmin Zheng
89cd923581 Roll back to use vllm custom allreduce (#3006) 2025-01-20 04:03:15 -08:00
Lianmin Zheng
dc1881326f Fix perf regression on small batch sizes (#3008) 2025-01-20 03:39:49 -08:00
yiakwy-xpu-ml-framework-team
10bfce71b3 fix moe align blocks benchmark (#3003) 2025-01-20 19:33:29 +08:00
Hongpeng Guo
583697cd71 [Enhancement] Custom Logit Processor Improvement (#2998) 2025-01-20 02:00:35 -08:00
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Chayenne
2584f6d944 Docs: Add Performance Demonstaration for DPA (#3005) 2025-01-20 01:00:52 -08:00
Lianmin Zheng
51e87f6f21 Skip flaky custom_logit_processor tests (#3004) 2025-01-20 00:28:47 -08:00
Lianmin Zheng
09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) 2025-01-19 23:37:27 -08:00