lukec | b8ab989ff4 | Fix the FP8 E4M3 parsing offline scales failure bug (#3045) | 2025-01-22 14:19:33 -08:00
Baizhou Zhang | b3393e941f | [Doc] Update doc of profiling with PyTorch Profiler (#3038) | 2025-01-22 14:17:26 -08:00
Hui Liu | ddc2001fb0 | disable custom allreduce on HIP (#3058) | 2025-01-22 13:57:22 -08:00
Yineng Zhang | 806a3002c1 | add notice about flashinfer in sgl-kernel (#3057) | 2025-01-23 02:47:36 +08:00
nstream-ai-devx | 0d2148efaa | fix rotary_embedding rope_scaling for phi (#3055) | 2025-01-23 02:15:32 +08:00
Yineng Zhang | bf669606eb | feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) | 2025-01-23 00:39:38 +08:00
Yineng Zhang | b2bd8f444c | minor: update header and use pytest (#3054) | 2025-01-22 23:45:18 +08:00
Yineng Zhang | 9d9b482a39 | feat: integrate activation kernels into sgl-kernel (#3053) | 2025-01-22 23:25:45 +08:00
Yineng Zhang | 7353fb9b97 | feat: integrate norm kernels into sgl-kernel (#3052) | 2025-01-22 21:32:48 +08:00
Yineng Zhang | bcda0c9ee6 | sync the upstream updates of flashinfer (#3051) | 2025-01-22 20:33:13 +08:00
Yineng Zhang | 9f8f2c7f74 | update norm cu (#3048) | 2025-01-22 18:58:44 +08:00
Ke Bao | 6fc37bd8ee | Fix sgl-kernel compile for sm80 (#3046) | 2025-01-22 16:49:08 +08:00
Lianmin Zheng | 3d8f1c9bcf | Use int64 as indices for set_kv_buffer (#3039) | 2025-01-21 19:46:09 -08:00
Yineng Zhang | a42213dbd4 | fix pr-test-sgl-kernel (#3036) | 2025-01-22 00:56:42 +08:00
Ke Bao | 0ac019f171 | Support sm90 Int8 gemm (#3035) | 2025-01-21 22:21:54 +08:00
Yineng Zhang | 5a0d680a14 | feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) | 2025-01-21 20:44:49 +08:00
Lianmin Zheng | a4331cd260 | Add accuracy and latency tests of eagle into CI (#3027) | 2025-01-21 02:55:14 -08:00
Yineng Zhang | ec1c21cdc4 | upgrade torch version for sgl-kernel (#3026) | 2025-01-21 14:32:08 +08:00
Yineng Zhang | 6c856b4f3a | minor: update Makefile for sgl-kernel (#3025) | 2025-01-21 13:08:15 +08:00
Lianmin Zheng | 287d07a669 | Misc fixes for eagle (flush_cache, CPU overhead) (#3014) | 2025-01-20 20:27:38 -08:00
Hui Liu | d2571dd5c7 | Enable Cohere2 Models (#3018) | 2025-01-20 19:21:41 -08:00
996_icu | b730aa6b9e | [EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939) | 2025-01-20 17:46:43 -08:00
    Co-authored-by: josephyou <josephyou@tencent.com>
Lianmin Zheng | 60b2a44a80 | Fix flaky tests in test_programs.py (#3022) | 2025-01-20 16:50:39 -08:00
Hongpeng Guo | 949b3fbfce | [Doc] Update doc of custom logit processor (#3021) | 2025-01-20 16:50:25 -08:00
    Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hui Liu | da4e8b3892 | enable kv_scale remap (#3017) | 2025-01-20 14:40:45 -08:00
Enrique Shockwave | af6c5357d5 | deepseek v3 and r1 chat template (#3015) | 2025-01-20 14:40:12 -08:00
Byron Hsu | 3ad4cd4915 | bump router to 0.1.3 (#3020) | 2025-01-20 14:38:06 -08:00
Byron Hsu | 3a8428ecaa | [router] Expose worker startup interval (#3019) | 2025-01-20 14:36:54 -08:00
Byron Hsu | 0311ce8e1c | [router] Expose worker startup secs & Return error instead of panic for router init (#3016) | 2025-01-20 12:45:13 -08:00
Ke Bao | 5dfcacfcb1 | Add compile flags for cutlass 3.x (#3013) | 2025-01-21 00:04:12 +08:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
Ke Bao | 41a0ccd4f1 | Add clang-format check to sgl-kernel ci (#3012) | 2025-01-20 23:22:19 +08:00
Yineng Zhang | e94fb7cb10 | chore: bump v0.4.1.post7 (#3009) | 2025-01-20 21:50:55 +08:00
Byron Hsu | b5caa22dfb | [kernel] port rope cuda kernel to sgl-kernel (#2993) | 2025-01-20 20:58:51 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | 73401fd016 | Sync distributed package from vllm 0.6.4.post1 (#3010) | 2025-01-20 04:57:14 -08:00
Lianmin Zheng | 89cd923581 | Roll back to use vllm custom allreduce (#3006) | 2025-01-20 04:03:15 -08:00
Lianmin Zheng | dc1881326f | Fix perf regression on small batch sizes (#3008) | 2025-01-20 03:39:49 -08:00
yiakwy-xpu-ml-framework-team | 10bfce71b3 | fix moe align blocks benchmark (#3003) | 2025-01-20 19:33:29 +08:00
Hongpeng Guo | 583697cd71 | [Enhancement] Custom Logit Processor Improvement (#2998) | 2025-01-20 02:00:35 -08:00
    Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Chayenne | 2584f6d944 | Docs: Add Performance Demonstration for DPA (#3005) | 2025-01-20 01:00:52 -08:00
Lianmin Zheng | 51e87f6f21 | Skip flaky custom_logit_processor tests (#3004) | 2025-01-20 00:28:47 -08:00
Lianmin Zheng | 09bcbe0123 | Update TypeBasedDispatcher and balance CI tests (#3001) | 2025-01-19 23:37:27 -08:00
Lianmin Zheng | 03464890e0 | Separate two entry points: Engine and HTTP server (#2996) | 2025-01-19 22:09:24 -08:00
    Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Yineng Zhang | 44a9669770 | keep rotary_embedding only (#2997) | 2025-01-20 13:21:36 +08:00
Chaitanya Sri Krishna Lolla | 1a820e38a2 | Remove dependency of pynvml on ROCm (#2995) | 2025-01-20 13:00:35 +08:00
Chayenne | 0ffcfdf474 | Docs: Only use X-Grammar in structured output (#2991) | 2025-01-19 20:22:47 -08:00
Lianmin Zheng | cd493b5afc | Improve metrics, logging, and importing orders (#2992) | 2025-01-19 18:36:59 -08:00
Lianmin Zheng | 61f42b5732 | Move sgl.Runtime under sglang/lang (#2990) | 2025-01-19 17:10:29 -08:00
Hongpeng Guo | e403d23757 | [Feature] Add sampler custom logits processor (#2396) | 2025-01-19 14:46:53 -08:00
    Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Enrique Shockwave | 3bcf5ecea7 | support regex in xgrammar backend (#2983) | 2025-01-20 04:34:41 +08:00
Yineng Zhang | 2c05f81f15 | fix custom op version compatibility (#2988) | 2025-01-20 04:21:29 +08:00