Hongpeng Guo
|
949b3fbfce
|
[Doc] Update doc of custom logit processor (#3021)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
|
2025-01-20 16:50:25 -08:00 |
|
Hui Liu
|
da4e8b3892
|
enable kv_scale remap (#3017)
|
2025-01-20 14:40:45 -08:00 |
|
Enrique Shockwave
|
af6c5357d5
|
deepseek v3 and r1 chat template (#3015)
|
2025-01-20 14:40:12 -08:00 |
|
Byron Hsu
|
3ad4cd4915
|
bump router to 0.1.3 (#3020)
|
2025-01-20 14:38:06 -08:00 |
|
Byron Hsu
|
3a8428ecaa
|
[router] Expose worker startup interval (#3019)
|
2025-01-20 14:36:54 -08:00 |
|
Byron Hsu
|
0311ce8e1c
|
[router] Expose worker startup secs & Return error instead of panic for router init (#3016)
|
2025-01-20 12:45:13 -08:00 |
|
Ke Bao
|
5dfcacfcb1
|
Add compile flags for cutlass 3.x (#3013)
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2025-01-21 00:04:12 +08:00 |
|
Ke Bao
|
41a0ccd4f1
|
Add clang-format check to sgl-kernel ci (#3012)
|
2025-01-20 23:22:19 +08:00 |
|
Yineng Zhang
|
e94fb7cb10
|
chore: bump v0.4.1.post7 (#3009)
|
2025-01-20 21:50:55 +08:00 |
|
Byron Hsu
|
b5caa22dfb
|
[kernel] port rope cuda kernel to sgl-kernel (#2993)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-01-20 20:58:51 +08:00 |
|
Lianmin Zheng
|
73401fd016
|
Sync distributed package from vllm 0.6.4.post1 (#3010)
|
2025-01-20 04:57:14 -08:00 |
|
Lianmin Zheng
|
89cd923581
|
Roll back to use vllm custom allreduce (#3006)
|
2025-01-20 04:03:15 -08:00 |
|
Lianmin Zheng
|
dc1881326f
|
Fix perf regression on small batch sizes (#3008)
|
2025-01-20 03:39:49 -08:00 |
|
yiakwy-xpu-ml-framework-team
|
10bfce71b3
|
fix moe align blocks benchmark (#3003)
|
2025-01-20 19:33:29 +08:00 |
|
Hongpeng Guo
|
583697cd71
|
[Enhancement] Custom Logit Processor Improvement (#2998)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
|
2025-01-20 02:00:35 -08:00 |
|
Chayenne
|
2584f6d944
|
Docs: Add Performance Demonstration for DPA (#3005)
|
2025-01-20 01:00:52 -08:00 |
|
Lianmin Zheng
|
51e87f6f21
|
Skip flaky custom_logit_processor tests (#3004)
|
2025-01-20 00:28:47 -08:00 |
|
Lianmin Zheng
|
09bcbe0123
|
Update TypeBasedDispatcher and balance CI tests (#3001)
|
2025-01-19 23:37:27 -08:00 |
|
Lianmin Zheng
|
03464890e0
|
Separate two entry points: Engine and HTTP server (#2996)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-01-19 22:09:24 -08:00 |
|
Yineng Zhang
|
44a9669770
|
keep rotary_embedding only (#2997)
|
2025-01-20 13:21:36 +08:00 |
|
Chaitanya Sri Krishna Lolla
|
1a820e38a2
|
Remove dependency of pynvml on ROCm (#2995)
|
2025-01-20 13:00:35 +08:00 |
|
Chayenne
|
0ffcfdf474
|
Docs: Only use X-Grammar in structured output (#2991)
|
2025-01-19 20:22:47 -08:00 |
|
Lianmin Zheng
|
cd493b5afc
|
Improve metrics, logging, and importing orders (#2992)
|
2025-01-19 18:36:59 -08:00 |
|
Lianmin Zheng
|
61f42b5732
|
Move sgl.Runtime under sglang/lang (#2990)
|
2025-01-19 17:10:29 -08:00 |
|
Hongpeng Guo
|
e403d23757
|
[Feature] Add sampler custom logits processor (#2396)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
|
2025-01-19 14:46:53 -08:00 |
|
Enrique Shockwave
|
3bcf5ecea7
|
support regex in xgrammar backend (#2983)
|
2025-01-20 04:34:41 +08:00 |
|
Yineng Zhang
|
2c05f81f15
|
fix custom op version compatibility (#2988)
|
2025-01-20 04:21:29 +08:00 |
|
Seungduk Kim
|
d77caa2b75
|
[#2812] Make the decode status dict capacity adjustable by a CLI param (#2839)
|
2025-01-19 11:36:53 -08:00 |
|
giorgiopiatti-dfinity
|
8b6a4486ec
|
fix missing revision arg when loading tokenizer (#2982)
|
2025-01-19 11:36:07 -08:00 |
|
Yineng Zhang
|
a69cb5cff7
|
cleanup unused header in sgl_kernel (#2986)
|
2025-01-20 00:44:49 +08:00 |
|
Yineng Zhang
|
def5c31873
|
docs: update supported_models (#2987)
|
2025-01-20 00:44:30 +08:00 |
|
Yineng Zhang
|
3fc2b62589
|
update docker dev image (#2985)
|
2025-01-19 23:45:39 +08:00 |
|
Yineng Zhang
|
6ada05d0ed
|
feat: check for is_cuda for sgl_kernel import (#2984)
|
2025-01-19 23:33:04 +08:00 |
|
yizhang2077
|
24cafe3177
|
add config to switch from vllm custom allreduce to sgl_kernel custom allreduce (#2981)
|
2025-01-19 22:30:38 +08:00 |
|
Yineng Zhang
|
5a176c92df
|
fix deepseek v2 with cpu device (#2975)
|
2025-01-19 21:33:27 +08:00 |
|
Byron Hsu
|
4719c1d04a
|
[router] Fix sgl router path for release (#2980)
|
2025-01-19 01:11:06 -08:00 |
|
Byron Hsu
|
ef18b0eda2
|
[router] Allow empty worker list for sglang.launch_router (#2979)
|
2025-01-19 01:05:23 -08:00 |
|
Byron Hsu
|
53cc91e504
|
[devcontainer] Fix mount and GPU & Support rust dev (#2978)
|
2025-01-19 16:34:01 +08:00 |
|
Yineng Zhang
|
d33cbb7e58
|
remove cub and add cccl (#2976)
|
2025-01-19 15:51:27 +08:00 |
|
Lianmin Zheng
|
23196d5254
|
Simplify logits processor (#2974)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-01-18 23:03:49 -08:00 |
|
Lianmin Zheng
|
93b77c8e8a
|
Fix the request loggings to make it fully able to be easily replayed (#2973)
|
2025-01-18 21:45:00 -08:00 |
|
Lianmin Zheng
|
7906d1d298
|
Remove the unused write_with_records (#2972)
|
2025-01-18 20:20:23 -08:00 |
|
fzyzcjy
|
81d27c8e31
|
Refactor to add TypeBasedDispatcher to simplify dispatching (#2958)
|
2025-01-18 20:13:27 -08:00 |
|
Chang Su
|
4d4cdb3fe7
|
Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956)
|
2025-01-18 19:37:30 -08:00 |
|
Yang Zheng
|
2bd18e2d76
|
Memory pool: Minor optimize to avoid to (#2901)
|
2025-01-18 19:35:12 -08:00 |
|
Xiaoyu Zhang
|
83452dbb4a
|
fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971)
|
2025-01-18 18:56:13 -08:00 |
|
Mick
|
3d93f84a00
|
[Feature] Support minicpmv v2.6 (#2785)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-01-18 14:14:19 -08:00 |
|
Xiaoyu Zhang
|
c2f212d672
|
optimize MiniMax-Text-01 lightning_attn_decode triton (#2966)
|
2025-01-18 23:41:01 +08:00 |
|
Yineng Zhang
|
e2cdc8a5b5
|
upgrade cutlass v3.7.0 (#2967)
|
2025-01-18 23:37:42 +08:00 |
|
Yineng Zhang
|
2add697d7a
|
feat: remove vllm get_rope (#2964)
|
2025-01-18 19:38:01 +08:00 |
|