Author | Commit | Subject | Date
Mick | 9d02bb3e2a | Urgent model support: support gemma-3-it (#4424) | 2025-03-16 17:37:32 -07:00
JieXin Liang | 1a3fa75f2f | [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) | 2025-03-16 00:02:47 -07:00
Yineng Zhang | 65b7c9b78f | cleanup deps 2/n (#4464) | 2025-03-15 23:06:17 -07:00
Michael Feil | 1fd0cf8a7b | Update comment in qwen2.py (#4447) | 2025-03-15 21:14:29 -07:00
Lianmin Zheng | c6d7f8d370 | Add some fused elementwise kernels for grok-1 (#4398) (Co-authored-by: dhou-xai <dhou@x.ai>; Hanming Lu <69857889+hanming-lu@users.noreply.github.com>) | 2025-03-13 13:39:10 -07:00
Lianmin Zheng | 8e66fbecee | Improve DP attention (#4390) (Co-authored-by: dhou-xai <dhou@x.ai>; SangBin Cho <rkooo567@gmail.com>) | 2025-03-13 08:23:56 -07:00
Lianmin Zheng | 45de89719c | Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) | 2025-03-12 23:45:52 -07:00
Meng, Hengyu | 71046fcd71 | [XPU][CPU] Enable the native path of DeepSeek (#4086) (Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>) | 2025-03-12 22:26:29 -07:00
Mick | 01090e8ac3 | model: Support Janus-pro (#3203) | 2025-03-12 11:02:11 -07:00
yych0745 | 6f43a9b9f4 | remove the unused readline dependency from the Qwen2 model implementa… (#4340) | 2025-03-12 02:47:27 -07:00
lambert0312 | 481f608b8e | Add INT8 support MTP NextN function (#3911) | 2025-03-12 01:37:16 -07:00
Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00
Mick | ff2ce0b86f | refactor: move image processors to separate files (#4229) | 2025-03-11 12:35:35 -07:00
shimin | ac69885056 | fix the input_ids is None error (#4144) | 2025-03-10 01:38:37 -07:00
DavidChan | 4455b26e76 | [Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958) | 2025-03-10 00:50:34 -07:00
Baizhou Zhang | 9fb48f951f | Support nextn for flashinfer mla attention backend (#4218) | 2025-03-09 00:01:54 -08:00
HandH1998 | c7f254468f | [Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888) (Co-authored-by: yych0745 <1398089567@qq.com>; sleepcoo <sleepcoo@gmail.com>; b0urnee <2769086541@qq.com>) | 2025-03-06 20:54:52 -08:00
Qubitium-ModelCloud | 56a724eba3 | [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790) (Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>; Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>) | 2025-03-05 01:11:00 -08:00
Lianmin Zheng | e074d84e5b | [Minor] more code cleanup (#4077) | 2025-03-04 21:23:47 -08:00
Xiuyu Li | 9545bfb28a | fix: support gelu_new activation function in gpt2 (#3712) | 2025-03-04 04:09:52 -08:00
Ke Bao | 9fafa62db7 | Share target model embed and head weights for nextn (#4033) | 2025-03-03 13:30:04 -08:00
Lianmin Zheng | 1a8f995c46 | remove cache configs in model definitions (#4031) | 2025-03-03 05:00:50 -08:00
Zhousx | 7fbab730bd | [feat] add small vocab table for eagle's draft model[1]. (#3822) (Co-authored-by: Achazwl <323163497@qq.com>; Chayenne <zhaochen20@outlook.com>) | 2025-03-02 18:58:45 -08:00
Baizhou Zhang | 90a4b7d98a | [Feature] Support ragged prefill in flashinfer mla backend (#3967) (Co-authored-by: Yineng Zhang <me@zhyncs.com>; pankajroark <pankajroark@users.noreply.github.com>) | 2025-02-28 18:13:56 -08:00
fzyzcjy | e3e0bc50a9 | [Feature] SPMD for SGLang + Verl (#3852) | 2025-02-28 09:53:10 -08:00
Nicolas Castet | 127998cc41 | Fix allgather ops inside cuda graphs (#3709) | 2025-02-25 08:39:10 -08:00
Yueyang Pan | 7036d6fc67 | [Bug]: Add missing clamp to llavavid (#3787) | 2025-02-24 19:10:15 -08:00
Chaitanya Sri Krishna Lolla | 6ce9dbe828 | [ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237) (Co-authored-by: HAI <hixiao@gmail.com>) | 2025-02-24 18:14:31 -08:00
laixin | 1a6e97577a | Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) (Co-authored-by: HandH1998 <1335248067@qq.com>) | 2025-02-24 05:43:35 -08:00
Baizhou Zhang | b110084654 | Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785) | 2025-02-24 04:07:25 -08:00
fzyzcjy | a3339d8cac | Bug: Fix weight loader error when LM head weights are tied (#3766) | 2025-02-21 17:53:12 -08:00
Chayenne | 14d90617b0 | Bug: fix lm head weights in Qwen models (#3777) | 2025-02-21 16:49:31 -08:00
fzyzcjy | d37f95511d | Improve: Tiny fix Olmo2 (#3348) | 2025-02-21 16:09:35 -08:00
Zhiyu | c66b2c9cf1 | Add support for nvidia modelopt fp8 kv cache (#3223) | 2025-02-22 07:04:58 +08:00
simveit | 20b765a26e | Model: Support Qwen 72B RM model. (#3772) | 2025-02-21 14:38:21 -08:00
Mick | 424848d26f | fix: remove dependency on latest transformers impl (#3635) | 2025-02-19 01:14:11 +08:00
Yineng Zhang | 714f3e6362 | feat: support flashinfer mla with prefix cache (#3643) | 2025-02-18 02:06:43 +08:00
Mick | bcc213df61 | Model: Support Qwen 2.5 vl (#3258) | 2025-02-16 00:58:53 -08:00
Mick | 7711ac6ed0 | doc: emphasize and notify the usage of chat_template (#3589) (Co-authored-by: Chayenne <zhaochen20@outlook.com>) | 2025-02-15 00:10:32 -08:00
Ke Bao | 862dd76c76 | Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 (#3582) | 2025-02-15 05:28:34 +08:00
Yineng Zhang | 70f894b810 | feat: support flashinfer mla attention for deepseek v3 (#3550) | 2025-02-14 08:50:14 +08:00
Liangsheng Yin | 8616357a97 | Fix deepseek awq v3 (#3450) | 2025-02-12 22:09:52 +08:00
Chayenne | 40022d075a | Feature: Fix the binding error in Llama (#3355) | 2025-02-06 20:19:24 -08:00
Yineng Zhang | 8db776f049 | support QuickGELU (#3250) | 2025-02-01 19:31:47 +08:00
Mick | 9f635ea50d | [Fix] Address remaining issues of supporting MiniCPMV (#2977) | 2025-01-28 00:22:13 -08:00
Yineng Zhang | 2f79f58873 | feat: use sgl-kernel 0.0.3 in sglang (#3179) | 2025-01-27 21:39:52 +08:00
Lianmin Zheng | 52c03f16b9 | Add activation parameters to fused_moe (#3170) | 2025-01-27 00:23:37 -08:00
Hui Liu | 8e48ca8cc1 | enable kv_scale for Gemma2 (#3113) | 2025-01-25 18:29:14 -08:00
Ke Wen | 862bcff833 | Support loading of larger models with on-the-fly quantization (#3061) | 2025-01-22 21:33:17 -08:00
Hui Liu | d2571dd5c7 | Enable Cohere2 Models (#3018) | 2025-01-20 19:21:41 -08:00