Shenggui Li | fb4c9c3a30 | [fix] added support for vlm in offline inference (#3548) | 2025-02-15 05:27:29 +08:00 |
Yineng Zhang | 70f894b810 | feat: support flashinfer mla attention for deepseek v3 (#3550) | 2025-02-14 08:50:14 +08:00 |
Ata Fatahi | b8318aec48 | Make NCCL NVLS configurable (#3502) | 2025-02-12 03:25:06 +08:00 |
Yineng Zhang | d39899e85c | upgrade flashinfer v0.2.0.post2 (#3288); Co-authored-by: pankajroark <pankajroark@users.noreply.github.com> | 2025-02-04 21:41:40 +08:00 |
YAMY | b045841bae | Feature/function calling update (#2700); Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: shuaills <shishuaiuoe@gmail.com> | 2025-01-26 09:57:51 -08:00 |
Lianmin Zheng | 1dda8c5e4c | Return more infos for computing average acceptance length (#3152) | 2025-01-26 04:51:54 -08:00 |
Lianmin Zheng | 03464890e0 | Separate two entry points: Engine and HTTP server (#2996); Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> | 2025-01-19 22:09:24 -08:00 |