Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00
yuhui | cf721fdece | Update grafana.json (#4374) | 2025-03-13 01:31:33 -07:00
Lianmin Zheng | 45de89719c | Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) | 2025-03-12 23:45:52 -07:00
Meng, Hengyu | 71046fcd71 | [XPU][CPU] Enable the native path of DeepSeek (#4086); Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com> | 2025-03-12 22:26:29 -07:00
Lianmin Zheng | c76040e31b | Support page size > 1 (#4356) | 2025-03-12 22:22:39 -07:00
Cheng Wan | 2f6bacee03 | [moe] fix: correct the cache size in the last chunk (#3679); Co-authored-by: Abatom <abzhonghua@gmail.com> | 2025-03-12 22:22:13 -07:00
Wen Sun | 4014804157 | Ensure Usage Data in Streaming Responses Aligns with vLLM’s Implementation (#3814) | 2025-03-12 22:12:55 -07:00
yang_zcybb | ad46550d25 | [Doc] Fix typo in backend/sampling_params (#3835); Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com> | 2025-03-12 22:12:14 -07:00
Jun Liu | 14344caa38 | [docs] Update outdated description about torch.compile (#3844) | 2025-03-12 22:09:38 -07:00
David Carreto Fidalgo | f7f88b706c | HotFix: json serialization error when using OAI v1/batches endpoint with logprobs (#3896) | 2025-03-12 22:04:29 -07:00
yiakwy-xpu-ml-framework-team | 18c27131f5 | [tools] add fp8 max/min constant in utils (#3959) | 2025-03-12 21:44:55 -07:00
YR Chen | ccdd10c84b | Move aiohttp into public dependencies (#3980) | 2025-03-12 21:42:57 -07:00
vikram singh shekhawat | 76f6c0ebf9 | Add device detection and count functions to utils. (#3962) | 2025-03-12 21:41:50 -07:00
Chitsing KUI | 959a3143fc | example: add async offline inference demo (#3961); Signed-off-by: joeshikui <joeshikui@tencent.com>; Co-authored-by: joeshikui <joeshikui@tencent.com> | 2025-03-12 21:41:21 -07:00
Conghui Tan | 6412c5e493 | Avoid duplicated request ids in batch APIs (#4026); Co-authored-by: conghuitan <conghuitan@tencent.com> | 2025-03-12 21:38:17 -07:00
laixin | 0c02086015 | add INT8 example into dsv3 README (#4079) | 2025-03-12 21:37:30 -07:00
AniZpZ | 85ef7f64e4 | [FIX] fix incorrect output when enable both deepgemm and torch compile (#4359); Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com> | 2025-03-12 21:34:09 -07:00
Chen Shengzhi | f1cf6eefbe | [Fix] Check the device backend before calling empty_cache function (#4212) | 2025-03-12 21:28:48 -07:00
William | 0a59a4657a | Fix the doc of FR-Spec (#4295) | 2025-03-12 21:22:50 -07:00
Wang Ran (汪然) | aff79f101f | simple bugfix (#4342) | 2025-03-12 21:20:18 -07:00
Peter Pan | 016033188c | docs: add parameter --log-requests-level (#4335) | 2025-03-12 21:19:37 -07:00
William | 56c39a05a2 | Remove the choices in --speculative-eagle-topk argument (#4329) | 2025-03-12 21:19:16 -07:00
Qingquan Song | 4068e01292 | Fix per token fp8 quant precision (#4362) | 2025-03-12 21:19:05 -07:00
Shi Shuai | 817d43705c | feat: support ep size < 32 for sgl kernel (#4348) | 2025-03-12 20:50:46 -07:00
文峰 | c550e52f8b | Fix scheduler proctitle suffix is None (#4326); Co-authored-by: wenfeng.wf <wenfeng.wf@alibaba-inc.com> | 2025-03-12 19:29:35 -07:00
Lianmin Zheng | e35a93fa8a | Move output processing logic from scheduler.py into a separate file (#4354) | 2025-03-12 16:21:49 -07:00
shizhediao | 2c3656f276 | [Fix Doc.] Enable internal forwarding when starting the router (#4355) | 2025-03-12 15:53:26 -07:00
Lianmin Zheng | d40ee62b5d | Update nightly tests (#4352) | 2025-03-12 15:36:13 -07:00
Wang Ran (汪然) | 91b19949d7 | typo: Update http_server.py (#4350) | 2025-03-12 15:05:30 -07:00
Elfie Guo | 7c86671131 | Support Blackwell Block Scale FP8 Gemm (#4278) | 2025-03-12 14:17:11 -07:00
Zhiqiang Xie | 10b544ae9b | Hierarchical Caching Refactoring and Fixing TP issue (#4082) | 2025-03-12 11:22:35 -07:00
Mick | 01090e8ac3 | model: Support Janus-pro (#3203) | 2025-03-12 11:02:11 -07:00
yych0745 | 6f43a9b9f4 | remove the unused readline dependency from the Qwen2 model implementa… (#4340) | 2025-03-12 02:47:27 -07:00
JieXin Liang | 0540fef7a1 | [Fix] fix _yarn_linear_ramp_mask with device parameter (#4337) | 2025-03-12 02:28:19 -07:00
lambert0312 | 481f608b8e | Add INT8 support MTP NextN function (#3911) | 2025-03-12 01:37:16 -07:00
Yineng Zhang | ed91561f79 | upgrade sgl-kernel 0.0.4.post3 (#4334) | 2025-03-12 01:36:41 -07:00
Yineng Zhang | 6e7239f912 | release 0.0.4.post3 sgl-kernel (#4331) | 2025-03-12 01:05:16 -07:00
Yineng Zhang | 0a3960f21f | fix awq_dequantize (#4333) | 2025-03-12 01:04:38 -07:00
Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00
Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215); Co-authored-by: Stefan He <bhe@linkedin.com> | 2025-03-12 00:08:03 -07:00
Xiaoyu Zhang | 7130a7cea9 | refine sgl_moe_align_block_size_benchmark (#4327) | 2025-03-11 22:48:38 -07:00
Michael Yao | 8f1f614ee2 | [Docs] Clean up benchmark_and_profiling.md (#4297); Signed-off-by: windsonsea <haifeng.yao@daocloud.io> | 2025-03-11 21:48:21 -07:00
lambert0312 | 7140ba3573 | Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323) | 2025-03-11 18:25:56 -07:00
Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00
Yineng Zhang | 1cf63485c1 | upgrade flashinfer 0.2.3 (#4317); Co-authored-by: qingquansong <qsong@linkedin.com> | 2025-03-11 15:37:17 -07:00
Mick | ff2ce0b86f | refactor: move image processors to separate files (#4229) | 2025-03-11 12:35:35 -07:00
Ximingwang-09 | 0f2a2e3c19 | Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220); Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com> | 2025-03-11 12:32:33 -07:00
yigex | 690e1f2371 | [AMD] Fix rocm sgl-kernel missing modules error (#4311); Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com> | 2025-03-11 10:35:28 -07:00
Yineng Zhang | 00f42707ea | update doc (#4299) | 2025-03-11 01:14:16 -07:00
yych0745 | 6a02b32d07 | Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287); Co-authored-by: HandH1998 <1335248067@qq.com> | 2025-03-11 00:49:06 -07:00