| Author | Commit | Message | Date |
|---|---|---|---|
| HAI | b819381fec | AITER backend extension and workload optimizations (#6838); Co-authored-by: wunhuang <wunhuang@amd.com>; Co-authored-by: Hubert Lu <Hubert.Lu@amd.com> | 2025-06-05 23:00:18 -07:00 |
| Lianmin Zheng | 20fd53b8f6 | Correctly abort the failed grammar requests & Improve the handling of abort (#6803) | 2025-06-01 19:00:07 -07:00 |
| Sai Enduri | f4a8987f69 | Update amd docker and nightly models. (#6687) | 2025-05-28 00:08:08 -07:00 |
| Yineng Zhang | f77da69964 | chore: upgrade mooncake-transfer-engine (#6643) | 2025-05-26 20:01:30 -07:00 |
| Sai Enduri | eb8f02dd87 | Update nightly thresholds and dependencies. (#6635) | 2025-05-26 11:44:13 -07:00 |
| fzyzcjy | 25be63d0b2 | Auto handle PD disaggregation in bench_serving (#6587); Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-25 22:41:27 -07:00 |
| fzyzcjy | d502dae0f0 | Tiny change killall_sglang.sh (#6596) | 2025-05-25 22:36:51 -07:00 |
| kk | 7a5e6ce1cb | Fix GPU OOM (#6564); Co-authored-by: michael <michael.zhang@amd.com> | 2025-05-24 16:38:39 -07:00 |
| Byron Hsu | 2d831c6ef9 | [PD] Support structured output (#6560) | 2025-05-23 21:49:00 -07:00 |
| Byron Hsu | 8233cc10fd | [PD] Support logprob & Add failure test (#6558) | 2025-05-23 14:29:20 -07:00 |
| HAI | 5c0b38f369 | aiter attention-backend (default enabled on AMD/ROCm) (#6381) | 2025-05-20 22:52:41 -07:00 |
| Yineng Zhang | eabcf82acb | feat: add long context example (#6391) | 2025-05-18 01:45:17 -07:00 |
| Sai Enduri | c47a51db7e | Clean up AMD CI (#6365) | 2025-05-18 01:17:28 -07:00 |
| Lianmin Zheng | dcc0a45618 | Fix amd ci (#6360) | 2025-05-16 15:33:10 -07:00 |
| Lianmin Zheng | e07a6977e7 | Minor improvements of TokenizerManager / health check (#6327) | 2025-05-15 15:29:25 -07:00 |
| Stefan He | 1ab14c4c5c | [VERL Use Case] Add torch_memory_saver into deps (#6247) | 2025-05-12 19:09:03 -07:00 |
| Yineng Zhang | f94543d22b | chore: add hf_xet dep (#6243) | 2025-05-12 13:08:40 -07:00 |
| shangmingc | 0f334945c6 | [CI] Fix PD mooncake dependency error (#6212); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-12 10:08:49 -07:00 |
| Lianmin Zheng | 03227c5fa6 | [CI] Reorganize the 8 gpu tests (#6192) | 2025-05-11 10:55:06 -07:00 |
| Yineng Zhang | 230106304d | chore: upgrade sgl-kernel v0.1.2.post1 (#6196); Co-authored-by: alcanderian <alcanderian@gmail.com> | 2025-05-11 22:41:37 +08:00 |
| shangmingc | 31d1f6e7f4 | [PD] Add simple unit test for disaggregation feature (#5654); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-11 13:35:27 +08:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| Jinyan Chen | 8a828666a3 | Add DeepEP to CI PR Test (#5655); Co-authored-by: Jinyan Chen <jinyanc@nvidia.com> | 2025-05-06 17:36:03 -07:00 |
| Huapeng Zhou | b8559764f6 | [Test] Add flashmla attention backend test (#5587) | 2025-05-05 10:32:02 -07:00 |
| Yineng Zhang | 9a6ad8916d | chore: upgrade sgl-kernel 0.1.1 (#5933) | 2025-04-30 16:13:30 -07:00 |
| Yineng Zhang | 41ac0c6d48 | chore: upgrade sgl-kernel 0.1.0 (#5690) | 2025-04-27 21:00:50 -07:00 |
| Lianmin Zheng | 3dd3538c18 | Pin torch audio to 2.6.0 (#5750) | 2025-04-25 15:06:28 -07:00 |
| Ravi Theja | 7d9679b74d | Add MMMU benchmark results (#4491); Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local> | 2025-04-25 15:23:53 +08:00 |
| Yineng Zhang | 7282ab741a | fix: update bench_speculative (#5649) | 2025-04-22 16:08:15 -07:00 |
| Byron Hsu | bf98d2e377 | [PD] Support prefill overlap + Ensure no race condition (#5609) | 2025-04-21 12:12:56 -07:00 |
| Byron Hsu | deded17f38 | [PD] Fix edge case and simplify large page size + chunked prefill (#5589) | 2025-04-21 10:27:02 -07:00 |
| Byron Hsu | c951d312ed | [PD] Fix large page size + chunk prefill (#5588) | 2025-04-20 17:21:54 -07:00 |
| lukec | 417b44eba8 | [Feat] upgrade pytorch2.6 (#5417) | 2025-04-20 16:06:34 -07:00 |
| Yineng Zhang | 0961feefca | feat: use flashinfer jit package (#5547) | 2025-04-19 00:28:39 -07:00 |
| Yineng Zhang | 2c11f9c2eb | chore: upgrade sgl-kernel 0.0.9.post2 (#5540) | 2025-04-18 21:17:23 -07:00 |
| Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00 |
| Yineng Zhang | 8ec0bb7d55 | chore: upgrade sgl-kernel 0.0.9.post1 (#5436) | 2025-04-15 15:45:51 -07:00 |
| Yineng Zhang | 8aab7fdb21 | chore: upgrade sgl-kernel 0.0.9 (#5401) | 2025-04-14 22:37:59 -07:00 |
| Yineng Zhang | f58b929a51 | chore: upgrade sgl-kernel 0.0.8.post3 (#5342) | 2025-04-13 00:45:59 -07:00 |
| Adarsh Shirawalmath | a0a9f6d64f | [Docs] Remove the older supported docs section (#5301) | 2025-04-11 11:30:18 -07:00 |
| Yineng Zhang | 80aa8ca84e | fix: update update_wheel_index for cu128 (#5300) | 2025-04-11 09:31:03 -07:00 |
| Yi Zhang | aba5ca154d | python transfer custom allreduce from trt kernel to vllm kernel (#5080) | 2025-04-05 15:35:55 -07:00 |
| Yineng Zhang | 0d99adb715 | upgrade transformers 4.51.0 (#5088) | 2025-04-05 14:20:23 -07:00 |
| Yineng Zhang | e53bf190bc | upgrade sgl-kernel v0.0.7 (#5049) | 2025-04-03 17:07:54 -07:00 |
| Xiaoyu Zhang | 772d2a191d | try to fix ci oserror (#5024) | 2025-04-03 02:45:05 -07:00 |
| Yineng Zhang | 1c63e79756 | use fa3 in sgl-kernel (#4954) | 2025-03-31 16:14:49 -07:00 |
| Lianmin Zheng | b26bc86b36 | Support page size > 1 + eagle (#4908) | 2025-03-30 00:46:23 -07:00 |
| Yineng Zhang | d8a136a113 | upgrade sgl-kernel 0.0.5.post4 (#4873) | 2025-03-28 19:48:56 -07:00 |
| Lianmin Zheng | 74e0ac1dbd | Clean up import vllm in quantization/__init__.py (#4834) | 2025-03-28 10:34:10 -07:00 |
| Xiaoyu Zhang | 04e3ff6975 | Support compressed tensors fp8w8a8 (#4743) | 2025-03-26 13:21:25 -07:00 |