Yineng Zhang | cf0f7eafe6 | chore: bump v0.4.2.post1 (#3233) | 2025-01-31 20:35:55 +08:00
Yineng Zhang | 4ab43cfb3e | chore: bump v0.4.2 (#3180) | 2025-01-27 21:42:05 +08:00
Yineng Zhang | 2f79f58873 | feat: use sgl-kernel 0.0.3 in sglang (#3179) | 2025-01-27 21:39:52 +08:00
Yineng Zhang | 7e0976133c | udpate sgl-kernel version for srt (#3150) | 2025-01-26 20:22:34 +08:00
Yineng Zhang | e94fb7cb10 | chore: bump v0.4.1.post7 (#3009) | 2025-01-20 21:50:55 +08:00
Enrique Shockwave | 3bcf5ecea7 | support regex in xgrammar backend (#2983) | 2025-01-20 04:34:41 +08:00
Yineng Zhang | 2c05f81f15 | fix custom op version compatibility (#2988) | 2025-01-20 04:21:29 +08:00
Chunyuan WU | 63051738a9 | Enable CPU device on SGLang (#2806) | 2025-01-16 21:22:53 -08:00
yizhang2077 | 767c9dec03 | adapt custom allreduce for tensorrt llm (#2511) | 2025-01-16 04:57:35 +08:00
Yineng Zhang | b3e99dfb22 | chore: bump v0.4.1.post6 (#2899) | 2025-01-15 16:23:42 +08:00
fzyzcjy | 923f518337 | CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630) | 2025-01-13 11:38:51 -08:00
Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00
Lianmin Zheng | 6249e4a19e | Revert "Integration of TurboMind AWQ" (#2866) | 2025-01-13 04:44:39 -08:00
bjmsong | 17de02f98d | Integration of TurboMind AWQ (#2828); Co-authored-by: root <bjmsong@126.com> | 2025-01-13 20:14:16 +08:00
Yineng Zhang | f624901cdd | chore: bump v0.4.1.post5 (#2840) | 2025-01-11 23:10:02 +08:00
Lianmin Zheng | bdc1acf6cd | Misc fix for min_p_sampling, --cuda-graph-bs (#2761) | 2025-01-07 02:52:53 -08:00
Yineng Zhang | 2f0d386496 | chore: bump v0.4.1.post4 (#2713) | 2025-01-06 01:29:54 +08:00
kk | b6e0cfb5e1 | ROCm base image update (#2692); Co-authored-by: wunhuang <wunhuang@amd.com> | 2025-01-01 12:12:19 +08:00
Lianmin Zheng | 03d5fbfd44 | Release 0.4.1.post3 - upload the config.json to PyPI (#2647) | 2024-12-29 14:25:53 -08:00
Yineng Zhang | 3ccf566b0d | chore: bump v0.4.1.post2 (#2643) | 2024-12-30 00:11:46 +08:00
Yineng Zhang | ef5b0ff90b | chore: bump v0.4.1.post1 (#2616) | 2024-12-28 00:11:06 +08:00
HandH1998 | 6e5305158c | update sgl_moe_align_block_size usage (#2617) | 2024-12-28 00:01:13 +08:00
yudian0504 | 531d6ea968 | fix: package data missing (#2521) | 2024-12-26 08:16:48 -08:00
Yineng Zhang | 635a042623 | docs: update deepseek v3 example (#2592) | 2024-12-26 17:43:37 +08:00
Yineng Zhang | efc52f85e2 | chore: bump v0.4.1 (#2582) | 2024-12-26 07:14:51 +08:00
Yineng Zhang | 60e2fdcf4f | use sgl-kernel moe_align_block_size (#2581); Co-authored-by: ispobock <ispobaoke@163.com>; Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-26 06:29:08 +08:00
Yineng Zhang | 8f4d04e540 | chore: bump v0.4.0.post2 (#2525) | 2024-12-21 21:16:34 +08:00
Jerry Zhang | feb2b768ba | Add integration with gemlite weight only quant (#2528) | 2024-12-21 00:25:25 +08:00
Yineng Zhang | 4b83db24f1 | fix: continue to use flashinfer 0.1.6 temporarily (#2517) | 2024-12-19 14:03:24 +08:00
Yineng Zhang | 626a99ac13 | chore: update ao v0.7.0 (#2447) | 2024-12-11 04:44:28 -08:00
Lianmin Zheng | 641b7d0ae0 | [Minor] Improve code style (#2422) | 2024-12-09 06:30:35 -08:00
SangBin Cho | 1f09e84b9a | nit: Remove busy waiting on scheduler (#2382) | 2024-12-08 01:06:15 -08:00
Yineng Zhang | aaac33fd8d | fix: update xgrammar v0.1.6 (#2390) | 2024-12-07 21:09:16 +08:00
Lianmin Zheng | e5f227c0ee | Release v0.4.0.post1 (#2375) | 2024-12-06 06:08:19 -08:00
Yineng Zhang | 2db4469808 | minor: limit the range of vllm versions (#2350) | 2024-12-05 02:00:34 +08:00
Yineng Zhang | f8b0326934 | chore: bump v0.4.0 (#2338) | 2024-12-03 11:55:41 -08:00
Yineng Zhang | fae4e5e99a | chore: bump v0.3.6.post3 (#2259) | 2024-11-30 01:41:16 +08:00
Lianmin Zheng | fed4c6946a | Release v0.3.6.post2 (#2214); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-11-27 03:35:30 -08:00
Yineng Zhang | bc1f6fda0d | fix: add cuda-python for xgrammar (#2199) | 2024-11-26 17:24:18 +08:00
Lianmin Zheng | ac5a0f0488 | Release v0.3.6.post1 (#2189) | 2024-11-25 17:31:37 -08:00
Lianmin Zheng | 1605ae121e | [CI] Minor fix for CI (#2187) | 2024-11-25 16:38:43 -08:00
Yixin Dong | 7f076c2ce6 | Update XGrammar to the latest API (#2176); Co-authored-by: Ben Gitter <gitterbd@gmail.com> | 2024-11-25 15:58:30 -08:00
Ankur Neog | 865233e256 | Add initial support for intel Gaudi accelerators (#2121) | 2024-11-22 20:22:23 -08:00
Yineng Zhang | 2797bc3422 | fix: add xgrammar dependency (#2126) | 2024-11-22 20:53:11 +08:00
Yineng Zhang | 9a00e6f453 | chore: bump v0.3.6 (#2120) | 2024-11-22 19:27:30 +08:00
Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00
Yineng Zhang | 766192610e | feat: update torch 2.5.1 (#2069) | 2024-11-18 21:29:13 +08:00
Lianmin Zheng | c1f401fc58 | Revert "chore: update torch v2.5.1" (#2063) | 2024-11-17 15:29:38 -08:00
Yineng Zhang | 3b878863f7 | chore: update torch v2.5.1 (#1849) | 2024-11-18 00:06:00 +08:00
Lianmin Zheng | 32c9a7ec11 | Release v0.3.5.post2 (#2046) | 2024-11-15 06:54:00 -08:00