Tony Lu
|
1e18a341e9
|
[Bugfix] fix pd chat completion protocol for batching support (#10016)
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
|
2025-09-04 01:43:16 -07:00 |
|
Liangsheng Yin
|
5dfcd6c207
|
add proctitle for tokenizers (#9952)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-03 13:31:38 +08:00 |
|
Lianmin Zheng
|
60e37f8028
|
Move parsers under a single folder (#9912)
|
2025-09-02 18:25:04 -07:00 |
|
JieXin Liang
|
1db649ac02
|
[feat] apply deep_gemm compile_mode to skip launch (#9879)
|
2025-09-02 03:20:30 -07:00 |
|
Yineng Zhang
|
349b491c63
|
chore: upgrade flashinfer 0.3.0 (#9864)
|
2025-09-01 03:07:19 -07:00 |
|
ybyang
|
5f77e1292d
|
Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 01:00:13 -07:00 |
|
Teng Ma
|
f05c68733e
|
[HiCache] Clear kvcache in storage backend with fastAPI (#9750)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2025-08-31 17:41:44 +08:00 |
|
Yineng Zhang
|
9970e3bf32
|
chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix (#9822)
|
2025-08-30 04:02:25 -07:00 |
|
Yineng Zhang
|
3d8fc43400
|
chore: upgrade flashinfer 0.3.0rc1 (#9793)
|
2025-08-29 16:24:17 -07:00 |
|
gongwei-130
|
3fd1431df2
|
support enable in the reasoning field to enable thingking for thinkin… (#9715)
|
2025-08-29 10:57:32 -07:00 |
|
gongwei-130
|
9a7c8842ba
|
accomendate json schema in the "schema" field, not in "json_schema" field of response_format (#9786)
|
2025-08-28 23:51:50 -07:00 |
|
Yineng Zhang
|
b962a296ed
|
chore: upgrade sgl-kernel 0.3.7 (#9708)
|
2025-08-27 14:00:31 -07:00 |
|
Xinyuan Tong
|
68a54e063e
|
Sets default model name in request classes (#9683)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-27 10:43:03 -07:00 |
|
cicirori
|
b6c14ec0b4
|
add response_format support for completion API (#9665)
|
2025-08-26 15:01:29 -07:00 |
|
Xiaotong Jiang
|
0936c766ed
|
Fix kimi k2 function calling format (#9606)
|
2025-08-26 00:50:59 -07:00 |
|
GavinZhu-GMI
|
0ef583b7de
|
fix: allow user to specify function as role (#9635)
|
2025-08-26 00:47:20 -07:00 |
|
Jonas
|
a0a77d937b
|
Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-25 15:26:26 -07:00 |
|
Binyao Jiang
|
3affa9dcc3
|
Fix GLM45 tool call multi-turn bug (#9500)
|
2025-08-25 13:46:13 -07:00 |
|
Yineng Zhang
|
938e986e15
|
chore: upgrade flashinfer 0.2.14.post1 (#9578)
|
2025-08-25 00:12:17 -07:00 |
|
Yuhao Zhou
|
17d5eda887
|
bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool (#9229)
|
2025-08-25 00:10:35 -07:00 |
|
fzyzcjy
|
2600fc0d47
|
Overlapped weight offload (#8034)
|
2025-08-23 02:06:46 -07:00 |
|
Chanh Nguyen
|
127d4b0d5e
|
Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2025-08-23 13:43:09 +08:00 |
|
Xinyuan Tong
|
6c855db82c
|
Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467)
|
2025-08-21 17:24:25 -07:00 |
|
Xinyuan Tong
|
e8449ab515
|
Add deepseek v3.1 thinking parser support and update docs (#9464)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 15:09:40 -07:00 |
|
gongwei-130
|
10d34f74e2
|
fix: should return a invalid request response when schema missing (#9461)
|
2025-08-21 14:06:50 -07:00 |
|
gongwei-130
|
9ba7253094
|
accomendate reasoning_effort set in chat_template_kwargs (#9458)
|
2025-08-21 13:22:03 -07:00 |
|
hlu1
|
dae9a80f43
|
[fix] Fix mxfp4 weight loading bug with TP sharding in GPT-OSS (#9433)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-21 03:50:51 -07:00 |
|
fzyzcjy
|
42c8704560
|
Add PDL support for quant kernel and rope kernel (#9106)
|
2025-08-20 01:56:29 -07:00 |
|
Keyang Ru
|
f515449582
|
Fix gpt-oss response api streaming issue (#9368)
|
2025-08-19 20:19:42 -07:00 |
|
江家瑋
|
ca533580f2
|
[Docs] Correct and clarify notes in Engine docstring (#9313)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
|
2025-08-18 13:24:19 -07:00 |
|
gongwei-130
|
0cf3fbeb18
|
should return invalide request for empty prompt (#9315)
|
2025-08-18 11:44:11 -07:00 |
|
Chengxing Xie
|
c1c7dc4534
|
feat: Add model version tracking with API endpoints and response metadata (#8795)
|
2025-08-14 12:13:46 -07:00 |
|
Hongbo Xu
|
2cc9eeab01
|
[4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-08-14 12:05:46 -07:00 |
|
eigen
|
4dbf43601d
|
fix: zero_init buffer (#9065)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-08-14 02:39:09 -07:00 |
|
Jiaqi Gu
|
c9ee738515
|
Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-08-12 13:15:30 -07:00 |
|
Chang Su
|
f2a5de284b
|
[Bugfix] Fix accuracy-test-1-gpu failure caused by builtin_tools (#9114)
|
2025-08-12 09:56:13 -07:00 |
|
Chang Su
|
a218490136
|
(gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043)
|
2025-08-11 18:59:18 -07:00 |
|
Chang Su
|
a6452b7188
|
bugfix: Fix output_ids extraction in detokenizer_manager (#9047)
|
2025-08-11 03:17:32 -07:00 |
|
zhyncs
|
f4ae50e97c
|
fix: use flashinfer v0.2.11.post1
|
2025-08-11 02:49:25 -07:00 |
|
Yineng Zhang
|
84cb449eec
|
Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057)
|
2025-08-11 00:16:39 -07:00 |
|
Yineng Zhang
|
dd001a5477
|
chore: upgrade flashinfer 0.2.11 (#9036)
|
2025-08-10 17:35:37 -07:00 |
|
Lianmin Zheng
|
4ea9d74a3e
|
Simplify health check (#9034)
|
2025-08-10 17:35:05 -07:00 |
|
Stefan He
|
8ecf6b9d24
|
Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079)
|
2025-08-10 16:08:59 -07:00 |
|
Lianmin Zheng
|
9a44b643c6
|
Fix CI (#9012)
|
2025-08-09 13:33:42 -07:00 |
|
Yineng Zhang
|
326a901df4
|
chore: upgrade sgl-kernel 0.3.3 (#8998)
|
2025-08-09 01:22:01 -07:00 |
|
Lianmin Zheng
|
706bd69cc5
|
Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)
|
2025-08-08 19:56:50 -07:00 |
|
ishandhanani
|
4e7f025219
|
chore(gb200): update to CUDA 12.9 and improve build process (#8772)
|
2025-08-08 13:42:47 -07:00 |
|
Zilin Zhu
|
dd650e0e21
|
[RL] fix skip_server_warmup and rl health_generate logic (#8757)
|
2025-08-08 04:34:38 -07:00 |
|
Lianmin Zheng
|
a947154286
|
Revert "Support Multi Process Tokenizer Manager" (#8960)
|
2025-08-08 02:28:27 -07:00 |
|
ybyang
|
7490e3f67d
|
Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
|
2025-08-08 01:45:50 -07:00 |
|