35 Commits

Author SHA1 Message Date
eraser00
0ac6114694 Replace the Kimi-K2 generated tool call idx with history tool call count (#10612)
Co-authored-by: eraser00 <eraser00@github.com>
2025-09-25 18:47:40 -07:00
Lianmin Zheng
f68dd998b9 Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-25 16:19:53 -07:00
Xinyuan Tong
71f24ef8f6 feat: add cache_salt support to request (#10718)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-23 23:30:25 -07:00
harrisonlimh
14fdd52740 feat: add priority based scheduling with priority based request acceptance and preemption (#8746) 2025-09-16 17:10:10 -07:00
Yingchun Lai
fc2c3a3d8e metrics: support customer labels specified in request header (#10143) 2025-09-14 20:00:08 -07:00
Lianmin Zheng
033b75f559 [Auto Sync] Update serving_base.py, serving_chat.py, servin... (20250910) (#10282)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
2025-09-10 16:58:59 -07:00
Xinyuan Tong
f3b5db6ee8 Feat: support disable tool parser (#10184) 2025-09-10 14:03:55 -07:00
Lianmin Zheng
60e37f8028 Move parsers under a single folder (#9912) 2025-09-02 18:25:04 -07:00
Xiaotong Jiang
0936c766ed Fix kimi k2 function calling format (#9606) 2025-08-26 00:50:59 -07:00
Jonas
a0a77d937b Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-25 15:26:26 -07:00
Binyao Jiang
3affa9dcc3 Fix GLM45 tool call multi-turn bug (#9500) 2025-08-25 13:46:13 -07:00
Xinyuan Tong
e8449ab515 Add deepseek v3.1 thinking parser support and update docs (#9464)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 15:09:40 -07:00
gongwei-130
10d34f74e2 fix: should return a invalid request response when schema missing (#9461) 2025-08-21 14:06:50 -07:00
gongwei-130
9ba7253094 accomendate reasoning_effort set in chat_template_kwargs (#9458) 2025-08-21 13:22:03 -07:00
Chengxing Xie
c1c7dc4534 feat: Add model version tracking with API endpoints and response metadata (#8795) 2025-08-14 12:13:46 -07:00
Chang Su
f2a5de284b [Bugfix] Fix accuracy-test-1-gpu failure caused by builtin_tools (#9114) 2025-08-12 09:56:13 -07:00
Chang Su
a218490136 (gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043) 2025-08-11 18:59:18 -07:00
Xinyuan Tong
3e7ff1ab1f fix: reasoning parser when request have enable_thinking flag (#8933)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-07 15:52:06 -07:00
Xinyuan Tong
3fa3c6cd6a Enables force reasoning based on chat template for Qwen3-Thinking (#8369)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>
2025-08-06 20:02:47 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Chang Su
a79a5d7012 Revert "Fix the input tools format and history tool_calls in OpenAI API (#6556)" (#8584) 2025-07-30 13:12:05 -07:00
Chang Su
b47eda3316 bugfix: Fix multiple finish_reason chunks and tool_calls finish reason check (#8417) 2025-07-27 13:31:06 -07:00
Binyao Jiang
e983d66680 Fix: Improve test_openai_function_calling unit test and fix reasoning_parser.py think_start_token logic (#8316)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-07-27 13:12:59 -07:00
Ying Wang
7ad6b766c5 fix: Fix failed functional tests https://github.com/meta-llama/llama-stack-evals (#8266) 2025-07-24 23:11:32 -07:00
xianzhiT
c87d4fec99 Fix the issue of incorrect finish reason in final stream response chunk returned during tool call (#7708)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-23 13:28:53 -07:00
jiawei
f1f1d1d40d Fix the input tools format and history tool_calls in OpenAI API (#6556) 2025-07-15 00:58:55 -07:00
Mick
b5e3d6031c vlm: support video as an input modality (#5888) 2025-07-09 23:48:35 -07:00
ybyang
03c039c48e [OAI] patch origin request_id logic (#7508) 2025-06-24 20:09:38 -07:00
Chang Su
112b496a6c misc: Improvement to serving_chat.py and add more ut (#7489) 2025-06-24 17:19:51 -07:00
huangtingwei
7732bbe458 bugfix: Prevent global mutation of conv.stop_str across requests (#7347)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-06-23 19:36:23 -07:00
Chang Su
72676cd6c0 feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00
Keyang Ru
5e7fdc79fa [OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State (#7329)
Signed-off-by: keru <rukeyang@gmail.com>
2025-06-20 19:18:53 -07:00
yhyang201
dea2b84bc3 [OAI Server Refactor] [ChatCompletions & Completions] Implement UsageInfo Processor (#7360)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-06-20 14:51:21 -07:00
Xinyuan Tong
0998808009 Refine OpenAI serving entrypoint to remove batch requests (#7372)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>
2025-06-20 14:33:43 -07:00
Xinyuan Tong
70c471a868 [Refactor] OAI Server components (#7167)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-16 20:45:20 -07:00