83 Commits

Author SHA1 Message Date
Lianmin Zheng
71133a0426 [Auto Sync] Update sampling_batch_info.py (20250909) (#10212)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
2025-09-09 01:29:52 -07:00
yhyang201
c377923304 [feat] Reduce GPU memory overhead by using weakref (#9673) 2025-08-28 01:09:06 -07:00
Lianmin Zheng
9e426466af Clean up allocators (#9134)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-13 13:56:04 -07:00
J
0e7a5b2694 fix: prevent crashes due to logit bias dimension mismatch (#7685) 2025-07-23 15:30:55 -07:00
ehuaa
0c55cbcfc5 [BugFix] add verify logit_bias to avoid crash because of IndexError (#7749) 2025-07-14 02:44:12 +08:00
Lianmin Zheng
071a1f51ae [Minor] clean up multimodal processor and tokenizer manager (#7624) 2025-06-29 02:50:14 -07:00
Brayden Zhong
ca9291181d [Feature] Add Logit Bias (#6579)
Co-authored-by: Cinjon Resnick <cinjon.resnick@gmail.com>
2025-06-10 15:39:25 -07:00
Lianmin Zheng
608668e143 Slightly improve the sampler to skip unnecessary steps (#6956) 2025-06-08 03:18:54 -07:00
Lianmin Zheng
ac2324c177 Skip the flaky test_stateful_custom_logit_processor (#6251) 2025-05-12 18:29:41 -07:00
Lianmin Zheng
d18c6b3358 Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 14:33:38 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Lianmin Zheng
de167cf5fa Fix request abortion (#6184) 2025-05-10 21:54:46 -07:00
Yineng Zhang
66fc63d6b1 Revert "feat: add thinking_budget (#6089)" (#6181) 2025-05-10 16:07:45 -07:00
thyecust
63484f9fd6 feat: add thinking_budget (#6089) 2025-05-09 08:22:09 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
JieXin Liang
1a3fa75f2f [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) 2025-03-16 00:02:47 -07:00
Wang Ran (汪然)
2892b9bb97 bugfix: Update sampling_params.py (#4413) 2025-03-15 16:39:19 -07:00
Lianmin Zheng
d4017a6b63 [EAGLE] many fixes for eagle (#4195)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-07 22:12:13 -08:00
Qiaolin Yu
57a404fd55 Remove outdated test utils and fix links for the doc of sampling params (#3999) 2025-03-03 09:41:38 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
2025-03-01 17:51:29 -08:00
mlmz
bac414ab53 [Feature] integrate Structural Tag in xgrammar backend for function calling (#3566)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-02-27 23:33:41 -08:00
Qiaolin Yu
d38878523d Fix the doc link for sampling params (#3861) 2025-02-27 13:31:43 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
Lianmin Zheng
4f118a39d7 Fix repetition penalty (#3139) 2025-01-25 21:48:58 -08:00
Lianmin Zheng
27acf63bbd Use torch.compile for scaling penalty (#3133) 2025-01-25 18:27:33 -08:00
Hongpeng Guo
583697cd71 [Enhancement] Custom Logit Processor Improvement (#2998)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-20 02:00:35 -08:00
Hongpeng Guo
e403d23757 [Feature] Add sampler custom logits processor (#2396)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-19 14:46:53 -08:00
Xiaoyu Zhang
d08c77c434 Sampling penalties memory interface (#2870) 2025-01-13 23:09:00 +08:00
Yineng Zhang
41d7e5b7e6 docs: update link (#2857) 2025-01-13 18:40:48 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Lianmin Zheng
21ec66e59e Minor follow-up fixes for the logprob refactor (#2670) 2024-12-30 05:42:08 -08:00
Lianmin Zheng
9c6ba2484f Refactor logprob computation to return the real logprob used in sampling (#2664) 2024-12-30 04:51:38 -08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Adarsh Shirawalmath
acb340728c [Feature] Support new parameter - EBNF in xgrammar (#2526) 2024-12-26 05:12:41 -08:00
Lianmin Zheng
0e7409adb6 Fix the overlap for xgrammar (#2377) 2024-12-06 05:49:29 -08:00
Yixin Dong
538fa0ae13 [Fix] Avoid calling fill_vocab_mask for terminated requests (#2175) 2024-11-25 17:31:25 +08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Lianmin Zheng
7d671e4ad2 Enable overlap by default (#2067) 2024-11-19 22:07:58 -08:00
Lianmin Zheng
ffd20fcd03 Make constrained decoding work for overlap scheduler (#2095) 2024-11-19 15:04:43 -08:00
Lianmin Zheng
b110453802 Simplify logits penalizer (#2086) 2024-11-18 17:48:28 -08:00
DarkSharpness
9c745d078e [Performance] Update xgrammar-related constrained decoding (#2056) 2024-11-17 16:58:49 -08:00
Lianmin Zheng
edad373135 Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data (#2051) 2024-11-16 16:14:23 -08:00
Lianmin Zheng
ea53c63bad Expose no_stop_trim and skip_special_tokens in openai api (#2039) 2024-11-14 19:09:21 -08:00
Lianmin Zheng
ba069a24d3 Fix grammar backend (#2018) 2024-11-12 21:17:38 -08:00
Lianmin Zheng
8dc84da084 Remove the useless to_srt_kwargs (#1955) 2024-11-07 23:15:08 -08:00
DarkSharpness
b77a02cdfd [Performance] Support both xgrammar and outlines for constrained decoding (#1752) 2024-10-25 21:47:02 +00:00
Lianmin Zheng
8f8f96a621 Fix the perf regression due to additional_stop_token_ids (#1773) 2024-10-23 16:45:21 -07:00
Lianmin Zheng
0d800090b4 Fix missing additional_stop_token_ids (#1769) 2024-10-23 12:18:59 -07:00