Commit Graph

267 Commits

Author SHA1 Message Date
Lianmin Zheng
f68dd998b9 Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-25 16:19:53 -07:00
Xinyuan Tong
71f24ef8f6 feat: add cache_salt support to request (#10718)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-23 23:30:25 -07:00
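The cache_salt entry above adds per-request isolation for prefix caching. A minimal sketch of the underlying idea, assuming a hash-based cache key (the `prefix_cache_key` helper is illustrative, not SGLang's actual implementation):

```python
import hashlib

def prefix_cache_key(prompt: str, cache_salt: str = "") -> str:
    """Derive a prefix-cache key; requests with different salts never
    share cached entries, even for byte-identical prompts."""
    h = hashlib.sha256()
    h.update(cache_salt.encode("utf-8"))
    h.update(b"\x00")  # separator keeps salt/prompt boundary unambiguous
    h.update(prompt.encode("utf-8"))
    return h.hexdigest()

# Identical prompts under different salts map to distinct cache entries,
# so tenants cannot probe each other's cached prefixes.
shared = prefix_cache_key("Hello, world")
tenant_a = prefix_cache_key("Hello, world", cache_salt="tenant-a")
tenant_b = prefix_cache_key("Hello, world", cache_salt="tenant-b")
assert len({shared, tenant_a, tenant_b}) == 3
```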
Lianmin Zheng
38c00ed7a1 Fix multimodal registry and code sync scripts (#10759)
Co-authored-by: cctry <shiyang@x.ai>
2025-09-22 15:36:01 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
harrisonlimh
14fdd52740 feat: add priority based scheduling with priority based request acceptance and preemption (#8746)
2025-09-16 17:10:10 -07:00
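Priority-based acceptance as in the entry above can be sketched with a standard heap-backed queue; this toy `PriorityScheduler` (a hypothetical name, with preemption omitted) shows the common tie-breaking pattern that keeps ordering FIFO within a priority level:

```python
import heapq
import itertools

class PriorityScheduler:
    """Toy priority queue: lower number = higher priority; a
    monotonically increasing counter breaks ties FIFO."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, request, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def next_request(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = PriorityScheduler()
sched.submit("background-batch", priority=10)
sched.submit("interactive-chat", priority=0)
assert sched.next_request() == "interactive-chat"
```

The counter matters: without it, two requests at the same priority would be compared by payload, which may not be orderable and loses arrival order.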
Yingchun Lai
fc2c3a3d8e metrics: support customer labels specified in request header (#10143)
2025-09-14 20:00:08 -07:00
Liangsheng Yin
305c9e8c2d [4/N]DP refactor: support watching mode get_load and shortest queue strategy (#10201)
2025-09-15 10:06:08 +08:00
Feng Su
4c21b09074 [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
2025-09-15 02:08:02 +08:00
艾力可
165abeebca Typo: in --enable-custom-logit-processor: agree with cli arg (#10076)
2025-09-14 02:27:09 -07:00
Sundara Raman Ramachandran
a360511d7b [Generative Score API] Scoring(Prefill-only) optimizations. (#9748)
2025-09-14 01:57:06 +08:00
Sundara Raman Ramachandran
94d0f656fb [Performance] Dynamic Batch Tokenizer (#9382)
2025-09-14 01:56:04 +08:00
Liangsheng Yin
78f139812a [1/N] DP-Refactor: move communicators into tokenizer_communicator_mixin (#10028)
2025-09-08 16:27:37 +08:00
Liangsheng Yin
e719bb0e84 [1/2] Refactor multi-tokenizer manager (#10074)
2025-09-07 19:13:34 +08:00
Jimmy
f40038fb09 [Vulnerability]feat(conn): set bootstrap server host (#9931)
2025-09-05 17:36:17 +08:00
Huang Long
f98366604b fix MultiTokenizerWrapper name (#10049)
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
2025-09-05 13:39:46 +08:00
Yingchun Lai
b32ab0705e metrics: support customer buckets for prompt/generation_tokens_histogram (#9634)
2025-09-04 22:22:08 +08:00
ybyang
5f77e1292d Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 01:00:13 -07:00
Liangsheng Yin
6d3c20cf5b fix set_interal_state API (#9850)
2025-09-01 01:31:35 +08:00
Teng Ma
f05c68733e [HiCache] Clear kvcache in storage backend with fastAPI (#9750)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-08-31 17:41:44 +08:00
Sundara Raman Ramachandran
ea0696b924 [Performance] Batch Send from Tokenizer Manager. (#9436)
2025-08-26 01:43:54 +08:00
Chanh Nguyen
127d4b0d5e Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-08-23 13:43:09 +08:00
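GC freezing, as in the entry above, typically relies on CPython's `gc.freeze()`: objects that survive warm-up are moved into a permanent generation that later collections skip, shrinking pause times on the serving hot path. A minimal sketch of the pattern (not SGLang's actual code):

```python
import gc

# Warm-up phase: allocate long-lived state (model weights, caches, ...).
long_lived = [{"layer": i} for i in range(1000)]

gc.collect()   # collect warm-up garbage first, so only live objects remain
gc.freeze()    # move every surviving object into the permanent generation

# Frozen objects are no longer traversed by subsequent collections,
# so GC pauses while serving requests get shorter.
assert gc.get_freeze_count() > 0
```

`gc.unfreeze()` reverses this if the process later needs to reclaim that memory.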
Liangsheng Yin
9b5f0f64f5 Fix tiny misalign with previous truncation setting in tokenizer_manager (#9430)
2025-08-21 14:05:35 +08:00
Liangsheng Yin
eb19ccadae [bug] fix errors related to context length in SD (#9388)
2025-08-21 10:32:34 +08:00
Lifu Huang
b0980af89f Support pinning adapter via server args. (#9249)
2025-08-20 16:25:01 -07:00
Liangsheng Yin
08ebdf79d0 Fix the --allow-auto-truncate argument in tokenizer manager. (#9391)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-20 16:56:47 +08:00
datdo-msft
98b44e9e56 [PD] Propagate internal server errors from aborted requests to clients instead of blindly returning 200's (#8936)
2025-08-18 14:23:46 -07:00
Chengxing Xie
c1c7dc4534 feat: Add model version tracking with API endpoints and response metadata (#8795)
2025-08-14 12:13:46 -07:00
Sundara Raman Ramachandran
a027a9b4b3 [Generative Score API] Optimization to Remove Decode. (#8840)
2025-08-14 05:12:24 +08:00
Lifu Huang
5ded39cab2 Fix race condition in async lora unload (#9084)
2025-08-11 22:59:29 -07:00
Lianmin Zheng
4ea9d74a3e Simplify health check (#9034)
2025-08-10 17:35:05 -07:00
Lianmin Zheng
a947154286 Revert "Support Multi Process Tokenizer Manager" (#8960)
2025-08-08 02:28:27 -07:00
ybyang
7490e3f67d Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
2025-08-08 01:45:50 -07:00
Lifu Huang
6210e2c4f0 Support GPU pinning for LoRA (#8697)
2025-08-06 19:39:45 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Baizhou Zhang
f2d68ded6d Rename lora_path to lora_id in batches (#8437)
2025-08-03 21:08:28 -07:00
ybyang
6f9baf1002 [Improvements] Merge health check route (#8444)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-08-03 01:59:06 -07:00
Lifu Huang
8675bdf246 Support limiting max loaded loras in CPU. (#8650)
2025-08-03 00:02:23 -07:00
Wenchen Lo
ea93079b30 model: adapt mllama4 to VisionAttention (#8512)
Co-authored-by: root <mickjagger19@icloud.com>
2025-08-02 00:39:40 -07:00
Xinyuan Tong
7e831efee8 Fix chat template handling for OpenAI serving (#8635)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-07-31 21:49:45 -07:00
Lianmin Zheng
a4c3b121d8 Split the scheduler into multiple mixin classes to reduce the file size (#8483)
2025-07-29 12:46:50 -07:00
fzyzcjy
0ce84c822b Support colocating requests (#7973)
2025-07-28 22:51:49 -07:00
harrisonlimh
747dd45077 feat: throttle requests at scheduler based on --max_queued_requests (#7565)
2025-07-28 22:32:33 +08:00
Lifu Huang
df90645525 Support overlapped lora updates (#8213)
2025-07-27 13:00:44 -07:00
Mick
3212c2ad3f vlm: optimize tensor transport (#6003)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-26 17:41:01 +08:00
Lifu Huang
8abd3e77fe Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261)
2025-07-23 00:32:16 -07:00
Lianmin Zheng
55381a46ac Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181)
2025-07-19 22:41:30 -07:00
ybyang
4540a4666a [Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
2025-07-19 18:10:00 -07:00
Lifu Huang
4e3defe5a7 Support start up LoRA server without initial adapters (#8019)
2025-07-19 15:38:09 -07:00
Yingchun Lai
610381b75e [health_generate] fix: fix the /health_generate always success bug (#8028)
2025-07-18 22:08:46 -07:00
ehuaa
0c55cbcfc5 [BugFix] add verify logit_bias to avoid crash because of IndexError (#7749)
2025-07-14 02:44:12 +08:00
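The last entry guards the sampler against out-of-range logit_bias token ids, which would otherwise raise IndexError when indexing the logits tensor. A hypothetical validator in the same spirit (`validate_logit_bias` and its signature are illustrative, not SGLang's actual function):

```python
def validate_logit_bias(logit_bias: dict, vocab_size: int) -> dict:
    """Reject token ids outside [0, vocab_size) before they reach the
    sampler, where an out-of-range index would crash the request."""
    validated = {}
    for token, bias in logit_bias.items():
        token_id = int(token)  # OpenAI-style requests send keys as strings
        if not 0 <= token_id < vocab_size:
            raise ValueError(
                f"logit_bias token id {token_id} is outside [0, {vocab_size})"
            )
        validated[token_id] = float(bias)
    return validated

# Valid ids pass through, normalized to int keys and float biases.
assert validate_logit_bias({"42": 5}, vocab_size=32000) == {42: 5.0}
```

Raising a structured error turns a server-side crash into a 400-style rejection the client can act on.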