sglang

Author	SHA1	Message	Date
huangtingwei	cb9e0e4180	[HiCacheStorage] fix abort request host memory leaks (#9874) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-09-01 18:59:29 -07:00
ybyang	5f77e1292d	Support Multi Process Tokenizer Manager(#6555) (#8964) Signed-off-by: ybyang <ybyang7@iflytek.com> Signed-off-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com> Co-authored-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-09-01 01:00:13 -07:00
Liangsheng Yin	6d3c20cf5b	fix `set_interal_state` API (#9850)	2025-09-01 01:31:35 +08:00
Zhiqiang Xie	8b6966d020	[HiCache] Storage Refactoring (#9797) Co-authored-by: pansicheng <27603155+pansicheng@users.noreply.github.com>	2025-08-31 22:58:21 +08:00
Lianmin Zheng	25c7395934	Fix input logprob index (#9841) Co-authored-by: Sheng Shen <sheng.s@berkeley.edu>	2025-08-31 02:56:47 -07:00
Teng Ma	f05c68733e	[HiCache] Clear kvcache in storage backend with fastAPI (#9750) Co-authored-by: hzh0425 <hzh0425@apache.org>	2025-08-31 17:41:44 +08:00
VDV1985	ba861293cf	[feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support (#8909)	2025-08-31 00:25:07 -07:00
Liangsheng Yin	836873b99f	Fix memory leak when aborting decode request in PD-Disagg (#9817) Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>	2025-08-30 14:36:03 +08:00
Zhiqiang Xie	54e872d343	[HiCache] resolve conflict between chunked-prefill and hicache hit count (#9776)	2025-08-30 01:30:54 +08:00
wangyu	a38c149758	feat(draft_model): support draft_model for RemoteModelLoader (#6407) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-08-28 16:09:52 -07:00
huangtingwei	55349e361d	support mooncake store dp attention (#9684)	2025-08-28 12:31:31 +08:00
hzh0425	c04c17edfa	refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555) Co-authored-by: Teng Ma <805522925@qq.com>	2025-08-26 17:55:20 -07:00
Zhiqiang Xie	43de1d7304	HiCache Storage fix host memory leak (#9648)	2025-08-26 10:49:40 -07:00
hzh0425	79ce3688bb	BugFix(hicache): Fix host indices out of bound error (#9637) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-26 10:42:23 -07:00
ykwd	80dc76e11a	[Fix] HiCache Bugfix & Mooncake Error Handling Enhance (#8901) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-25 19:05:10 -07:00
Jonas	a0a77d937b	Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190) Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: minleminzui <2969413251@qq.com> Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-25 15:26:26 -07:00
Sundara Raman Ramachandran	ea0696b924	[Performance] Batch Send from Tokenizer Manager. (#9436)	2025-08-26 01:43:54 +08:00
SCDESPERTATE	b5c6529e17	[PD] Improve disaggregation metrics output: update the metrics to keep reflecting real stats (#7317)	2025-08-24 23:16:43 -07:00
hzh0425	83871aa12d	feat(hicache): Supports 3fs-hicache compatibility with dp-attention (#9372)	2025-08-23 02:08:32 -07:00
fzyzcjy	2600fc0d47	Overlapped weight offload (#8034)	2025-08-23 02:06:46 -07:00
fzyzcjy	0374304a2c	Add enable_flashinfer_mxfp4_bf16_moe for higher precision and slower moe backend (#9004)	2025-08-23 15:38:40 +08:00
Chanh Nguyen	127d4b0d5e	Support GC Freezing to improve latency & throughput (#9241) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2025-08-23 13:43:09 +08:00
huangtingwei	6078d5fcc0	[HiCacheStorage] backup optimization for MLA model (#8865) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-22 18:03:51 +08:00
pansicheng	70cf4abccc	3fs zerocopy (#9109) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-22 17:56:38 +08:00
Yongfei Xu	9708d353b7	Support MHA with chunked prefix cache for flashinfer/flashmla backend, support page size > 1 for MHA chunked prefix (#8616) Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>	2025-08-21 18:19:44 -07:00
Xinyuan Tong	6c855db82c	Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467)	2025-08-21 17:24:25 -07:00
Liangsheng Yin	9b5f0f64f5	Fix tiny misalign with previous truncation setting in tokenizer_manager (#9430)	2025-08-21 14:05:35 +08:00
Liangsheng Yin	eb19ccadae	[bug] fix errors related to context length in SD (#9388)	2025-08-21 10:32:34 +08:00
Lifu Huang	d4bce29721	Fix incorrect logic in chat template handling. (#9336) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-20 16:25:36 -07:00
Lifu Huang	b0980af89f	Support pinning adapter via server args. (#9249)	2025-08-20 16:25:01 -07:00
Liangsheng Yin	08ebdf79d0	Fix the `--allow-auto-truncate` argument in tokenizer manager. (#9391) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-20 16:56:47 +08:00
datdo-msft	98b44e9e56	[PD] Propagate internal server errors from aborted requests to clients instead of blindly returning 200's (#8936)	2025-08-18 14:23:46 -07:00
Binyao Jiang	66d6be0874	Bug fix: use correct mm_items in embed_mm_inputs (#8893)	2025-08-16 19:55:56 -07:00
Shangming Cai	384f8ab5ce	[PD] Support PD disaggregation with Prefill PP (#8846) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: root <huzhiyuan@xiaohongshu.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com> Co-authored-by: zitto <zhjc1124@gmail.com>	2025-08-16 18:31:31 -07:00
Brayden Zhong	bc938ea13f	Fix DP load for embedding (#9165)	2025-08-15 23:58:44 -07:00
Trevor Morris	eff4eb3fdd	Add fp4 quantize before all-gather for Flashinfer cutlass MoE DP (max throughput) (#7667)	2025-08-15 22:08:11 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849)	2025-08-14 21:14:53 -07:00
Chengxing Xie	c1c7dc4534	feat: Add model version tracking with API endpoints and response metadata (#8795)	2025-08-14 12:13:46 -07:00
pansicheng	733446dd36	fix io group (#9154) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-14 12:46:42 +08:00
Cheng Wan	b87aacb5c5	[DP Attention] Refactor: adding some utility functions (#9136)	2025-08-13 21:08:06 -07:00
Sundara Raman Ramachandran	a027a9b4b3	[Generative Score API] Optimization to Remove Decode. (#8840)	2025-08-14 05:12:24 +08:00
Lianmin Zheng	9e426466af	Clean up allocators (#9134) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-13 13:56:04 -07:00
huangtingwei	0edda32001	Support page first layout zero copy for mooncake store (#8651) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-12 15:59:26 -07:00
ronnie_zheng	48afa8f14f	[feat] Enable Ascend profiling on SGLang (#8610) Co-authored-by: liyou_b <2953090824@qq.com>	2025-08-12 13:28:31 -07:00
Lifu Huang	5ded39cab2	Fix race condition in async lora unload (#9084)	2025-08-11 22:59:29 -07:00
Zhiqiang Xie	9f78f391ae	HiCache Storage: generate hash when inserting new nodes (#9053)	2025-08-11 14:18:59 -07:00
Liangsheng Yin	f9afa7dceb	Fix docs for clip max new tokens (#9082)	2025-08-11 13:15:21 -07:00
Baizhou Zhang	75e6a7cde1	Support radix cache for Lora feature (#7216)	2025-08-11 10:14:11 -07:00
Chang Su	a6452b7188	bugfix: Fix output_ids extraction in detokenizer_manager (#9047)	2025-08-11 03:17:32 -07:00
Lianmin Zheng	4ea9d74a3e	Simplify health check (#9034)	2025-08-10 17:35:05 -07:00

1018 Commits