Commit Graph

230 Commits

Author SHA1 Message Date
Zhiqiang Xie
70645f4d7d upstream hicache fixes (#5570) 2025-04-20 23:08:30 -07:00
fzyzcjy
1195182040 Tiny add Engine.flush_cache API (#5241) 2025-04-20 18:15:03 -07:00
fzyzcjy
f6a71139a8 Make profiler output file names consistent (#5548) 2025-04-18 22:57:11 -07:00
Cheng Wan
6aca583420 Fix several minor issues in PD disaggregation (#5444) 2025-04-15 23:04:41 -07:00
ybyang
dd83e7e9c3 [Bug fix] need record start time in pd mode (#5425) 2025-04-16 10:11:16 +08:00
shangmingc
ffde65a094 [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-04-15 19:29:31 +08:00
Byron Hsu
a9499885e9 [PD] Add transfer backend abstraction (#5328) 2025-04-14 01:39:39 +08:00
Liangsheng Yin
f765579046 Fix typo: infight -> inflight (#5357) 2025-04-14 01:25:30 +08:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Cheng Wan
038bc5d521 Support --enable-llama4-multimodal (#5254) 2025-04-11 01:24:14 -07:00
Ke Bao
1078396f47 Update deps for mllama4 (#5215) 2025-04-10 09:12:44 -07:00
Teng Ma
4c31ae9f6d [PD] Support KV transfer with mooncake (#4880)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
2025-04-10 14:23:23 +08:00
Stefan He
5db37c8626 [metrics] Add in queue metrics (#4444) 2025-04-09 17:19:27 -07:00
fzyzcjy
61970b08d8 Let bench_one_batch support enable_dp_attention (#4058) 2025-04-08 23:44:25 -07:00
mlmz
7c5658c189 feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
2025-04-07 21:46:47 -07:00
Zhiqiang Xie
e119f04215 Large page size aligned hierarchical caching (#4581) 2025-04-01 22:38:15 -07:00
Mick
5cb552b1d4 refactor: multimodal data (#4754) 2025-03-31 09:57:51 -07:00
Lianmin Zheng
b26bc86b36 Support page size > 1 + eagle (#4908) 2025-03-30 00:46:23 -07:00
Fr4nk1in
c483377ed7 Fix wrong variable name when stopping memory profile (#4772) 2025-03-28 10:35:02 -07:00
fzyzcjy
8c04f0f2e1 Support with_stack and record_shapes in profiler (#4740)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-03-27 23:01:42 -07:00
fzyzcjy
53a2c3b466 Support controlling nsys start and end range programmatically (#4688) 2025-03-27 22:21:13 -07:00
XinyuanTong
42a45df043 [Fix] self.worker assignment in TpModelWorker and refactor references (#4788)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-03-27 20:28:38 -07:00
tarinkk
7f19e083c1 Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
2025-03-27 17:09:35 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
fzyzcjy
26f07294f1 Warn users when release_memory_occupation is called without memory saver enabled (#4566) 2025-03-26 00:18:14 -07:00
fzyzcjy
eb934bdf3b Fix test_expert_distribution failure (#4752) 2025-03-25 01:17:03 -07:00
yuhsaun-t
199bb01d00 Add endpoints to dump selected expert ids (#4435)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-03-24 21:34:19 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Byron Hsu
c7c7dbebbe [PD] Release initial code (#4654)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: makro
Co-authored-by: dhou-xai
2025-03-21 14:47:47 -07:00
Zhiqiang Xie
a98290aea3 Unit test for Hierarchical Caching (#4486) 2025-03-17 17:45:00 -07:00
Lianmin Zheng
5493c3343e Fix data parallel + tensor parallel (#4499) 2025-03-17 05:13:16 -07:00
JieXin Liang
0212d2e288 [Fix] use torch.inference_mode() instead of torch.no_grad() (#4372) 2025-03-16 22:54:16 -07:00
Ying Sheng
1b859295f4 [Eagle] Remove the greedy branch and some redundant code (#4363)
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-16 02:48:55 -07:00
wangyu
1ce4878d31 feat(remote_model): support variable remote backend for model loader (#3964)
Signed-off-by: wangyu <wangyu.steph@bytedance.com>
2025-03-14 00:40:44 -07:00
Zhiqiang Xie
fbdb50501f Hot fix for hicache with new page aligned radixtree (#4397) 2025-03-13 15:50:49 -07:00
Lianmin Zheng
8e66fbecee Improve DP attention (#4390)
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-03-13 08:23:56 -07:00
Lianmin Zheng
c76040e31b Support page size > 1 (#4356) 2025-03-12 22:22:39 -07:00
文峰
c550e52f8b Fix scheduler proctitle suffix is None (#4326)
Co-authored-by: wenfeng.wf <wenfeng.wf@alibaba-inc.com>
2025-03-12 19:29:35 -07:00
Lianmin Zheng
e35a93fa8a Move output processing logic from scheduler.py into a separate file (#4354) 2025-03-12 16:21:49 -07:00
Zhiqiang Xie
10b544ae9b Hierarchical Caching Refactoring and Fixing TP issue (#4082) 2025-03-12 11:22:35 -07:00
Lianmin Zheng
d4017a6b63 [EAGLE] many fixes for eagle (#4195)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-07 22:12:13 -08:00
Ke Bao
20c8119915 Fix eagle hang issue for max_new_tokens=1 (#4185) 2025-03-07 12:11:18 -08:00
Pan Lyu
361971b859 Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) 2025-03-06 16:46:20 -08:00
Lianmin Zheng
fcc2e37f69 Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128) 2025-03-06 00:13:20 -08:00
Lianmin Zheng
286e6540a6 Remove prefill-only-one-req (#4117) 2025-03-05 20:58:48 -08:00
Wenxuan Tan
718c391fd7 [Hoxfix] Fix incomplete token_to_kv_pool refactor (#4121) 2025-03-05 19:32:42 -08:00
Ying Sheng
d3d4d76758 [Eagle] Refactor eagle speculative decoding (#3986)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
2025-03-05 08:06:07 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Lianmin Zheng
77a3954bf7 Simplify eagle tests and TP sync in grammar backend (#4066) 2025-03-04 13:40:40 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00