Lifu Huang
|
3cf1473a09
|
Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
|
2025-05-17 16:49:18 -07:00 |
|
Lianmin Zheng
|
e07a6977e7
|
Minor improvements of TokenizerManager / health check (#6327)
|
2025-05-15 15:29:25 -07:00 |
|
Cheng Wan
|
b2e95f62b4
|
Fix two issues related to --moe-dense-tp-size=1 (#5657)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
|
2025-05-12 23:51:39 -07:00 |
|
Lianmin Zheng
|
d18c6b3358
|
Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 14:33:38 -07:00 |
|
Lianmin Zheng
|
e8e18dcdcc
|
Revert "fix some typos" (#6244)
|
2025-05-12 12:53:26 -07:00 |
|
Ying Sheng
|
bad7c26fdc
|
[PP] Fix init_memory_pool desync & add PP for mixtral (#6223)
|
2025-05-12 12:38:09 -07:00 |
|
applesaucethebun
|
d738ab52f8
|
fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-13 01:42:38 +08:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
Lianmin Zheng
|
01bdbf7f80
|
Improve structured outputs: fix race condition, server crash, metrics and style (#6188)
|
2025-05-11 08:36:16 -07:00 |
|
applesaucethebun
|
2ce8793519
|
Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 12:55:00 +08:00 |
|
Lianmin Zheng
|
de167cf5fa
|
Fix request abortion (#6184)
|
2025-05-10 21:54:46 -07:00 |
|
fzyzcjy
|
cef91b1ed7
|
[PD] Add control to slow down a server (#5572)
|
2025-05-08 01:03:08 -07:00 |
|
fzyzcjy
|
b6cf3532b5
|
Tiny refactor ModelConfig.from_server_args (#5219)
|
2025-05-08 01:02:43 -07:00 |
|
Liangsheng Yin
|
a3e4e9bf9e
|
Better PD initialization (#5751)
|
2025-05-07 01:12:57 +08:00 |
|
Zhiqiang Xie
|
f8e460930a
|
Fix prefill OOM error in the case of large page size (#5081)
|
2025-05-05 16:02:55 -07:00 |
|
xm:D
|
3409aaab32
|
Support InternVL3 (#5350)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-05-01 22:38:59 -07:00 |
|
Ying Sheng
|
11383cec3c
|
[PP] Add pipeline parallelism (#5724)
|
2025-04-30 18:18:07 -07:00 |
|
Chang Su
|
28b26dbf48
|
[Bugfix]: fix missing queue_time_start for requests from grammar_queue (#5696)
|
2025-04-29 17:31:44 -07:00 |
|
Lianmin Zheng
|
3029889cb4
|
Turn on overlap scheduler for multimodal models (#5771)
|
2025-04-27 23:45:09 -07:00 |
|
Liangsheng Yin
|
40d9b8acce
|
Improve overlap scheduling (#5788)
|
2025-04-28 11:19:16 +08:00 |
|
IAN
|
11e27d0926
|
[PD]: Support Muti Prefill in one node (#5704)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-04-26 00:30:47 +08:00 |
|
Liangsheng Yin
|
c55550cbf0
|
[PD] Better logs (#5715)
|
2025-04-25 17:25:45 +08:00 |
|
Byron Hsu
|
bf98d2e377
|
[PD] Support prefill overlap + Ensure no race condition (#5609)
|
2025-04-21 12:12:56 -07:00 |
|
Byron Hsu
|
e65b9f21e3
|
[PD] Support decode overlap schedule (#5608)
|
2025-04-21 12:06:16 -07:00 |
|
Zhiqiang Xie
|
70645f4d7d
|
upstream hicache fixes (#5570)
|
2025-04-20 23:08:30 -07:00 |
|
fzyzcjy
|
1195182040
|
Tiny add Engine.flush_cache API (#5241)
|
2025-04-20 18:15:03 -07:00 |
|
fzyzcjy
|
f6a71139a8
|
Make profiler output file names consistent (#5548)
|
2025-04-18 22:57:11 -07:00 |
|
Cheng Wan
|
6aca583420
|
Fix several minor issues in PD disaggregation (#5444)
|
2025-04-15 23:04:41 -07:00 |
|
ybyang
|
dd83e7e9c3
|
[Bug fix] need record start time in pd mode (#5425)
|
2025-04-16 10:11:16 +08:00 |
|
shangmingc
|
ffde65a094
|
[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-04-15 19:29:31 +08:00 |
|
Byron Hsu
|
a9499885e9
|
[PD] Add transfer backend abstraction (#5328)
|
2025-04-14 01:39:39 +08:00 |
|
Liangsheng Yin
|
f765579046
|
Fix typo: infight -> inflight (#5357)
|
2025-04-14 01:25:30 +08:00 |
|
Mick
|
34ef6c8135
|
[VLM] Adopt fast image processor by default (#5065)
|
2025-04-11 21:46:58 -07:00 |
|
Cheng Wan
|
038bc5d521
|
Support --enable-llama4-multimodal (#5254)
|
2025-04-11 01:24:14 -07:00 |
|
Ke Bao
|
1078396f47
|
Update deps for mllama4 (#5215)
|
2025-04-10 09:12:44 -07:00 |
|
Teng Ma
|
4c31ae9f6d
|
[PD] Support KV transfer with mooncake (#4880)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
|
2025-04-10 14:23:23 +08:00 |
|
Stefan He
|
5db37c8626
|
[metrics] Add in queue metrics (#4444)
|
2025-04-09 17:19:27 -07:00 |
|
fzyzcjy
|
61970b08d8
|
Let bench_one_batch support enable_dp_attention (#4058)
|
2025-04-08 23:44:25 -07:00 |
|
mlmz
|
7c5658c189
|
feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2025-04-07 21:46:47 -07:00 |
|
Zhiqiang Xie
|
e119f04215
|
Large page size aligned hierarchical caching (#4581)
|
2025-04-01 22:38:15 -07:00 |
|
Mick
|
5cb552b1d4
|
refactor: multimodal data (#4754)
|
2025-03-31 09:57:51 -07:00 |
|
Lianmin Zheng
|
b26bc86b36
|
Support page size > 1 + eagle (#4908)
|
2025-03-30 00:46:23 -07:00 |
|
Fr4nk1in
|
c483377ed7
|
Fix wrong variable name when stopping memory profile (#4772)
|
2025-03-28 10:35:02 -07:00 |
|
fzyzcjy
|
8c04f0f2e1
|
Support with_stack and record_shapes in profiler (#4740)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-03-27 23:01:42 -07:00 |
|
fzyzcjy
|
53a2c3b466
|
Support controlling nsys start and end range programmatically (#4688)
|
2025-03-27 22:21:13 -07:00 |
|
XinyuanTong
|
42a45df043
|
[Fix] self.worker assignment in TpModelWorker and refactor references (#4788)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-03-27 20:28:38 -07:00 |
|
tarinkk
|
7f19e083c1
|
Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
|
2025-03-27 17:09:35 -07:00 |
|
Xiaoyu Zhang
|
04e3ff6975
|
Support compressed tensors fp8w8a8 (#4743)
|
2025-03-26 13:21:25 -07:00 |
|
fzyzcjy
|
26f07294f1
|
Warn users when release_memory_occupation is called without memory saver enabled (#4566)
|
2025-03-26 00:18:14 -07:00 |
|
fzyzcjy
|
eb934bdf3b
|
Fix test_expert_distribution failure (#4752)
|
2025-03-25 01:17:03 -07:00 |
|