fzyzcjy
|
fa2f677e18
|
Fix torch memory saver not enabled in DP scenario (#5560)
|
2025-04-20 14:20:52 -07:00 |
|
Xiaoyu Zhang
|
d58e354472
|
simplify the control logic for using shared experts fusion (#5504)
|
2025-04-19 13:17:35 -07:00 |
|
fzyzcjy
|
f6a71139a8
|
Make profiler output file names consistent (#5548)
|
2025-04-18 22:57:11 -07:00 |
|
fzyzcjy
|
53dcf38876
|
Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836)
|
2025-04-17 21:38:26 -07:00 |
|
Baizhou Zhang
|
6fb29ffd9e
|
Deprecate enable-flashinfer-mla and enable-flashmla (#5480)
|
2025-04-17 01:43:33 -07:00 |
|
Baizhou Zhang
|
4fb05583ef
|
Deprecate disable-mla (#5481)
|
2025-04-17 01:43:14 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Cheng Wan
|
6aca583420
|
Fix several minor issues in PD disaggregation (#5444)
|
2025-04-15 23:04:41 -07:00 |
|
Baizhou Zhang
|
a42736bbb8
|
Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)
|
2025-04-15 22:01:22 -07:00 |
|
ybyang
|
dd83e7e9c3
|
[Bug fix] need record start time in pd mode (#5425)
|
2025-04-16 10:11:16 +08:00 |
|
shangmingc
|
ffde65a094
|
[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-04-15 19:29:31 +08:00 |
|
Byron Hsu
|
a9499885e9
|
[PD] Add transfer backend abstraction (#5328)
|
2025-04-14 01:39:39 +08:00 |
|
Liangsheng Yin
|
f765579046
|
Fix typo: infight -> inflight (#5357)
|
2025-04-14 01:25:30 +08:00 |
|
tianlian yi
|
bc92107b03
|
Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
|
2025-04-12 10:07:52 -07:00 |
|
Mick
|
34ef6c8135
|
[VLM] Adopt fast image processor by default (#5065)
|
2025-04-11 21:46:58 -07:00 |
|
Mick
|
e53a0b3d5b
|
[fix] fix mrope positions not picked up (#5265)
|
2025-04-11 01:29:45 -07:00 |
|
Cheng Wan
|
038bc5d521
|
Support --enable-llama4-multimodal (#5254)
|
2025-04-11 01:24:14 -07:00 |
|
Ke Bao
|
1078396f47
|
Update deps for mllama4 (#5215)
|
2025-04-10 09:12:44 -07:00 |
|
Teng Ma
|
4c31ae9f6d
|
[PD] Support KV transfer with mooncake (#4880)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
|
2025-04-10 14:23:23 +08:00 |
|
Stefan He
|
5db37c8626
|
[metrics] Add in queue metrics (#4444)
|
2025-04-09 17:19:27 -07:00 |
|
Mick
|
fbebcb7aa4
|
model: support mllama4 (#5144)
|
2025-04-09 09:28:44 -07:00 |
|
fzyzcjy
|
61970b08d8
|
Let bench_one_batch support enable_dp_attention (#4058)
|
2025-04-08 23:44:25 -07:00 |
|
fzyzcjy
|
466899e69c
|
Fix multimodal hashing error (#5174)
|
2025-04-08 18:42:26 -07:00 |
|
XinyuanTong
|
d09a51f1f6
|
[feat&refactor] Enhance multimodal input support with refactor io_struct (#4938)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-04-08 14:48:07 -07:00 |
|
huangtingwei
|
27f8e6b9c1
|
fix multimodal hash feature (#5083)
|
2025-04-07 22:43:23 -07:00 |
|
mlmz
|
7c5658c189
|
feat: disable grammar restrictions within reasoning sections (#4984)
Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn>
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2025-04-07 21:46:47 -07:00 |
|
Chang Su
|
f04c80dc42
|
Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
|
2025-04-07 00:29:36 -07:00 |
|
Baizhou Zhang
|
efbae697b3
|
[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)
|
2025-04-05 01:23:02 -07:00 |
|
Xiaoyu Zhang
|
924ca7c92c
|
Add DeepSeek V3/R1 shared experts fusion (#4918)
|
2025-04-04 01:59:29 -07:00 |
|
Ravi Theja
|
69df9761dd
|
Add LlavaLlamaForCausaLM in MultiModal Processors (#5039)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
|
2025-04-03 15:41:12 -07:00 |
|
Lianmin Zheng
|
74885a848b
|
Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)
|
2025-04-03 13:30:56 -07:00 |
|
Baizhou Zhang
|
e8999b13b7
|
Replace enable_flashinfer_mla argument with attention_backend (#5005)
|
2025-04-03 02:53:58 -07:00 |
|
Kaiyu Yang
|
31da75abed
|
Update tokenizer_manager.py (#5008)
|
2025-04-02 13:56:19 -07:00 |
|
Zhiqiang Xie
|
e119f04215
|
Large page size aligned hierarchical caching (#4581)
|
2025-04-01 22:38:15 -07:00 |
|
XinyuanTong
|
9eb49e878b
|
[VLM RLHF] Take Image input for verl vlm rollout (#4915)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: GeLee <leege233@gmail.com>
|
2025-04-01 20:03:17 -07:00 |
|
Zhiqiang Xie
|
12047f5e94
|
Prevent memory leak of retract_decode when page_size > 1 (#4977)
|
2025-04-01 15:30:45 -07:00 |
|
Jinyan Chen
|
23c764b18a
|
[Feature] Support DeepEP Low Latency (#4767)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Co-authored-by: ch-wan <cwan39@gatech.edu>
|
2025-04-01 09:23:25 -07:00 |
|
Mick
|
5cb552b1d4
|
refactor: multimodal data (#4754)
|
2025-03-31 09:57:51 -07:00 |
|
Zhiqiang Xie
|
a169b9f813
|
Fix oom error for large page size (#4913)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-03-30 21:34:21 -07:00 |
|
Baizhou Zhang
|
e62d60fe6d
|
[Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932)
|
2025-03-30 13:53:44 -07:00 |
|
Lianmin Zheng
|
4ede6770cd
|
Fix retract for page size > 1 (#4914)
|
2025-03-30 02:57:15 -07:00 |
|
Lianmin Zheng
|
b26bc86b36
|
Support page size > 1 + eagle (#4908)
|
2025-03-30 00:46:23 -07:00 |
|
fzyzcjy
|
b1cfb4e972
|
Fix BadRequestError wrong arguments and remove openai dependency (#4882)
|
2025-03-29 08:16:21 -07:00 |
|
Fr4nk1in
|
c483377ed7
|
Fix wrong variable name when stopping memory profile (#4772)
|
2025-03-28 10:35:02 -07:00 |
|
Lianmin Zheng
|
74e0ac1dbd
|
Clean up import vllm in quantization/__init__.py (#4834)
|
2025-03-28 10:34:10 -07:00 |
|
fzyzcjy
|
8c04f0f2e1
|
Support with_stack and record_shapes in profiler (#4740)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-03-27 23:01:42 -07:00 |
|
fzyzcjy
|
265e756494
|
Super tiny remove unused code (#4750)
|
2025-03-27 22:32:14 -07:00 |
|
fzyzcjy
|
53a2c3b466
|
Support controlling nsys start and end range programmatically (#4688)
|
2025-03-27 22:21:13 -07:00 |
|
XinyuanTong
|
42a45df043
|
[Fix] self.worker assignment in TpModelWorker and refactor references (#4788)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-03-27 20:28:38 -07:00 |
|
tarinkk
|
7f19e083c1
|
Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
|
2025-03-27 17:09:35 -07:00 |
|