Commit Graph

72 Commits

Author SHA1 Message Date
Yuan Luo
3b9d97f335 perf: optimize qwen-vl with symm mem allreduce (#11381)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-10-10 22:24:45 +08:00
Lzhang-hub
d42975c641 Remove duplicate code in qwen2 model (#10540) 2025-09-24 02:40:51 +08:00
Yichen Yan
4f9e71df3c Remove duplicated code (#10545)
Co-authored-by: jiapingW <56055330+jiapingW@users.noreply.github.com>
2025-09-16 20:48:22 -07:00
Lzhang-hub
37d83c6e6d Qwen2.5-VL eagle3 infer (#8801) 2025-09-07 20:44:34 -07:00
KerwinKai
87a0f7d2c2 [feat] Support EAGLE3 for Qwen2 (#9216) 2025-08-29 12:59:51 -07:00
Cheng Wan
b87aacb5c5 [DP Attention] Refactor: adding some utility functions (#9136) 2025-08-13 21:08:06 -07:00
PGFLMG
b7cd743038 [Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949) 2025-08-06 23:49:36 -07:00
Xiaoze Fan
570d33437b [Feature] Layer-wise Prefill (#7634)
Signed-off-by: jason-fxz <jason341132@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-17 01:57:46 +08:00
ronnie_zheng
766392c6bd [feature]Ascend quantization support (#7791)
Co-authored-by: ichernob <ichernobnn@gmail.com>
Co-authored-by: liupeng <liupeng374@huawei.com>
2025-07-10 09:17:37 -07:00
Leng Yue
8364608930 add model: qwen2-audio (#7596) 2025-07-04 21:13:10 -07:00
Ximingwang-09
1964c325de [feat] Support EAGLE3 for Qwen (#7745)
Co-authored-by: 纬杭 <ximing.wxm@antgroup.com>
Co-authored-by: zyksir <zyksir@outlook.com>
2025-07-04 19:50:28 -07:00
Yi Zhang
264dc6e744 [optimize] add two stream norm for qwen3 (#7740)
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-07-03 09:59:17 -07:00
Yi Zhang
646cef2e2e support qwen3 dense model dp attention (#7681) 2025-07-03 09:58:20 -07:00
Chunyuan WU
1dce6c480f [CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (#6771) 2025-07-03 09:51:38 -07:00
Shenggui Li
3f23d8cdf1 added support for tied weights in qwen pipeline parallelism (#6546) 2025-05-25 00:00:56 -07:00
libra
11553c1a37 Add pipeline parallelism for Qwen2 and Qwen3 Model (#6250) 2025-05-18 00:42:55 -07:00
yhyang201
4db463b1ad [Model] Adding Qwen3 and Qwen3MoE (#4693) 2025-04-18 09:51:29 -07:00
Yun Dai
2695ab0537 Fix loading KV quantization scale; Enable modelopt kv cache (#4686)
Co-authored-by: qingquansong <ustcsqq@gmail.com>
2025-04-08 09:11:35 -07:00
Mick
5cb552b1d4 refactor: multimodal data (#4754) 2025-03-31 09:57:51 -07:00
Mick
11577cedb7 refactor: bug fixes and refactor for vlm (#4661) 2025-03-22 22:48:49 -07:00
yych0745
6f43a9b9f4 remove the unused readline dependency from the Qwen2 model implementa… (#4340) 2025-03-12 02:47:27 -07:00
Qubitium-ModelCloud
56a724eba3 [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
2025-03-05 01:11:00 -08:00
fzyzcjy
a3339d8cac Bug: Fix weight loader error when LM head weights are tied (#3766) 2025-02-21 17:53:12 -08:00
Chayenne
14d90617b0 Bug: fix lm head weights in Qwen models (#3777) 2025-02-21 16:49:31 -08:00
simveit
20b765a26e Model: Support Qwen 72B RM model. (#3772) 2025-02-21 14:38:21 -08:00
Mick
9f635ea50d [Fix] Address remaining issues of supporting MiniCPMV (#2977) 2025-01-28 00:22:13 -08:00
Mick
3d93f84a00 [Feature] Support minicpmv v2.6 (#2785)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-01-18 14:14:19 -08:00
Yineng Zhang
2add697d7a feat: remove vllm get_rope (#2964) 2025-01-18 19:38:01 +08:00
bjmsong
d3024f4fc8 support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894)
Co-authored-by: bjmsong <bjmsong@126.com>
2025-01-18 11:43:22 +08:00
Ke Bao
53e6552fed Fix qwen accuracy issue (#2945) 2025-01-17 22:35:26 +08:00
Yineng Zhang
5dc54f1a62 feat: remove vllm distributed (#2907)
Co-authored-by: Zhangyi <1109276519@qq.com>
2025-01-17 22:31:51 +08:00
Rin Intachuen
a2f602b541 fixed lm_head.weight error for quantized qwen (#2910) 2025-01-16 06:51:43 -08:00
Lzhang-hub
6ec75e626d add qwen2 eagle model (#2863) 2025-01-13 05:29:33 -08:00
Lianmin Zheng
959735fc9e Fix model loader for more quantization formats (#2448) 2024-12-11 05:21:23 -08:00
Yineng Zhang
85e1a6f3aa Update model_loader deps and qqq quantization deps (#2220) (#2318)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-02 23:22:13 +08:00
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
2024-11-30 00:44:48 -08:00
Ying Sheng
8b48496aaf Revert "Revert "Add simple CPU offloading support"" (#2253)
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-28 23:58:54 -08:00
Ying Sheng
4057ea82c9 Revert "Add simple CPU offloading support" (#2252)
We'll re-add the commit to correctly ack Kaichao's authorship
2024-11-28 23:36:55 -08:00
Jani Monoses
d98fa1e93d Add simple CPU offloading support. (#2081) 2024-11-23 06:23:53 +00:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Ke Bao
16eb33ffe2 Update vocab embedding deps and add TP switch (#1856) 2024-10-31 20:13:07 -07:00
Byron Hsu
56503d9bc9 [1/N] Remove CacheConfig import in all model files (#1658) 2024-10-14 09:06:34 -07:00
Lianmin Zheng
36d5acfca5 Rename InputMetadata -> ForwardBatch (#1543) 2024-09-30 02:41:11 -07:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Liangsheng Yin
70b6802982 Optimize conflicts between CUDA graph and vocab mask tensors (#1392) 2024-09-13 20:27:53 -07:00
Liangsheng Yin
381dd57bd6 Sampler cudagraph (#1253) 2024-08-28 18:58:52 -07:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Yineng Zhang
f25f4dfde5 hotfix: revert sampler CUDA Graph (#1242) 2024-08-28 21:16:47 +10:00