4 Commits

Author SHA1 Message Date
zouyida2052
ba9714ccee Optimize qwen2_vl and qwen2_5_vl (#701)
### What this PR does / why we need it?
Optimize qwen2_vl and qwen2_5_vl.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Tested this PR on a 1080p picture with tp=1, bs=1 on Qwen2-VL and
Qwen2.5-VL: the duration of each fa op dropped from 11ms to 9ms, a
roughly 22% perf boost.

---------

Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
Co-authored-by: zouyida2052 <zouyida@huawei.com>
2025-04-30 14:22:38 +08:00
Pleaplusone
1a1f9a6d89 port deepseekv2 and mtp to main branch (#429)
### What this PR does / why we need it?
This PR ports all the deepseek graph mode code and mtp code from v0.7.3
to the main branch.
---------

Signed-off-by: SidaoY <1024863041@qq.com>
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Signed-off-by: mengwei805 <mengwei25@huawei.com>
Signed-off-by: libaokui <libaokui@huawei.com>
Signed-off-by: q00832892 <qiaoyang19@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Co-authored-by: SidaoY <1024863041@qq.com>
Co-authored-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Co-authored-by: libaokui <libaokui@huawei.com>
2025-04-19 17:38:18 +08:00
Mengqing Cao
344228a5da [deepseek][bugfix] support deepseek quant (#469)
- support deepseek quant
  - add w8a8_dynamic quant
see #391

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
2025-04-07 10:56:12 +08:00
zouyida2002
faf8cd89cb register qwen2_vl to rewrite qwen2_vl forward (#241)
Add qwen2-vl Ascend implementation.

---------
Signed-off-by: zouyida <zouyida@huawei.com>
2025-03-07 15:41:47 +08:00