Update README.md

2025-02-19 20:53:52 +08:00
parent f08b0e422c
commit a6278efdc9
1 changed file with 33 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -47,19 +47,40 @@ We are pleased to announce the release of **Ovis2**, our latest advancement in m
 | Ovis2-34B  |  aimv2-1B-patch14-448   | Qwen2.5-32B-Instruct  | [Huggingface](https://huggingface.co/AIDC-AI/Ovis2-34B) |                            -                             |
 ## Performance
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/aCuSemmHy_MhrDaBiYfco.png)
-|Benchmark|Ovis2-1B|Ovis2-2B|Ovis2-4B|Ovis2-8B|Ovis2-16B|Ovis2-34B|
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|MMBench-V1.1<sub>test</sub>|68.5|77.2|81.4|83.3|85.2|86.2|
-|MMStar|52.0|59.0|61.7|64.4|66.9|69.4|
-|MMMU<sub>val</sub>|36.0|45.3|48.0|59.0|59.6|65.6|
-|MathVista<sub>testmini</sub>|59.5|64.4|69.1|71.4|74.9|77.0|
-|HallBench<sub>avg</sub>|44.5|50.2|54.0|56.0|55.9|58.8|
-|AI2D<sub>test</sub>|76.8|82.6|85.5|86.8|86.1|88.4|
-|OCRBench|88.7|87.5|91.0|89.3|88.2|89.8|
-|MMVet|50.3|58.6|65.5|68.5|68.4|75.5|
-|Average|59.5|65.6|69.5|72.3|73.1|76.3|
+We evaluate Ovis2 with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), the toolkit used by the OpenCompass [multimodal](https://rank.opencompass.org.cn/leaderboard-multimodal) and [reasoning](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning) leaderboards.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a8a837959448ef5500ce5/M1XRFbeNbfe1lEvt9WF-j.png)
+
+### Image Benchmark
+| Benchmark                    | Qwen2.5-VL-72B   | InternVL2.5-38B-MPO   | InternVL2.5-26B-MPO   |   Ovis1.6-27B | LLaVA-OV-72B          | Ovis2-16B   | Ovis2-34B   |
+|:-----------------------------|:----------------:|:---------------------:|:---------------------:|:-------------:|:---------------------:|:-----------:|:-----------:|
+| MMBench-V1.1<sub>test</sub>  | **87.8**         | 85.4                  | 84.2                  |          82.2 | 84.4                  | 85.6        | 86.6        |
+| MMStar                       | **71.1**         | 70.1                  | 67.7                  |          63.5 | 65.8                  | 67.2        | 69.2        |
+| MMMU<sub>val</sub>           | **67.9**         | 63.8                  | 56.4                  |          60.3 | 56.6                  | 60.7        | 66.7        |
+| MathVista<sub>testmini</sub> | 70.8             | 73.6                  | 71.5                  |          70.2 | 68.4                  | 73.7        | **76.1**    |
+| HallusionBench               | 58.8             | **59.7**              | 52.4                  |          54.1 | 47.9                  | 56.8        | 58.8        |
+| AI2D                         | 88.2             | 87.9                  | 86.2                  |          86.6 | 86.2                  | 86.3        | **88.3**    |
 | OCRBench                     | 88.1             | 89.4                  | **90.5**              |          85.6 | 74.1                  | 87.9        | 89.4        |
 | MMVet                        | 76.7             | 72.6                  | 68.1                  |          68   | 60.6                  | 68.4        | **77.1**    |
 | MMBench<sub>test</sub>       | **88.2**         | 86.4                  | 85.4                  |          84.6 | 85.6                  | 87.1        | 87.8        |
 | MMT-Bench<sub>val</sub>      | 69.1             | 69.1                  | 65.7                  |          68.2 | -                     | 69.2        | **71.2**    |
 | RealWorldQA                  | **75.9**         | 74.4                  | 73.7                  |          72.7 | 73.9                  | 74.1        | 75.6        |
 | BLINK                        | 62.3             | **63.2**              | 62.6                  |          48   | -                     | 59.0        | 60.1        |
 | QBench                       | -                | 76.1                  | 76.0                  |          77.7 | -                     | 79.5        | **79.8**    |
 | ABench                       | -                | 78.6                  | **79.4**              |          76.5 | -                     | **79.4**    | 78.7        |
 | MTVQA                        | -                | **31.2**              | 28.7                  |          26.5 | -                     | 30.3        | 30.6        |
 ### Video Benchmark
 | Benchmark           | Qwen2.5-VL-72B | InternVL2.5-38B | InternVL2.5-26B | LLaVA-OneVision-72B | Ovis2-16B | Ovis2-34B     |
 |:--------------------|:--------------:|:---------------:|:---------------:|:-------------------:|:---------:|:-------------:|
 | VideoMME (w/o / w/ subs) | **73.3/79.1**  | 70.7/73.1       | 66.9/69.2       | 66.2/69.5           | 70.0/74.4 | 71.2/75.6     |
 | MVBench             | 70.4           | 74.4            | **75.2**        | 59.4                | 68.6      | 70.3          |
 | MLVU(M-Avg/G-Avg)   | 74.6/-         | 75.3/-          | 72.3/-          | 68.0/-              | 77.7/4.44 | **77.8**/4.59 |
 | MMBench-Video       | **2.02**       | 1.82            | 1.86            | -                   | 1.92      | 1.98          |
 | TempCompass         | 74.8           | -               | -               | -                   | 74.16     | **75.97**     |
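The tables appear to follow three conventions. Bold marks the highest score in each row. A `-` means no result was reported. A cell such as `70.0/74.4` holds two scores: for VideoMME, without and with subtitles; for MLVU, M-Avg and G-Avg. The following is a minimal Python sketch that applies those rules to a row copied verbatim from the tables and returns the model or models with the top score. It assumes the conventions just described. The helper names `parse_row`, `parse_score` and `leaders` are hypothetical and are not part of Ovis or VLMEvalKit.

```python
from __future__ import annotations

# Hypothetical helpers (not part of Ovis or VLMEvalKit) for reading rows of the
# benchmark tables above and finding the top-scoring model(s) per row.


def parse_row(line: str) -> tuple[str, list[str]]:
    """Split a markdown table row into its first cell (the label) and the rest."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return cells[0], cells[1:]


def parse_score(cell: str, part: int = 0) -> float | None:
    """Extract one score from a cell.

    Bold markers (**) are dropped, '-' or a missing sub-score yields None, and
    composite cells such as '70.0/74.4' are split on '/' with `part`
    selecting the sub-score (0 = first, 1 = second).
    """
    pieces = [p.strip().replace("*", "") for p in cell.split("/")]
    if part >= len(pieces) or pieces[part] in ("", "-"):
        return None
    return float(pieces[part])


def leaders(header: str, row: str, part: int = 0) -> list[str]:
    """Return every model tied for the highest reported score in `row`."""
    _, models = parse_row(header)
    _, cells = parse_row(row)
    reported = {}
    for model, cell in zip(models, cells):
        score = parse_score(cell, part)
        if score is not None:  # skip '-' (not reported)
            reported[model] = score
    if not reported:
        return []
    best = max(reported.values())
    return [m for m, s in reported.items() if s == best]


# Rows copied from the image and video tables (column padding removed).
IMAGE_HEADER = ("| Benchmark | Qwen2.5-VL-72B | InternVL2.5-38B-MPO | InternVL2.5-26B-MPO "
                "| Ovis1.6-27B | LLaVA-OV-72B | Ovis2-16B | Ovis2-34B |")
ABENCH = "| ABench | - | 78.6 | **79.4** | 76.5 | - | **79.4** | 78.7 |"
VIDEO_HEADER = ("| Benchmark | Qwen2.5-VL-72B | InternVL2.5-38B | InternVL2.5-26B "
                "| LLaVA-OneVision-72B | Ovis2-16B | Ovis2-34B |")
MLVU = "| MLVU(M-Avg/G-Avg) | 74.6/- | 75.3/- | 72.3/- | 68.0/- | 77.7/4.44 | **77.8**/4.59 |"

if __name__ == "__main__":
    print(leaders(IMAGE_HEADER, ABENCH))        # tie: InternVL2.5-26B-MPO and Ovis2-16B
    print(leaders(VIDEO_HEADER, MLVU, part=0))  # M-Avg leader
    print(leaders(VIDEO_HEADER, MLVU, part=1))  # G-Avg, reported only for the Ovis2 rows
```

Both bold cells in the ABench row come back as a tie, which matches the table. For G-Avg, the `-` cells are skipped rather than treated as zero.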
 ### Reasoning Benchmark
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a8a837959448ef5500ce5/aGc1DZLPEeERd0jGW9JRA.png)
 ## Usage
 Below is a code snippet demonstrating how to run Ovis with various input types. For additional usage instructions, including the inference wrapper and the Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).