Update README.md

2026-01-11 15:30:45 +00:00
parent d2ae5f1f18
commit 954f79f56a
1 changed file with 27 additions and 24 deletions
--- a/README.md
+++ b/README.md
@@ -21,17 +21,16 @@ tags:
 - Low-Resource-Languages
 ---

-# KarantaOCR: Efficient Document Processing for African Languages
-
-<div align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="300"/>
-<br/>
-  <br>
-  <h1>Karanta OCR</h1>
+<div align="left">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="100"/>
 </div>

+# KarantaOCR: Efficient Document Processing for African Languages
+
 ## Model Description

+#### [Paper](....)
+
 **KarantaOCR** is an open-source document OCR and processing model designed for **high-accuracy text extraction in African languages**.
 The model focuses on preserving language-specific characters and diacritics that are often lost, normalized, or mis-transcribed by existing OCR systems.

@@ -80,25 +79,24 @@ While improved performance on African languages was our priority, KarantaOCR **m

 KarantaOCR is evaluated on the OLMOocr benchmark using pass-rate accuracy. Scores are reported as averages across JSONL files with 95% confidence intervals.

-| Model           | Avg Score ↑ | 95% CI |
-| --------------- | ----------- | ------ |
-| **KarantaOCR**  | **74.1%**   | ± 1.1  |
-| RoLMOCR         | 74.4%       | ± 1.0  |
-| NanoNetsOCR-2   | 68.8%       | ± 1.1  |
-| OLMOCR  | 65.8%       | ± 0.9  |
+### Results -- KarantaOCR-Bench

-### Results by Documet Type (%)
+...
+
+### Results -- OlmoOCR-Bench
+
+| JSONL File      | KarantaOCR (3B) | RoLMOCR (7B) | NanoNetsOCR-2 (3B) | OLMOCR-1 (7B) | OLMOCR-2 (7B) | Mistral OCR API | DeepSeek-OCR (3B) |
+| --------------- | --------------- | ------------ | ------------------ | ------------- | ------------- | --------------- | ----------------- |
+| arxiv_math      | 74.2            | **76.8**     | 73.7               | 63.3          | 83.0          | 77.2            | 77.2              |
+| baseline        | **99.4**        | 97.9         | **99.5**           | 97.9          | 99.7          | 99.4            | 99.8              |
+| headers_footers | **95.3**        | 94.1         | 32.8               | 93.4          | 96.1          | 93.6            | 96.1              |
+| long_tiny_text  | 72.2            | 61.3         | **92.1**           | 54.8          | 81.9          | 77.1            | 79.4              |
+| multi_column    | 75.6            | 70.0         | **82.5**           | 67.6          | 83.7          | 71.3            | 66.4              |
+| old_scans       | 41.3            | 42.4         | 41.4               | 38.6          | 47.7          | 29.3            | 33.3              |
+| old_scans_math  | 70.3            | **80.1**     | 44.1               | 67.5          | 82.3          | 67.5            | 73.6              |
+| table_tests     | 64.3            | 72.2         | **84.2**           | 62.3          | 84.9          | 60.6            | 80.2              |
+| Average         | 74.1%           | 74.4%        | 68.8%              | 68.2%         | 82.4%         | 72.0%           | 75.7%             |

-| JSONL File      | KarantaOCR | RoLMOCR  | NanoNetsOCR-2 | OLMOCR   |
-| --------------- | ---------- | -------- | ------------- | -------- |
-| arxiv_math      | 74.2       | **76.8** | 73.7          | 68.9     |
-| baseline        | **99.4**   | 97.9     | **99.5**      | 85.0     |
-| headers_footers | **95.3**   | 94.1     | 32.8          | **96.4** |
-| long_tiny_text  | 72.2       | 61.3     | **92.1**      | 81.9     |
-| multi_column    | 75.6       | 70.0     | **82.5**      | **84.0** |
-| old_scans       | 41.3       | 42.4     | 41.4          | **42.0** |
-| old_scans_math  | 70.3       | **80.1** | 44.1          | 0.0      |
-| table_tests     | 64.3       | 72.2     | **84.2**      | 68.3     |

 ## How to Use

@@ -235,3 +233,8 @@ messages = build_message(
 output_text = run_inference(model, processor, messages)
 print(output_text)
 ```
+
+
+## Citation Information
+
+Coming soon ...
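
Note on the OlmoOCR-Bench table added in this commit: the `Average` row looks like the unweighted mean of the eight per-file scores in each column. The README does not state how the average is computed, so that is an assumption. The sketch below recomputes the means from the numbers in the table. The names `scores`, `reported`, and `unweighted_mean` are illustrative and are not part of the KarantaOCR code.

```python
# Per-file pass rates (%) copied from the OlmoOCR-Bench table in this diff.
# Row order: arxiv_math, baseline, headers_footers, long_tiny_text,
#            multi_column, old_scans, old_scans_math, table_tests
scores = {
    "KarantaOCR (3B)":    [74.2, 99.4, 95.3, 72.2, 75.6, 41.3, 70.3, 64.3],
    "RoLMOCR (7B)":       [76.8, 97.9, 94.1, 61.3, 70.0, 42.4, 80.1, 72.2],
    "NanoNetsOCR-2 (3B)": [73.7, 99.5, 32.8, 92.1, 82.5, 41.4, 44.1, 84.2],
    "OLMOCR-1 (7B)":      [63.3, 97.9, 93.4, 54.8, 67.6, 38.6, 67.5, 62.3],
    "OLMOCR-2 (7B)":      [83.0, 99.7, 96.1, 81.9, 83.7, 47.7, 82.3, 84.9],
    "Mistral OCR API":    [77.2, 99.4, 93.6, 77.1, 71.3, 29.3, 67.5, 60.6],
    "DeepSeek-OCR (3B)":  [77.2, 99.8, 96.1, 79.4, 66.4, 33.3, 73.6, 80.2],
}

# Values from the table's "Average" row.
reported = {
    "KarantaOCR (3B)": 74.1,
    "RoLMOCR (7B)": 74.4,
    "NanoNetsOCR-2 (3B)": 68.8,
    "OLMOCR-1 (7B)": 68.2,
    "OLMOCR-2 (7B)": 82.4,
    "Mistral OCR API": 72.0,
    "DeepSeek-OCR (3B)": 75.7,
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every JSONL file counts equally."""
    return sum(values) / len(values)


means = {model: unweighted_mean(vals) for model, vals in scores.items()}

for model, mean in means.items():
    # Show the recomputed mean next to the value reported in the table.
    print(f"{model:20s} recomputed={mean:.3f} reported={reported[model]:.1f}")
```

Under this assumption, every recomputed mean agrees with the reported average to within rounding at one decimal place. A weighted average (for example, by number of test cases per JSONL file) would give different values, so the README could say explicitly which one is used.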