Update README.md

2026-01-11 15:30:45 +00:00
parent d2ae5f1f18
commit 954f79f56a
1 changed files with 27 additions and 24 deletions
--- a/README.md
+++ b/README.md
@@ -21,17 +21,16 @@ tags:
 - Low-Resource-Languages
 ---
-# KarantaOCR: Efficient Document Processing for African Languages
+<div align="left">
-
+<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="100"/>
 <div align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="300"/>
 <br/>
  <br>
  <h1>Karanta OCR</h1>
 </div>
 # KarantaOCR: Efficient Document Processing for African Languages
 ## Model Description
 #### [Paper](....)
 **KarantaOCR** is an open-source document OCR and processing model designed for **high-accuracy text extraction in African languages**.
 The model focuses on preserving language-specific characters and diacritics that are often lost, normalized, or mis-transcribed by existing OCR systems.
@@ -80,25 +79,24 @@ While improved performance on African languages was our priority, KarantaOCR **m
 KarantaOCR is evaluated on olmOCR-Bench using pass-rate accuracy. Scores are reported as averages across the benchmark's JSONL files, with 95% confidence intervals.
-| Model           | Avg Score ↑ | 95% CI |
+### Results -- KarantaOCR-Bench
 | --------------- | ----------- | ------ |
 | **KarantaOCR**  | **74.1%**   | ± 1.1  |
 | RoLMOCR         | 74.4%       | ± 1.0  |
 | NanoNetsOCR-2   | 68.8%       | ± 1.1  |
 | OLMOCR  | 65.8%       | ± 0.9  |
-### Results by Documet Type (%)
+...
 ### Results -- OlmoOCR-Bench
 | JSONL File      | KarantaOCR (3B) | RoLMOCR (7B) | NanoNetsOCR-2 (3B)  | OLMOCR-1 (7B)  |  OLMOCR-2 (7B)  | Mistral OCR API | DeepSeek-OCR (3B) |
 | --------------- | ---------- | -------- | ------------- | -------- | -------- |-------- |-------- |
 | arxiv_math      | 74.2       | **76.8** | 73.7          | 63.3     | 83.0 | 77.2 | 77.2 |
 | baseline        | **99.4**   | 97.9     | **99.5**      | 97.9     | 99.7 | 99.4 | 99.8 |
 | headers_footers | **95.3**   | 94.1     | 32.8          | 93.4     | 96.1 | 93.6 | 96.1 |
 | long_tiny_text  | 72.2       | 61.3     | **92.1**      | 54.8     | 81.9 | 77.1 | 79.4 |
 | multi_column    | 75.6       | 70.0     | **82.5**      | 67.6     | 83.7 | 71.3 | 66.4 |
 | old_scans       | 41.3       | 42.4     | 41.4          | 38.6     | 47.7 | 29.3 | 33.3 |
 | old_scans_math  | 70.3       | **80.1** | 44.1          | 67.5     | 82.3 | 67.5 | 73.6 |
 | table_tests     | 64.3       | 72.2     | **84.2**      | 62.3     | 84.9 | 60.6 | 80.2 |
 | Average         | 74.1       | 74.4     | 68.8          | 68.2     | 82.4 | 72.0 | 75.7 |
 | JSONL File      | KarantaOCR | RoLMOCR  | NanoNetsOCR-2 | OLMOCR   |
 | --------------- | ---------- | -------- | ------------- | -------- |
 | arxiv_math      | 74.2       | **76.8** | 73.7          | 68.9     |
 | baseline        | **99.4**   | 97.9     | **99.5**      | 85.0     |
 | headers_footers | **95.3**   | 94.1     | 32.8          | **96.4** |
 | long_tiny_text  | 72.2       | 61.3     | **92.1**      | 81.9     |
 | multi_column    | 75.6       | 70.0     | **82.5**      | **84.0** |
 | old_scans       | 41.3       | 42.4     | 41.4          | **42.0** |
 | old_scans_math  | 70.3       | **80.1** | 44.1          | 0.0      |
 | table_tests     | 64.3       | 72.2     | **84.2**      | 68.3     |
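
As a rough sanity check on the reported averages, the sketch below recomputes the average score for three of the models from the per-file pass rates in the tables above. It assumes the average is the unweighted mean over the eight JSONL files, rounded half-up to one decimal place; both the averaging rule and the rounding mode are assumptions, and the helper name `macro_average` is hypothetical. It does not reproduce the 95% confidence intervals, which require the per-test results.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-file pass rates (%) copied from the results table, in row order:
# arxiv_math, baseline, headers_footers, long_tiny_text,
# multi_column, old_scans, old_scans_math, table_tests.
SCORES = {
    "KarantaOCR": ["74.2", "99.4", "95.3", "72.2", "75.6", "41.3", "70.3", "64.3"],
    "RoLMOCR": ["76.8", "97.9", "94.1", "61.3", "70.0", "42.4", "80.1", "72.2"],
    "NanoNetsOCR-2": ["73.7", "99.5", "32.8", "92.1", "82.5", "41.4", "44.1", "84.2"],
}


def macro_average(rates):
    """Unweighted mean of per-file pass rates, rounded half-up to one decimal.

    Assumption: every JSONL file counts equally, regardless of how many
    tests it contains.
    """
    total = sum(Decimal(r) for r in rates)
    mean = total / Decimal(len(rates))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model, rates in SCORES.items():
    print(f"{model}: {macro_average(rates)}%")
```

Under these assumptions the script prints 74.1%, 74.4%, and 68.8%, which agree with the averages reported for these three models. Decimal arithmetic is used instead of floats so that values such as 74.35 round predictably.
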
 ## How to Use
@@ -235,3 +233,8 @@ messages = build_message(
 output_text = run_inference(model, processor, messages)
 print(output_text)
 ```
 ## Citation Information
 Coming soon ...