From 954f79f56a813510dd62e064d5c9303f3cc3ca22 Mon Sep 17 00:00:00 2001
From: Odunayo Ogundepo
Date: Sun, 11 Jan 2026 15:30:45 +0000
Subject: [PATCH] Update README.md

---
 README.md | 51 +++++++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 814a487..0251fd9 100644
--- a/README.md
+++ b/README.md
@@ -21,17 +21,16 @@ tags:
 - Low-Resource-Languages
 ---
-# KarantaOCR: Efficient Document Processing for African Languages
-
-
-Karanta OCR Logo -
-
-

Karanta OCR

+
+Karanta OCR Logo
+# KarantaOCR: Efficient Document Processing for African Languages
+
 ## Model Description
+#### [Paper](....)
+
 **KarantaOCR** is an open-source document OCR and processing model designed for **high-accuracy text extraction in African languages**. The model focuses on preserving language-specific characters and diacritics that are often lost, normalized, or mis-transcribed by existing OCR systems.
@@ -80,25 +79,24 @@ While improved performance on African languages was our priority, KarantaOCR **m
 KarantaOCR is evaluated on the OLMOocr benchmark using pass-rate accuracy. Scores are reported as averages across JSONL files with 95% confidence intervals.
-| Model | Avg Score ↑ | 95% CI |
-| --------------- | ----------- | ------ |
-| **KarantaOCR** | **74.1%** | ± 1.1 |
-| RoLMOCR | 74.4% | ± 1.0 |
-| NanoNetsOCR-2 | 68.8% | ± 1.1 |
-| OLMOCR | 65.8% | ± 0.9 |
+### Results -- KarantaOCR-Bench

-### Results by Documet Type (%)
+...
+
+### Results -- OlmoOCR-Bench
+
+| JSONL File | KarantaOCR (3B) | RoLMOCR (7B) | NanoNetsOCR-2 (3B) | OLMOCR-1 (7B) | OLMOCR-2 (7B) | Mistral OCR API | DeepSeek-OCR (3B) |
+| --------------- | ---------- | -------- | ------------- | -------- | -------- |-------- |-------- |
+| arxiv_math | 74.2 | **76.8** | 73.7 | 63.3 | 83.0 | 77.2 | 77.2 |
+| baseline | **99.4** | 97.9 | **99.5** | 97.9 | 99.7 | 99.4 | 99.8 |
+| headers_footers | **95.3** | 94.1 | 32.8 | 93.4 | 96.1 | 93.6 | 96.1 |
+| long_tiny_text | 72.2 | 61.3 | **92.1** | 54.8 | 81.9 | 77.1 | 79.4 |
+| multi_column | 75.6 | 70.0 | **82.5** | 67.6 | 83.7 | 71.3 | 66.4 |
+| old_scans | 41.3 | 42.4 | 41.4 | 38.6 | 47.7 | 29.3 | 33.3 |
+| old_scans_math | 70.3 | **80.1** | 44.1 | 67.5 | 82.3 | 67.5 | 73.6 |
+| table_tests | 64.3 | 72.2 | **84.2** | 62.3 | 84.9 | 60.6 | 80.2 |
+| Average | 74.1% | 74.4% | 68.8% | 68.2% | 82.4% | 72.0 | 75.7 |

-| JSONL File | KarantaOCR | RoLMOCR | NanoNetsOCR-2 | OLMOCR |
-| --------------- | ---------- | -------- | ------------- | -------- |
-| arxiv_math | 74.2 | **76.8** | 73.7 | 68.9 |
-| baseline | **99.4** | 97.9 | **99.5** | 85.0 |
-| headers_footers | **95.3** | 94.1 | 32.8 | **96.4** |
-| long_tiny_text | 72.2 | 61.3 | **92.1** | 81.9 |
-| multi_column | 75.6 | 70.0 | **82.5** | **84.0** |
-| old_scans | 41.3 | 42.4 | 41.4 | **42.0** |
-| old_scans_math | 70.3 | **80.1** | 44.1 | 0.0 |
-| table_tests | 64.3 | 72.2 | **84.2** | 68.3 |
 ## How to Use
@@ -235,3 +233,8 @@ messages = build_message(
 output_text = run_inference(model, processor, messages)
 print(output_text)
 ```
+
+
+## Citation Information
+
+Coming soon ...