Update README.md

Odunayo Ogundepo
2026-01-11 15:30:45 +00:00
committed by system
parent d2ae5f1f18
commit 954f79f56a


@@ -21,17 +21,16 @@ tags:
- Low-Resource-Languages
---
# KarantaOCR: Efficient Document Processing for African Languages
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="300"/>
<br/>
<br>
<h1>Karanta OCR</h1>
<div align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/0IwaKLOehSDEFF3zYY_Wp.png" alt="Karanta OCR Logo" width="100"/>
</div>
# KarantaOCR: Efficient Document Processing for African Languages
## Model Description
#### [Paper](....)
**KarantaOCR** is an open-source document OCR and processing model designed for **high-accuracy text extraction in African languages**.
The model focuses on preserving language-specific characters and diacritics that are often lost, normalized, or mis-transcribed by existing OCR systems.
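This is a generic Python illustration of how diacritics get lost. A common "normalization" step in text pipelines decomposes text to Unicode NFD and drops the combining marks, which turns Yoruba tone marks and under-dots into plain letters. The helper names `strip_diacritics` and `diacritics_preserved` are hypothetical and are not part of KarantaOCR. The comparison also assumes that an OCR output should match its reference exactly after NFC normalization.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose to NFD so accents and under-dots become separate combining
    # code points, then drop every combining mark (the lossy step).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritics_preserved(reference: str, hypothesis: str) -> bool:
    # Compare in NFC so precomposed and decomposed spellings of the same
    # character are equal, while a missing mark still counts as a mismatch.
    return unicodedata.normalize("NFC", reference) == unicodedata.normalize("NFC", hypothesis)


ref = "Ọmọ Yorùbá"
print(strip_diacritics(ref))                      # Omo Yoruba
print(diacritics_preserved("Yorùbá", "Yoruba"))  # False
```

Because the comparison uses NFC instead of stripping marks, `Yoruba` is scored as an error against `Yorùbá`. That is the distinction the model description is concerned with.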
@@ -80,25 +79,24 @@ While improved performance on African languages was our priority, KarantaOCR **m
KarantaOCR is evaluated on OlmoOCR-Bench using pass-rate accuracy. Scores are averaged across the benchmark's JSONL files and reported with 95% confidence intervals.
| Model | Avg Score ↑ | 95% CI |
| --------------- | ----------- | ------ |
| **KarantaOCR** | **74.1%** | ± 1.1 |
| RoLMOCR | 74.4% | ± 1.0 |
| NanoNetsOCR-2 | 68.8% | ± 1.1 |
| OLMOCR | 65.8% | ± 0.9 |
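The README does not say how the 95% confidence intervals are computed. The minimal sketch below estimates the interval for a single pass rate, assuming a percentile bootstrap over individual pass/fail test outcomes. The outcome list is synthetic (741 passes out of 1,000 hypothetical tests). `bootstrap_ci` is a hypothetical helper and is not the benchmark's own code.

```python
import random


def bootstrap_ci(outcomes: list[bool], n_resamples: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap 95% CI (in percent) for the pass rate of `outcomes`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Resample tests with replacement and record that resample's pass rate.
        resample = rng.choices(outcomes, k=n)
        rates.append(100.0 * sum(resample) / n)
    rates.sort()
    # 2.5th and 97.5th percentiles of the bootstrap distribution
    # (integer arithmetic avoids floating-point index surprises).
    lower = rates[n_resamples * 25 // 1000]
    upper = rates[n_resamples * 975 // 1000 - 1]
    return lower, upper


# Synthetic data: 741 passing and 259 failing hypothetical tests (74.1% pass rate).
outcomes = [True] * 741 + [False] * 259
lower, upper = bootstrap_ci(outcomes)
point = 100.0 * sum(outcomes) / len(outcomes)
print(f"{point:.1f}% (95% CI {lower:.1f} to {upper:.1f})")
```

The interval's half-width shrinks roughly with the square root of the number of tests. A benchmark with several thousand test cases therefore yields much tighter intervals than this 1,000-test example.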
### Results -- KarantaOCR-Bench
### Results by Document Type (%)
...
### Results -- OlmoOCR-Bench
| JSONL File | KarantaOCR (3B) | RoLMOCR (7B) | NanoNetsOCR-2 (3B) | OLMOCR-1 (7B) | OLMOCR-2 (7B) | Mistral OCR API | DeepSeek-OCR (3B) |
| --------------- | ---------- | -------- | ------------- | -------- | -------- |-------- |-------- |
| arxiv_math | 74.2 | **76.8** | 73.7 | 63.3 | 83.0 | 77.2 | 77.2 |
| baseline | **99.4** | 97.9 | **99.5** | 97.9 | 99.7 | 99.4 | 99.8 |
| headers_footers | **95.3** | 94.1 | 32.8 | 93.4 | 96.1 | 93.6 | 96.1 |
| long_tiny_text | 72.2 | 61.3 | **92.1** | 54.8 | 81.9 | 77.1 | 79.4 |
| multi_column | 75.6 | 70.0 | **82.5** | 67.6 | 83.7 | 71.3 | 66.4 |
| old_scans | 41.3 | 42.4 | 41.4 | 38.6 | 47.7 | 29.3 | 33.3 |
| old_scans_math | 70.3 | **80.1** | 44.1 | 67.5 | 82.3 | 67.5 | 73.6 |
| table_tests | 64.3 | 72.2 | **84.2** | 62.3 | 84.9 | 60.6 | 80.2 |
| Average         | 74.1       | 74.4     | 68.8          | 68.2     | 68.2     | 72.0     | 75.7     |
| JSONL File | KarantaOCR | RoLMOCR | NanoNetsOCR-2 | OLMOCR |
| --------------- | ---------- | -------- | ------------- | -------- |
| arxiv_math | 74.2 | **76.8** | 73.7 | 68.9 |
| baseline | **99.4** | 97.9 | **99.5** | 85.0 |
| headers_footers | **95.3** | 94.1 | 32.8 | **96.4** |
| long_tiny_text | 72.2 | 61.3 | **92.1** | 81.9 |
| multi_column | 75.6 | 70.0 | **82.5** | **84.0** |
| old_scans | 41.3 | 42.4 | 41.4 | **42.0** |
| old_scans_math | 70.3 | **80.1** | 44.1 | 0.0 |
| table_tests | 64.3 | 72.2 | **84.2** | 68.3 |
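The Average row of the OlmoOCR-Bench table is consistent with an unweighted (macro) mean over the eight JSONL files. For KarantaOCR, (74.2 + 99.4 + 95.3 + 72.2 + 75.6 + 41.3 + 70.3 + 64.3) / 8 = 592.6 / 8 = 74.075, which is reported as 74.1. The sketch below reproduces that arithmetic. The idea that every file counts equally, rather than being weighted by test count, is an inference from the numbers and is not stated in the README.

```python
# Per-file pass rates (%) for KarantaOCR, copied from the OlmoOCR-Bench table.
karanta_scores = {
    "arxiv_math": 74.2,
    "baseline": 99.4,
    "headers_footers": 95.3,
    "long_tiny_text": 72.2,
    "multi_column": 75.6,
    "old_scans": 41.3,
    "old_scans_math": 70.3,
    "table_tests": 64.3,
}


def macro_average(per_file: dict[str, float]) -> float:
    """Unweighted mean over JSONL files: each file counts equally,
    regardless of how many test cases it contains."""
    return sum(per_file.values()) / len(per_file)


print(f"{macro_average(karanta_scores):.1f}")  # 74.1
```

Under this averaging, a weak category such as `old_scans` (41.3) lowers the overall score by the same amount no matter how many tests that file contains.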
## How to Use
@@ -235,3 +233,8 @@ messages = build_message(
output_text = run_inference(model, processor, messages)
print(output_text)
```
## Citation Information
Coming soon ...