Update README.md

Odunayo Ogundepo
2026-01-25 14:37:15 +00:00
committed by system
parent 4738fd5472
commit 1517a5910a

@@ -27,6 +27,26 @@ tags:
# KarantaOCR: Efficient Document Processing for African Languages
## Table of Contents
- [Model Description](#model-description)
- [Training Data](#training-data)
  - [Stage 1: General OCR Training](#stage-1-general-ocr-training)
  - [Stage 2: African Language Fine-Tuning](#stage-2-african-language-fine-tuning)
  - [Training Plots](#training-plots)
- [Capabilities](#capabilities)
- [Evaluation](#evaluation)
  - [Results -- KarantaOCR-Bench](#results----karantaocr-bench)
  - [Results -- olmOCR-Bench](#results----olmocr-bench)
- [How to Use](#how-to-use)
  - [Load the Model and Processor](#load-the-model-and-processor)
  - [Prepare a PDF Page for Inference](#prepare-a-pdf-page-for-inference)
  - [Run OCR Inference](#run-ocr-inference)
  - [End-to-End Example](#end-to-end-example)
- [Citation Information](#citation-information)
---
## Model Description
#### [Paper](....)
@@ -59,6 +79,21 @@ KarantaOCR was trained using a **two-stage curriculum fine-tuning strategy**.
This stage emphasizes accurate transcription of **diacritics, special characters, and region-specific typography**.
### Training Plots
<div align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/5Jf26jOGs12rrMwy3hwrI.png" alt="Train Loss" width="600"/>
</div>
<div align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/9z4t6so8DIaykrFQHs0Su.png" alt="Eval Loss" width="600"/>
</div>
<div align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/604b97e27032db3f5e8d6e8e/-TJltGBXNFABTkShCyvsL.png" alt="Learning Rate" width="600"/>
</div>
---
## Capabilities