---
language:
- he
- en
license: apache-2.0
tags:
- mistral
- nemo
- hebrew
- llm
- text-generation
- instruction-tuned
- chat
pipeline_tag: text-generation
base_model: mistralai/Mistral-Nemo-Base-2407
library_name: transformers
widget:
- text: "Hebrew_Nemo"
  output:
    url: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/resolve/main/Images/Hebrew_Nemo.png
---

# Hebrew_Nemo: State-of-the-Art Hebrew Language Model

---

<div align="center">
<b style="font-size: 50px;">Hebrew_Nemo</b>
</div>

<div align="center">
<b style="font-size: 80px;">12B</b>
</div>

---

<div align="center" style="font-size: 18px; margin-top: 20px;">
<b>Developed by:</b> <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a>
</div>

---

**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model** optimized for Hebrew understanding and generation. Built on the Mistral Nemo architecture, it combines the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.

As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s efforts to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license. The model demonstrates competitive performance with **Gemma3-27B**, one of the world's leading open-source multilingual models, even though Gemma3-27B is **more than twice its size**. This result highlights Hebrew_Nemo's efficiency and makes SOTA capabilities widely available to consumers and corporations alike.

Unfortunately, the instruction-tuned Gemma-3-27b-it does not benchmark well on this suite (it scores 0 on both SNLI and Israeli Trivia), but I still believe it is by far the best multilingual model:

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|-------|---------|----------|----------|------------------|----------------|------------|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |

---

# Benchmarks

---

**Hebrew_Nemo** demonstrates SOTA performance for its size, with particularly **outstanding results in Hebrew translation**. At only **12B parameters**, it achieves a **BLEU score of 30.83**, outperforming larger models such as DeepSeek-R1-Distill-Qwen-14B (22.99) and AI21 Jamba-1.5-Mini (22.00), a 52B model more than 4x its size.

The model maintains **high competence across reasoning and QA**, with **SNLI accuracy of 79.76** and a **HeQ score of 70.51**, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its **Israeli Trivia score (50.83)** demonstrates exceptional knowledge for its size: it comes within 7 points of AI21 Jamba-1.5-Mini (57.81), a model more than 4x its size, while vastly outperforming models of similar or slightly larger size.

| Model                                    |   Average |  SNLI Acc |  QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
| ---------------------------------------- | --------: | --------: | --------: | ---------------: | -------------: | ---------: |
| **Hebrew_Nemo**                          | **57.98** |     79.76 |     70.51 |        **30.83** |          50.83 |         12 |
| ai21labs/AI21-Jamba-1.5-Mini             |     54.68 |     69.52 |     69.38 |            22.00 |      **57.81** |         52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |     53.19 | **85.48** |     71.38 |            22.99 |          32.89 |         14 |
| SicariusSicariiStuff/Zion_Alpha          |     53.55 |     84.05 |     67.67 |            27.93 |          34.55 |          7 |
| Qwen/Qwen3-8B                            |     53.54 |     80.00 | **78.53** |            25.73 |          29.90 |          8 |
| Mistral-Nemo-Base-2407                   |     51.24 |     65.95 |     68.48 |            28.99 |          41.53 |         12 |

---
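
For readers who want to check the table, here is a small sketch that recomputes the Average column and the size ratio quoted above. It assumes the Average is the unweighted mean of the four task columns, which the card does not state explicitly; the names `ROWS` and `mean_score` are illustrative only.

```python
# Rows copied from the benchmark table above:
# name -> ((SNLI Acc, QA (HeQ), Translation BLEU, Israeli Trivia), reported Average, Params (B))
ROWS = {
    "Hebrew_Nemo": ((79.76, 70.51, 30.83, 50.83), 57.98, 12),
    "ai21labs/AI21-Jamba-1.5-Mini": ((69.52, 69.38, 22.00, 57.81), 54.68, 52),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": ((85.48, 71.38, 22.99, 32.89), 53.19, 14),
    "SicariusSicariiStuff/Zion_Alpha": ((84.05, 67.67, 27.93, 34.55), 53.55, 7),
    "Qwen/Qwen3-8B": ((80.00, 78.53, 25.73, 29.90), 53.54, 8),
    "Mistral-Nemo-Base-2407": ((65.95, 68.48, 28.99, 41.53), 51.24, 12),
}


def mean_score(scores):
    """Unweighted mean of the task scores (assumed definition of 'Average')."""
    return sum(scores) / len(scores)


for name, (scores, reported, params) in ROWS.items():
    print(f"{name:42s} mean={mean_score(scores):.3f} reported={reported:.2f} params={params}B")

# Size of Jamba-1.5-Mini relative to Hebrew_Nemo, using the Params (B) column
size_ratio = ROWS["ai21labs/AI21-Jamba-1.5-Mini"][2] / ROWS["Hebrew_Nemo"][2]
print(f"Jamba-1.5-Mini / Hebrew_Nemo size ratio: {size_ratio:.2f}x")
```

Under that assumption, each recomputed mean agrees with the reported Average to within rounding.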

**Hebrew_Nemo** also **improves on the original Mistral Nemo base model in every metric**, adding new knowledge while refining existing capabilities. The largest gains are in SNLI accuracy and Israeli Trivia:

| Metric               | Hebrew_Nemo | Mistral-Nemo-Base | Relative Improvement |
| :------------------- | ----------: | ----------------: | -------------------: |
| **Average**          |   **57.98** |             51.24 |           **+13.2%** |
| **SNLI Accuracy**    |   **79.76** |             65.95 |           **+20.9%** |
| **QA (HeQ)**         |   **70.51** |             68.48 |            **+3.0%** |
| **Translation BLEU** |   **30.83** |             28.99 |            **+6.3%** |
| **Israeli Trivia**   |   **50.83** |             41.53 |           **+22.4%** |

----
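
The improvement figures in the table above are relative changes over the base model's score, not differences in points. The following sketch reproduces them; `relative_improvement` and `SCORES` are illustrative names, not part of any released tooling.

```python
def relative_improvement(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


# (Hebrew_Nemo score, Mistral-Nemo-Base score), copied from the table above
SCORES = {
    "Average": (57.98, 51.24),
    "SNLI Accuracy": (79.76, 65.95),
    "QA (HeQ)": (70.51, 68.48),
    "Translation BLEU": (30.83, 28.99),
    "Israeli Trivia": (50.83, 41.53),
}

improvements = {
    metric: round(relative_improvement(new, base), 1)
    for metric, (new, base) in SCORES.items()
}

for metric, pct in improvements.items():
    print(f"{metric:18s} +{pct:.1f}%")
```

For example, the Israeli Trivia gain is (50.83 - 41.53) / 41.53, which is about 22.4%, although the absolute gain is 9.3 points.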

### Technical Overview

- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Base Architecture:** Mistral Nemo
- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
- **License:** Apache 2.0
- **Parameters:** 12B
- **Context Length:** 128K tokens
- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim (feed-forward):** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** `2**17` = 131,072 (~128k)
- **Rotary embeddings:** theta = 1M
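
To make the attention figures above more concrete, here is a rough back-of-the-envelope sketch of what grouped-query attention (GQA) implies for the KV cache. It uses only the values listed above plus two assumptions that are labeled in the code: that "128K" means 131,072 tokens, and that cache entries are stored in 16 bits. Real memory use depends on the inference engine and is not measured here.

```python
# Architecture values taken from the Technical Overview list above
LAYERS = 40
HEAD_DIM = 128
N_HEADS = 32
N_KV_HEADS = 8

CONTEXT_TOKENS = 128 * 1024  # assumption: "128K tokens" means 131,072
BYTES_PER_VALUE = 2          # assumption: FP16/BF16 cache entries

q_width = N_HEADS * HEAD_DIM            # total query width across heads
kv_width = N_KV_HEADS * HEAD_DIM        # total key (or value) width across KV heads
gqa_group_size = N_HEADS // N_KV_HEADS  # query heads sharing one KV head

# Each token stores one key and one value vector per layer
kv_bytes_per_token = 2 * LAYERS * kv_width * BYTES_PER_VALUE
kv_gib_full_context = kv_bytes_per_token * CONTEXT_TOKENS / 2**30

# Hypothetical comparison: the same model with one KV head per query head
mha_gib_full_context = 2 * LAYERS * q_width * BYTES_PER_VALUE * CONTEXT_TOKENS / 2**30

print(f"KV cache per token:        {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context:  {kv_gib_full_context:.1f} GiB (GQA)")
print(f"Without GQA (hypothetical): {mha_gib_full_context:.1f} GiB")
```

With 8 KV heads instead of 32, the cache is a quarter of what full multi-head attention would need under the same assumptions. Note also that the combined query width (32 x 128 = 4,096) is smaller than the model dimension of 5,120 given the listed values.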

### Primary Use Cases

- **Hebrew Text Generation:** High-quality content creation in modern Hebrew
- **Translation:** Bidirectional translation between Hebrew and other languages
- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts
- **Dialogue Systems:** Conversational AI applications for Hebrew speakers
- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content
- **Named Entity Recognition:** Extraction of entities from Hebrew text
- **Summarization:** Concise summaries of Hebrew documents and articles

### Out-of-Scope Uses

- Real-time critical decision-making systems (medical, legal, financial) without human oversight
- Generation of content intended to deceive or manipulate
- Applications requiring 100% factual accuracy without verification

## Training Data and Training Methodology

Hebrew_Nemo was trained on a diverse corpus including:

| Source Type | Description | Language Coverage |
|--------------|--------------|------------------|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |

Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.

Additional training data:

- Modern Hebrew web text and news articles
- Hebrew literature and academic publications
- Biblical and Rabbinic Hebrew texts for cultural depth
- Hebrew social media and conversational data
- Technical documentation in Hebrew
- Parallel corpora for translation capabilities

---

**The training process involved:**

1. Continued pre-training on Hebrew-rich datasets
2. Instruction fine-tuning on Hebrew task-specific data
3. Alignment through RLHF/DPO for Hebrew linguistic preferences

---

## 🚀 Key Features

- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases.
- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances.
- **Optimized Inference:** Benefits from Mistral Nemo's memory-efficient grouped-query attention (8 KV heads) and its 128K-token context window.

---

# Out of scope usage
* Generating disinformation or biased political content
* Automated decision-making without human oversight

---

## ⚙️ Limitations

* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
* Limited performance on **rare biblical or archaic Hebrew**
* Occasionally mixes Hebrew and English when the context is ambiguous
* Does not include alignment for safety moderation out of the box

---

# Model instruction template: ChatML

```
<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>User
{prompt}<|im_end|>
<|im_start|>AI answer
```

---
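
If you are building prompts by hand rather than through a chat template, the layout above can be assembled with plain string formatting. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the exact newline placement between turns is an assumption based on how the template is displayed. Whether `tokenizer.apply_chat_template` produces an identical string is not confirmed here.

```python
DEFAULT_SYSTEM_PROMPT = "You answer the questions in Hebrew."


def build_prompt(user_prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Render one system turn and one user turn, then open the answer turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>User\n{user_prompt}<|im_end|>\n"
        # Assumption: the answer header ends with a newline, after which the model continues
        "<|im_start|>AI answer\n"
    )


# Example question: "What is artificial intelligence?"
print(build_prompt("מהי בינה מלאכותית?"))
```

Note that the role labels `User` and `AI answer` are taken verbatim from the template and differ from the `user` / `assistant` labels commonly seen in ChatML prompts.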

## 🗣️ Example Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto"    # spread weights over available devices (needs `accelerate`)
)

# Prompt: "What is artificial intelligence?"
prompt = "מהי בינה מלאכותית?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# `temperature` only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

### Chat Format

```python
# Reuses `tokenizer` and `model` from the Basic Inference example.
# Message: "Tell me about the history of Jerusalem"
messages = [
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]

# Render the conversation to a prompt string using the tokenizer's chat template
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Quantization (for lower VRAM)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "SicariusSicariiStuff/Hebrew_Nemo"

# Load weights in 4-bit via bitsandbytes; computation runs in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

---
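
As a rough guide to why quantization lowers VRAM needs, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes the nominal 12B parameter count stated in this card and ignores activations, the KV cache, and per-block quantization overhead, so actual usage will be higher. The names `PARAMS`, `BITS_PER_WEIGHT`, and `weight_gib` are illustrative.

```python
PARAMS = 12e9  # nominal "12B" from this card; the exact parameter count may differ

# Bits stored per weight for each precision (idealized, no quantization metadata)
BITS_PER_WEIGHT = {"FP16/BF16": 16, "8-bit": 8, "4-bit": 4}


def weight_gib(params: float, bits: int) -> float:
    """Memory for the weights only, in GiB."""
    return params * bits / 8 / 2**30


estimates = {name: weight_gib(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}

for name, gib in estimates.items():
    print(f"{name:10s} ~{gib:.1f} GiB for weights")
```

Halving the bits per weight halves the weight footprint, which is why a 4-bit load needs roughly a quarter of the 16-bit weight memory.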

## Available quantizations:

- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM)

---

## Citation

```bibtex
@misc{hebrew_nemo_2025,
  author = {SicariusSicariiStuff},
  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}
```

## 🧰 Acknowledgements

* [Mistral](https://mistral.ai/) for the base architecture
* [NVIDIA NeMo](https://developer.nvidia.com/nemo) for framework inspiration
* Employee#11 for her unwavering support

## Contact

For questions, issues, or collaboration opportunities:
- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
- **Issues:** Report technical issues on the model repository

### Model Card Authors
- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)