ModelHub XC 60240832b1 Initialize project; model provided by the ModelHub XC community
Model: SicariusSicariiStuff/Hebrew_Nemo
Source: Original Platform
2026-05-30 07:19:12 +08:00

---
language:
- he
- en
license: apache-2.0
tags:
- mistral
- nemo
- hebrew
- llm
- text-generation
- instruction-tuned
- chat
pipeline_tag: text-generation
base_model: mistralai/Mistral-Nemo-Base-2407
library_name: transformers
widget:
- text: "Hebrew_Nemo"
  output:
    url: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/resolve/main/Images/Hebrew_Nemo.png
---
# Hebrew_Nemo: State-of-the-Art Hebrew Language Model
---
<div align="center">
<b style="font-size: 50px;">Hebrew_Nemo</b>
</div>
<div align="center">
<b style="font-size: 80px;">12B</b>
</div>
---
<div align="center" style="font-size: 18px; margin-top: 20px;">
<b>Developed by:</b> <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a>
</div>
---
**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model** optimized for Hebrew understanding and generation. Built on the Mistral Nemo architecture, it combines the multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning.
As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s efforts to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license. The model demonstrates competitive performance with **Gemma3-27B**, one of the world's leading open-source multilingual models, despite Gemma3-27B being **more than twice its size**. This result highlights Hebrew_Nemo's efficiency and makes SOTA capabilities widely available to consumers and corporations alike.
Unfortunately, the instruction-tuned Gemma-3-27b-it scores poorly on this benchmark (see the table below), but I still believe Gemma-3-27b-it is by far the best multilingual model:
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|-------|---------|----------|----------|------------------|----------------|------------|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |
---
# Benchmarks
---
**Hebrew_Nemo** demonstrates SOTA performance for its size, with particularly **outstanding results in Hebrew translation**. At only **12B parameters**, it achieves a **BLEU score of 30.83**, outperforming larger models such as DeepSeek-R1-Distill-Qwen-14B and AI21 Jamba-1.5-Mini (52B), a model more than 4x its size.
The model maintains **high competence across reasoning and QA**, with **SNLI accuracy of 79.76** and **HeQ score of 70.51**, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its **Israeli Trivia score (50.83)** is strong for its size: it comes close to AI21 Jamba-1.5-Mini (57.81), a model more than 4x its size, while clearly outperforming the other models in the table, including similar-sized and slightly larger ones.
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
| ---------------------------------------- | --------: | --------: | --------: | ---------------: | -------------: | ---------: |
| **Hebrew_Nemo** | **57.98** | 79.76 | 70.51 | **30.83** | 50.83 | 12 |
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | **57.81** | 52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | **85.48** | 71.38 | 22.99 | 32.89 | 14 |
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 |
| Qwen/Qwen3-8B | 53.54 | 80.00 | **78.53** | 25.73 | 29.90 | 8 |
| Mistral-Nemo-Base-2407                   |     51.24 |     65.95 |     68.48 |            28.99 |          41.53 |         12 |
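
In this table, the Average column appears to be the plain arithmetic mean of the four task scores. The short check below recomputes it from the rows above; the equal-weight mean is an assumption inferred from the numbers, not something the benchmark documentation here states.

```python
# Scores copied from the table above:
# (SNLI Acc, QA (HeQ), Translation BLEU, Israeli Trivia), reported Average
rows = {
    "Hebrew_Nemo": ((79.76, 70.51, 30.83, 50.83), 57.98),
    "ai21labs/AI21-Jamba-1.5-Mini": ((69.52, 69.38, 22.00, 57.81), 54.68),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": ((85.48, 71.38, 22.99, 32.89), 53.19),
    "SicariusSicariiStuff/Zion_Alpha": ((84.05, 67.67, 27.93, 34.55), 53.55),
    "Qwen/Qwen3-8B": ((80.00, 78.53, 25.73, 29.90), 53.54),
    "Mistral-Nemo-Base-2407": ((65.95, 68.48, 28.99, 41.53), 51.24),
}


def mean_score(scores):
    """Equal-weight mean of the task scores (assumed averaging rule)."""
    return sum(scores) / len(scores)


for name, (scores, reported) in rows.items():
    computed = mean_score(scores)
    # Reported values are rounded to two decimals, so compare with a tolerance.
    print(f"{name}: computed {computed:.4f}, reported {reported:.2f}")
```
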
---
**Hebrew_Nemo** also **improves on the original Mistral Nemo base model on every benchmark listed**, adding new knowledge while refining existing capabilities:
| Metric               | Hebrew_Nemo | Mistral-Nemo-Base | Improvement (%) |
| :------------------- | ----------: | ----------------: | ----------------: |
| **Average** | **57.98** | 51.24 | **+13.2%** |
| **SNLI Accuracy** | **79.76** | 65.95 | **+20.9%** |
| **QA (HeQ)** | **70.51** | 68.48 | **+3.0%** |
| **Translation BLEU** | **30.83** | 28.99 | **+6.3%** |
| **Israeli Trivia** | **50.83** | 41.53 | **+22.4%** |
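
The improvement column is consistent with a relative gain over the base model, `(new - base) / base * 100`. A minimal sketch that reproduces the percentages from the two score columns (the helper name `relative_improvement` is illustrative only):

```python
def relative_improvement(new, base):
    """Relative gain of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# metric: (Hebrew_Nemo, Mistral-Nemo-Base, reported improvement in %)
comparison = {
    "Average": (57.98, 51.24, 13.2),
    "SNLI Accuracy": (79.76, 65.95, 20.9),
    "QA (HeQ)": (70.51, 68.48, 3.0),
    "Translation BLEU": (30.83, 28.99, 6.3),
    "Israeli Trivia": (50.83, 41.53, 22.4),
}

for metric, (new, base, reported) in comparison.items():
    computed = relative_improvement(new, base)
    print(f"{metric}: computed +{computed:.2f}%, reported +{reported:.1f}%")
```

Note that these are relative gains; the absolute gains in points are smaller (for example, 79.76 - 65.95 = 13.81 points on SNLI).
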
----
### Technical Overview
- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Base Architecture:** Mistral Nemo
- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
- **License:** Apache 2.0
- **Parameters:** 12B
- **Context Length:** 128K tokens
- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2**17 ~= 128k
- **Rotary embeddings (theta = 1M)**
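
As a rough worked example of what these numbers imply, the sketch below derives the GQA group size, the exact vocabulary size, and an estimate of the KV-cache footprint. It assumes 2 bytes per cached value (FP16/BF16) and reads "128K tokens" as 128 × 1024; real memory use depends on the inference runtime and is not measured here.

```python
# Values taken from the list above
LAYERS = 40
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
VOCAB_SIZE = 2 ** 17            # 131072
CONTEXT_TOKENS = 128 * 1024     # assumption: "128K" means 131072 tokens


def kv_cache_bytes(num_tokens, n_kv_heads=N_KV_HEADS, bytes_per_value=2):
    """Estimated KV-cache size: one K and one V tensor per layer."""
    return 2 * LAYERS * n_kv_heads * HEAD_DIM * bytes_per_value * num_tokens


GIB = 2 ** 30
queries_per_kv_head = N_HEADS // N_KV_HEADS           # GQA group size
per_token = kv_cache_bytes(1)                         # bytes per cached token
full_context = kv_cache_bytes(CONTEXT_TOKENS)         # bytes at full context
without_gqa = kv_cache_bytes(CONTEXT_TOKENS, n_kv_heads=N_HEADS)

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"vocabulary size: {VOCAB_SIZE}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {full_context / GIB:.0f} GiB")
print(f"same, if every head had its own K/V: {without_gqa / GIB:.0f} GiB")
```

Under these assumptions, sharing each K/V head across four query heads cuts the cache to a quarter of what full multi-head attention would need.
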
### Primary Use Cases
- **Hebrew Text Generation:** High-quality content creation in modern Hebrew
- **Translation:** Bidirectional translation between Hebrew and other languages
- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts
- **Dialogue Systems:** Conversational AI applications for Hebrew speakers
- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content
- **Named Entity Recognition:** Extraction of entities from Hebrew text
- **Summarization:** Concise summaries of Hebrew documents and articles
### Out-of-Scope Uses
- Real-time critical decision-making systems (medical, legal, financial) without human oversight
- Generation of content intended to deceive or manipulate
- Applications requiring 100% factual accuracy without verification
## Training Data and Training Methodology
Hebrew_Nemo was trained on a diverse corpus including:
| Source Type | Description | Language Coverage |
|--------------|--------------|------------------|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |
Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.
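
To make a figure like "70% Hebrew / 30% English" concrete, here is a minimal sketch of one way to estimate a text's language mix by counting letters per script. This is a hypothetical helper for illustration, not the pipeline used to build the training set; it counts only the Hebrew letters U+05D0 to U+05EA and ASCII letters, and ignores digits, punctuation, and niqqud.

```python
def script_mix(text):
    """Return the share of Hebrew vs. ASCII-Latin letters in `text`."""
    hebrew = sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = hebrew + latin
    if total == 0:
        # No letters from either script: report an empty mix.
        return {"hebrew": 0.0, "english": 0.0}
    return {"hebrew": hebrew / total, "english": latin / total}


sample = "אני צריך לעשות deploy לפני ה-meeting"
mix = script_mix(sample)
print(f"Hebrew: {mix['hebrew']:.0%}, English: {mix['english']:.0%}")
```
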
Additional training data:
- Modern Hebrew web text and news articles
- Hebrew literature and academic publications
- Biblical and Rabbinic Hebrew texts for cultural depth
- Hebrew social media and conversational data
- Technical documentation in Hebrew
- Parallel corpora for translation capabilities
---
**The training process involved:**
1. Continued pre-training on Hebrew-rich datasets
2. Instruction fine-tuning on Hebrew task-specific data
3. Alignment through RLHF/DPO for Hebrew linguistic preferences
---
## 🚀 Key Features
- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases.
- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances.
- **Optimized Inference:** Enhanced performance with Mistral's memory-efficient attention and dynamic context window.
---
# Out of scope usage
* Generating disinformation or biased political content
* Automated decision-making without human oversight
---
## ⚙️ Limitations
* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
* Limited performance on **rare biblical or archaic Hebrew**
* Occasionally mixes Hebrew and English when the context is ambiguous
* Does not include alignment for safety moderation out of the box
---
# Model instruction template: ChatML
```
<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>User
{prompt}<|im_end|>
<|im_start|>AI answer
```
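
For runtimes that take a raw prompt string, the template above can be filled in by hand. The helper below is a minimal sketch: `build_prompt` is a hypothetical name, the role labels are copied verbatim from the template, and the trailing newline after the final role line is an assumption.

```python
DEFAULT_SYSTEM = "You answer the questions in Hebrew."


def build_prompt(user_prompt, system_prompt=DEFAULT_SYSTEM):
    """Fill the ChatML-style template shown above with a single user turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>User\n{user_prompt}<|im_end|>\n"
        "<|im_start|>AI answer\n"  # left open so the model writes the answer
    )


print(build_prompt("מהי בינה מלאכותית?"))
```

When using `transformers`, the tokenizer's own chat template may format turns differently; treat whatever the repository ships as authoritative.
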
---
## 🗣️ Example Usage
### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # spread layers over available devices (needs accelerate)
)

prompt = "מהי בינה מלאכותית?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
### Chat Format
```python
# Reuses `tokenizer` and `model` from the Basic Inference example
messages = [
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn header
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Quantization (for lower VRAM)
```python
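import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit loading requires the bitsandbytes package
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,  # defined in the Basic Inference example
    quantization_config=quantization_config,
    device_map="auto",
)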
```
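
As a back-of-the-envelope guide to why quantization helps, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes exactly 12e9 parameters and ignores quantization metadata, activations, and the KV cache, so actual VRAM use will be higher.

```python
PARAMS = 12e9  # "Parameters: 12B" from the Technical Overview (rounded)


def weight_gib(bits_per_param, params=PARAMS):
    """Approximate size of the weights in GiB at a given precision."""
    return params * bits_per_param / 8 / 2 ** 30


for label, bits in [("FP16/BF16", 16), ("FP8/INT8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_gib(bits):.1f} GiB")
```
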
---
## Available quantizations:
- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM)
---
## Citation
```bibtex
@misc{hebrew_nemo_2025,
author = {SicariusSicariiStuff},
title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}
```
## 🧰 Acknowledgements
* [Mistral](https://mistral.ai/) for the base architecture
* [NVIDIA NeMo](https://developer.nvidia.com/nemo) framework inspiration
* Employee#11 for her unwavering support
## Contact
For questions, issues, or collaboration opportunities:
- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
- **Issues:** Report technical issues on the model repository
### Model Card Authors
- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)