---
library_name: transformers
license: apache-2.0
language:
- et
- en
base_model:
- swiss-ai/Apertus-8B-2509
datasets:
- HuggingFaceTB/smollm-corpus
- HuggingFaceTB/finemath
- instruction-pretrain/general-instruction-augmented-corpora
pipeline_tag: text-generation
---
![image/png](assets/logo-sinine.png)
# Apertus EstLLM 8B 1125 Base
> Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.

The original [swiss-ai/Apertus-8B-2509](https://huggingface.co/swiss-ai/Apertus-8B-2509) underwent continued pre-training for a single epoch on approximately 35B tokens, drawn from:
- Estonian National Corpus (8.6B tokens)
- Python-Edu (3.3B tokens)
- FineMath4-Plus (9.5B tokens)
- General Instruction-Augmented Corpora (7.4B tokens)
- Cosmopedia v2 (6.9B tokens)
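As a base model, it can be used for plain-text completion with the standard `transformers` API. A minimal sketch, assuming a `transformers` version that supports the Apertus architecture; the dtype, device placement, and Estonian prompt below are illustrative choices, not official recommendations:

```python
# Minimal text-completion sketch; dtype and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits your hardware
    device_map="auto",
)

# A base model completes a prefix; it does not follow chat instructions.
prompt = "Eesti keel kuulub"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```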
## Model Details
### Model Description
- **Developed by:** [TartuNLP](https://huggingface.co/tartuNLP) and [TalTechNLP](https://huggingface.co/TalTechNLP) research groups
- **Funded by:** Estonian Ministry of Education and Research, “Estonian Language Technology Program 2018-2027”
- **Model type:** Causal Language Model
- **Language(s) (NLP):** Estonian, English
- **License:** Apache 2.0
- **Finetuned from model:** [swiss-ai/Apertus-8B-2509](https://huggingface.co/swiss-ai/Apertus-8B-2509)
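Because the checkpoint is intended for downstream fine-tuning rather than direct use, here is a hedged sketch of one common route: parameter-efficient LoRA fine-tuning with `peft`. The dataset path, hyperparameters, and `target_modules` names are illustrative assumptions, not recommendations from the model authors:

```python
# Hedged LoRA fine-tuning sketch with peft + transformers; dataset,
# hyperparameters, and target-module names are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:          # the collator needs a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

lora = LoraConfig(  # assumption: attention projections use these names
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora)

# "your_corpus.txt" is a placeholder for any plain-text training corpus.
data = load_dataset("text", data_files="your_corpus.txt")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    max_length=4096), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="estllm-ft", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```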
## Evaluation
**Bold** marks the best result in each column; _**bold italic**_ marks the second best.
### Logits-based
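"Logits-based" here refers to multiple-choice scoring: instead of generating text, the model scores each answer option by its log-likelihood, and the highest-scoring option is taken as the prediction. A minimal sketch of that technique follows; the exact harness, prompts, and normalization behind the tables below are not specified here:

```python
# Hedged sketch of logits-based multiple-choice scoring: pick the answer
# option whose tokens receive the highest log-likelihood given the prompt.
import torch

def option_logprob(model, tokenizer, prompt: str, option: str) -> float:
    """Sum of log-probabilities of `option`'s tokens given `prompt`."""
    ids = tokenizer(prompt + option, return_tensors="pt").input_ids.to(model.device)
    # Assumes the prompt's tokenization is a prefix of the joint one
    # (real harnesses handle BPE boundary effects more carefully).
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict token t+1, so shift targets by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_lp = logprobs.gather(1, targets[:, None]).squeeze(1)
    return token_lp[prompt_len - 1:].sum().item()  # option tokens only

# Prediction = argmax over candidate answers:
# best = max(options, key=lambda o: option_logprob(model, tokenizer, q, o))
```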
#### Estonian
| Model (# parameters ↓) | belebele-et | exam-et | grammar-et | inflection-et | trivia-et | winogrande-et | xcopa-et | GlobalPIQA-et|
|-------|-----------------------------------|--------------------|-----------------|---------------|---------------|---------------|-----------|----|
| utter-project/EuroLLM-9B| 0.699 | _**0.618**_ | 0.663 | 0.44 | 0.371 | 0.692 | 0.712 | 0.69 |
| mistralai/Ministral-3-8B-Base-2512 | 0.263 | 0.528 | 0.641 | 0.585 | 0.316 | 0.623 | 0.56 | 0.6 |
| swiss-ai/Apertus-8B-2509 | 0.768 | 0.607 | 0.789 | 0.478 | 0.329 | 0.711 | 0.678 | 0.73 |
| meta-llama/Llama-3.1-8B | 0.67 | 0.447 | 0.658 | _**0.587**_ | 0.3 | 0.596 | 0.532 | 0.53 |
| **tartuNLP/Apertus-EstLLM-8B-1125** | **0.788** | **0.636** | _**0.834**_ | 0.523 | _**0.389**_ | **0.752** | _**0.73**_ | **0.79** |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | _**0.772**_ | 0.57 | **0.875** | **0.619** | **0.449** | _**0.74**_ | **0.752** | _**0.78**_ |
| tartuNLP/Llammas-base | 0.387 | 0.462 | 0.538 | 0.269 | 0.336 | 0.697 | 0.686 | 0.76 |
| BSC-LT/salamandra-7b | 0.448 | 0.505 | 0.699 | 0.268 | 0.296 | 0.673 | 0.658 | 0.71 |
| Qwen/Qwen2.5-7B | 0.664 | 0.455 | 0.654 | 0.452 | 0.29 | 0.53 | 0.494 | 0.54 |
#### English
| Model (# parameters ↓) | belebele-en | MMLU-Redux | winogrande |
|-------|-----------------------------------|---------------|----|
| utter-project/EuroLLM-9B | 0.773 | 0.557 | 0.732 |
| mistralai/Ministral-3-8B-Base-2512 | 0.897 | _**0.729**_ | _**0.771**_ |
| swiss-ai/Apertus-8B-2509 | 0.827 | 0.598 | 0.761 |
| meta-llama/Llama-3.1-8B | _**0.873**_ | 0.649 | **0.785** |
| **tartuNLP/Apertus-EstLLM-8B-1125** | 0.843 | 0.625 | 0.763 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.87 | 0.627 | 0.766 |
| tartuNLP/Llammas-base | 0.45 | 0.35 | 0.72 |
| BSC-LT/salamandra-7b | 0.531 | 0.449 | 0.706 |
| Qwen/Qwen2.5-7B | **0.912** | **0.75** | 0.751 |
### Translation
| Model (# parameters ↓) | flores en→et (BLEU) | flores et→en (BLEU) |
|-------|-----------------------------------|---------------|
| utter-project/EuroLLM-9B | **29.0** | **41.2** |
| mistralai/Ministral-3-8B-Base-2512 | 12.6 | 29.6 |
| swiss-ai/Apertus-8B-2509 | 25.0 | _**38.5**_ |
| meta-llama/Llama-3.1-8B | 13.5 | 33.7 |
| **tartuNLP/Apertus-EstLLM-8B-1125** | 27.4 | 37.4 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | _**28.1**_ | 36.8 |
| tartuNLP/Llammas-base | 22.0 | 32.7 |
| BSC-LT/salamandra-7b | 14.7 | 18.2 |
| Qwen/Qwen2.5-7B | 5.1 | 27.5 |
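For reference, corpus-level BLEU such as the FLORES scores above is conventionally computed with `sacrebleu`. This is a toy sketch of the metric only; the prompting and decoding setup used to produce these numbers is not specified here:

```python
# Toy corpus-BLEU computation with sacrebleu; strings are placeholders.
import sacrebleu

hypotheses = ["Tere, maailm!"]     # model translations, e.g. en→et
references = [["Tere, maailm!"]]   # one reference stream, aligned by index
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```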
## Limitations
In addition to the limitations of the original Apertus 8B model, this model has the following limitation:
- Somewhat limited context size, because continued pre-training was performed with a sequence length of 4096 tokens (see the sketch below).
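A minimal precaution that follows from this, assuming the standard `transformers` tokenizer API (the text variable is a placeholder):

```python
# Hedged sketch: cap inputs at the 4096-token continued pre-training
# window; behaviour on longer contexts is untested for this checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tartuNLP/Apertus-EstLLM-8B-1125")
long_text = "..."  # placeholder for a long document
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=4096)
```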
## Citation
```bibtex
@misc{dorkin2026estllmenhancingestoniancapabilities,
  title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
  author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
  year={2026},
  eprint={2603.02041},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.02041},
}
```