---
library_name: transformers
license: apache-2.0
language:
- et
- en
base_model:
- swiss-ai/Apertus-8B-2509
datasets:
- HuggingFaceTB/smollm-corpus
- HuggingFaceTB/finemath
- instruction-pretrain/general-instruction-augmented-corpora
pipeline_tag: text-generation
---

![image/png](assets/logo-sinine.png)

# Apertus EstLLM 8B 1125 Base

> Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.

The original [swiss-ai/Apertus-8B-2509](https://huggingface.co/swiss-ai/Apertus-8B-2509) underwent continued pre-training on approximately 35B tokens. Continued pre-training was performed for a single epoch on:

- Estonian National Corpus (8.6B tokens)
- Python-Edu (3.3B tokens)
- FineMath4-Plus (9.5B tokens)
- General Instruction-Augmented Corpora (7.4B tokens)
- Cosmopedia v2 (6.9B tokens)

## Model Details

### Model Description

- **Developed by:** [TartuNLP](https://huggingface.co/tartuNLP) and [TalTechNLP](https://huggingface.co/TalTechNLP) research groups
- **Funded by:** Estonian Ministry of Education and Research, “Estonian Language Technology Program 2018-2027”
- **Model type:** Causal Language Model
- **Language(s) (NLP):** Estonian, English
- **License:** Apache 2.0
- **Finetuned from model:** [swiss-ai/Apertus-8B-2509](https://huggingface.co/swiss-ai/Apertus-8B-2509)

## Evaluation

### Logits-based

#### Estonian

| Model (# parameters ↓) | belebele-et | exam-et | grammar-et | inflection-et | trivia-et | winogrande-et | xcopa-et | GlobalPIQA-et |
|-------|-------------|---------|------------|---------------|-----------|---------------|----------|---------------|
| utter-project/EuroLLM-9B | 0.699 | _**0.618**_ | 0.663 | 0.44 | 0.371 | 0.692 | 0.712 | 0.69 |
| mistralai/Ministral-3-8B-Base-2512 | 0.263 | 0.528 | 0.641 | 0.585 | 0.316 | 0.623 | 0.56 | 0.6 |
| swiss-ai/Apertus-8B-2509 | 0.768 | 0.607 | 0.789 | 0.478 | 0.329 | 0.711 | 0.678 | 0.73 |
| meta-llama/Llama-3.1-8B | 0.67 | 0.447 | 0.658 | _**0.587**_ | 0.3 | 0.596 | 0.532 | 0.53 |
| **tartuNLP/Apertus-EstLLM-8B-1125** | **0.788** | **0.636** | _**0.834**_ | 0.523 | _**0.389**_ | **0.752** | _**0.73**_ | **0.79** |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | _**0.772**_ | 0.57 | **0.875** | **0.619** | **0.449** | _**0.74**_ | **0.752** | _**0.78**_ |
| Llammas-base | 0.387 | 0.462 | 0.538 | 0.269 | 0.336 | 0.697 | 0.686 | 0.76 |
| BSC-LT/salamandra-7b | 0.448 | 0.505 | 0.699 | 0.268 | 0.296 | 0.673 | 0.658 | 0.71 |
| Qwen/Qwen2.5-7B | 0.664 | 0.455 | 0.654 | 0.452 | 0.29 | 0.53 | 0.494 | 0.54 |

#### English

| Model (# parameters ↓) | belebele-en | MMLU-Redux | winogrande |
|-------|-------------|------------|------------|
| utter-project/EuroLLM-9B | 0.773 | 0.557 | 0.732 |
| mistralai/Ministral-3-8B-Base-2512 | 0.897 | _**0.729**_ | _**0.771**_ |
| swiss-ai/Apertus-8B-2509 | 0.827 | 0.598 | 0.761 |
| meta-llama/Llama-3.1-8B | _**0.873**_ | 0.649 | **0.785** |
| **tartuNLP/Apertus-EstLLM-8B-1125** | 0.843 | 0.625 | 0.763 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.87 | 0.627 | 0.766 |
| tartuNLP/Llammas-base | 0.45 | 0.35 | 0.72 |
| BSC-LT/salamandra-7b | 0.531 | 0.449 | 0.706 |
| Qwen/Qwen2.5-7B | **0.912** | **0.75** | 0.751 |

### Translation

| Model (# parameters ↓) | flores en→et (BLEU) | flores et→en (BLEU) |
|-------|---------------------|---------------------|
| utter-project/EuroLLM-9B | **29.0** | **41.2** |
| mistralai/Ministral-3-8B-Base-2512 | 12.6 | 29.6 |
| swiss-ai/Apertus-8B-2509 | 25.0 | _**38.5**_ |
| meta-llama/Llama-3.1-8B | 13.5 | 33.7 |
| **tartuNLP/Apertus-EstLLM-8B-1125** | 27.4 | 37.4 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | _**28.1**_ | 36.8 |
| tartuNLP/Llammas-base | 22.0 | 32.7 |
| BSC-LT/salamandra-7b | 14.7 | 18.2 |
| Qwen/Qwen2.5-7B | 5.1 | 27.5 |
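## Usage

The card lists `transformers` as the library and `text-generation` as the pipeline tag, so the model should load through the standard causal-LM classes. The snippet below is a minimal sketch rather than an official quick-start: it assumes the repository id `tartuNLP/Apertus-EstLLM-8B-1125` (the name used in the tables above), a recent `transformers` release, and enough GPU memory for an 8B model in bfloat16. Since this is a base completion model, the prompt is plain text to be continued, not a chat message.

```python
# Minimal text-completion sketch (not an official usage guide).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B parameters; bf16 keeps memory usage manageable
    device_map="auto",           # requires `accelerate`; drop for CPU-only loading
)

# Plain-text completion: the base model continues the prompt, it does not follow instructions.
prompt = "Eesti keel on"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For chat-style behavior, fine-tune the model on an instruction dataset first, as noted at the top of this card.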
## Limitations

In addition to the limitations of the original Apertus 8B model, this model has the following:

- Somewhat limited context size, since continued pre-training was performed with a sequence length of 4096 tokens.

## Citation

```
@misc{dorkin2026estllmenhancingestoniancapabilities,
      title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
      author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
      year={2026},
      eprint={2603.02041},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.02041},
}
```