Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.
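Because this is a base completion model, it is typically used for plain text continuation via the Hugging Face `transformers` API rather than through a chat template. The snippet below is a minimal sketch of that usage; the repository id is a placeholder and should be replaced with the actual model id for this checkpoint.

```python
# Minimal sketch: loading the base model for text completion (not chat).
# The model id below is a placeholder, not the real repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-estonian-apertus-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain continuation prompt in Estonian; the model completes the text.
prompt = "Tartu Ülikool on"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```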
The original swiss-ai/Apertus-8B-2509 model underwent continued pre-training on approximately 35B tokens.
Continued pre-training was performed for a single epoch on:
- Estonian National Corpus (8.6B tokens)
- Python-Edu (3.3B tokens)
- FineMath4-Plus (9.5B tokens)
- General Instruction-Augmented Corpora (7.4B tokens)
In addition to the limitations of the original Apertus 8B model, this model has the following limitation:
- Somewhat limited context size, since continued pre-training was performed with a sequence length of 4096 tokens (see the sketch below).
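As a rough guideline for working around this limitation, the sketch below truncates inputs to the 4096-token sequence length used during continued pre-training, so the model does not receive contexts longer than it saw in training. The model id is again a placeholder, not the published repository name.

```python
# Sketch: keep prompts within the 4096-token training sequence length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-estonian-apertus-checkpoint")  # placeholder

long_text = "Eesti keel " * 10_000  # stand-in for a long document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
print(inputs["input_ids"].shape)  # sequence dimension is at most 4096
```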
Citation
@misc{dorkin2026estllmenhancingestoniancapabilities,
title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
year={2026},
eprint={2603.02041},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.02041},
}