---
library_name: transformers
language:
- et
- en
base_model:
- tartuNLP/Apertus-EstLLM-8B-Instruct-1125
- swiss-ai/Apertus-8B-Instruct-2509
tags:
- merge
license: apache-2.0
---

# Apertus EstLLM 8B 0326 Instruct

`Apertus-EstLLM-8B-Instruct-0326` is obtained by applying the chat-vector merge approach
to [tartuNLP/Apertus-EstLLM-8B-Instruct-1125](https://huggingface.co/tartuNLP/Apertus-EstLLM-8B-Instruct-1125).

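
As a rough illustration of the general chat-vector idea (not necessarily the exact recipe used for this model), the difference between an instruction-tuned checkpoint and its base checkpoint is added to another checkpoint that shares the same architecture. The sketch below uses toy Python lists instead of real weights. The names `chat_vector_merge`, `target`, `base`, `instruct`, and the `scale` factor are hypothetical; this card does not state which checkpoints play which role or whether any scaling is applied.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def chat_vector_merge(
    target: Weights, base: Weights, instruct: Weights, scale: float = 1.0
) -> Weights:
    """Return target + scale * (instruct - base), parameter by parameter."""
    merged: Weights = {}
    for name, target_values in target.items():
        # The "chat vector" is what instruction tuning changed relative to the base.
        chat_vector = [i - b for i, b in zip(instruct[name], base[name])]
        merged[name] = [t + scale * d for t, d in zip(target_values, chat_vector)]
    return merged


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
base = {"layer.weight": [0.0, 1.0, 2.0]}
instruct = {"layer.weight": [0.5, 1.0, 1.5]}
target = {"layer.weight": [1.0, 1.0, 1.0]}

merged = chat_vector_merge(target, base, instruct)
print(merged)  # {'layer.weight': [1.5, 1.0, 0.5]}
```

With `scale=0.0` the function returns the target weights unchanged, which makes it easy to check that the merge only adds the instruction-tuning delta.
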
## Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "tartuNLP/Apertus-EstLLM-8B-Instruct-0326"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto"
)

# to use on apple silicon, load the following way
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     dtype=torch.float16,
#     device_map="mps",
# )

tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "Kas sa räägid eesti keelt?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.4,
    # specify eos token to stop at the end of the assistant response
    eos_token_id=tokenizer.eos_token_id,
)

# generated_ids include the input tokens as well, so we only decode new tokens
response = tokenizer.decode(
    generated_ids[0][model_inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

print(response)
```

## Evaluation

## Logits-based

Scores for logits-based evaluation benchmarks are available on the [EuroEval](https://euroeval.com/leaderboards/Monolingual/estonian/) leaderboard.

## Generative

Every benchmark in this category is treated as a *generative* problem: evaluation is performed on model responses generated at temperature 0, not on logits.
The top scores are highlighted in **bold**. Second-best scores are highlighted in **_italic bold_**. Rows are sorted in descending order by the number of model parameters, not by score.
The test set of each dataset is used for evaluation unless noted otherwise.

Note that _all models are evaluated with the same prompt template_ for comparability, so the scores do not necessarily represent each model's best possible performance. This is especially the case for `deepseek-ai/DeepSeek-V3-0324` on some of the benchmarks.

Only models of comparable size are evaluated on benchmarks in English.

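
To make "temperature 0" concrete: sampling temperature rescales the next-token logits before the softmax, and as it approaches 0 the distribution collapses onto the single most likely token, so decoding becomes deterministic. The sketch below is a self-contained toy, not the evaluation harness used for these benchmarks. The function name `token_probabilities` and the example logits are hypothetical, and treating temperature 0 as argmax is an assumption about how a harness would typically handle it.

```python
import math
from typing import List


def token_probabilities(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature; temperature 0 is treated as argmax."""
    if temperature == 0:
        # Dividing by zero is undefined, so put all probability on the top logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for temperature in (1.0, 0.1, 0):
    print(temperature, token_probabilities(logits, temperature))
```

At temperature 1.0 all three tokens keep noticeable probability, at 0.1 almost all of the mass sits on the first token, and at 0 the choice is fully deterministic.
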
### Instruction-following

#### Estonian

Instruction-level strict accuracy is reported for IFEval-et.

| Model (# parameters ↓) | [IFEval-et](https://huggingface.co/datasets/tartuNLP/ifeval_et) |
|-------|-----------------------------------|
| moonshotai/Kimi-K2-Instruct | **0.7891** |
| deepseek-ai/DeepSeek-V3.2 | 0.7221 |
| deepseek-ai/DeepSeek-V3-0324 | 0.7171 |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | 0.7097 |
| meta-llama/Llama-3.1-405B-Instruct | 0.7159 |
| meta-llama/Llama-3.3-70B-Instruct | **_0.7705_** |
| Qwen/Qwen2.5-72B-Instruct | 0.7407 |
| google/gemma-3-27b-it | 0.7655 |
| google/gemma-3-12b-it | 0.7556 |
| utter-project/EuroLLM-9B-Instruct-2512 | 0.5571 |
| utter-project/EuroLLM-9B-Instruct | 0.5397 |
| mistralai/Ministral-3-8B-Instruct-2512 | 0.4888 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.5608 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.4665 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.5484 |
| meta-llama/Llama-3.1-8B-Instruct | 0.3797 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | 0.6141 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.5174 |
| BSC-LT/salamandra-7b-instruct | 0.5195 |
| tartuNLP/Llammas | 0.3524 |
| Qwen/Qwen2.5-7B-Instruct | 0.4988 |
| CohereLabs/tiny-aya-global | 0.6687 |

#### English

Instruction-level strict accuracy is reported for IFEval-en.

| Model (# parameters ↓) | [IFEval-en](https://huggingface.co/datasets/tartuNLP/ifeval_en) |
|-------|-----------------------------------|
| utter-project/EuroLLM-9B-Instruct-2512 | 0.7564 |
| utter-project/EuroLLM-9B-Instruct | 0.7004 |
| mistralai/Ministral-3-8B-Instruct-2512 | 0.6845 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.7089 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.6638 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.7808 |
| meta-llama/Llama-3.1-8B-Instruct | **_0.8106_** |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | **0.8173** |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.7527 |
| tartuNLP/Llammas | 0.4373 |
| BSC-LT/salamandra-7b-instruct | 0.3289 |
| Qwen/Qwen2.5-7B-Instruct | 0.7954 |

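
Both instruction-following tables report instruction-level accuracy, which counts every individual instruction across all prompts, rather than prompt-level accuracy, which only credits a prompt when all of its instructions are followed. A minimal sketch of the difference, assuming each prompt comes with one pass/fail flag per instruction, is shown below. The function names and the `results` data are hypothetical, and the "strict" part (checking the response as-is, without relaxed variants) is assumed to have already produced those flags.

```python
from typing import List


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    followed = sum(sum(flags) for flags in results)
    total = sum(len(flags) for flags in results)
    return followed / total


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts in which every instruction was followed."""
    return sum(all(flags) for flags in results) / len(results)


# Hypothetical outcomes: one inner list per prompt, one flag per instruction.
results = [
    [True, True],         # both instructions followed
    [True, False, True],  # one of three instructions missed
    [False],              # single instruction missed
]

print(round(instruction_level_accuracy(results), 4))  # 0.6667 (4 of 6 instructions)
print(round(prompt_level_accuracy(results), 4))       # 0.3333 (1 of 3 prompts)
```

Instruction-level accuracy is never lower than prompt-level accuracy on the same outputs, because partial compliance with a multi-instruction prompt still earns credit.
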
### Multiple Choice

All datasets except Winogrande-et are evaluated in 0-shot mode. Winogrande-et is evaluated in 3-shot mode. Exact match accuracy is reported for every dataset.

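
Exact match accuracy on generated answers can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code behind the tables: the `normalize` step (trimming whitespace and ignoring case) and the sample data are hypothetical, and the actual answer format and normalization are not described in this card.

```python
from typing import List


def normalize(answer: str) -> str:
    """Hypothetical normalization: trim surrounding whitespace and ignore case."""
    return answer.strip().upper()


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(references)


# Hypothetical model outputs for four multiple-choice questions.
predictions = ["A", "b ", "The answer is C", "D"]
references = ["A", "B", "C", "A"]

print(exact_match_accuracy(predictions, references))  # 0.5
```

In this toy data the third prediction names the right option but does not match the reference string, so it counts as wrong. Under a metric like this, a model that answers correctly in a different format can score low, which is one way a shared prompt template may understate a model's ability.
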
#### Estonian Language Competence

| Model (# parameters ↓) | [Grammar-et](https://huggingface.co/datasets/TalTechNLP/grammar_et) | [Inflection-et](https://huggingface.co/datasets/TalTechNLP/inflection_et) | [Word-Meanings-et](https://huggingface.co/datasets/TalTechNLP/word_meanings_et) |
|-------|------|------|--------|
| moonshotai/Kimi-K2-Instruct | **0.916** | 0.6458 | **0.9689** |
| deepseek-ai/DeepSeek-V3.2 | 0.781 | 0.6891 | 0.8134 |
| deepseek-ai/DeepSeek-V3-0324 | 0.364 | 0 | 0 |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | 0.796 | **_0.8355_** | 0.9488 |
| meta-llama/Llama-3.1-405B-Instruct | 0.818 | **0.9089** | 0.9438 |
| meta-llama/Llama-3.3-70B-Instruct | 0.797 | 0.6421 | 0.9408 |
| Qwen/Qwen2.5-72B-Instruct | 0.694 | 0.5208 | 0.9057 |
| google/gemma-3-27b-it | 0.817 | 0.5934 | 0.9529 |
| google/gemma-3-12b-it | 0.789 | 0.4227 | 0.9318 |
| utter-project/EuroLLM-9B-Instruct-2512 | 0.644 | 0.4466 | 0.9288 |
| utter-project/EuroLLM-9B-Instruct | 0.764 | 0.367 | 0.9258 |
| mistralai/Ministral-3-8B-Instruct-2512 | 0.562 | 0.4833 | 0.8395 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.713 | 0.4326 | 0.9438 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.646 | 0.421 | 0.9178 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.512 | 0.3662 | 0.9027 |
| meta-llama/Llama-3.1-8B-Instruct | 0.657 | 0.4165 | 0.8335 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | **_0.8310_** | 0.5777 | **_0.9619_** |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.692 | 0.5188 | 0.9569 |
| BSC-LT/salamandra-7b-instruct | 0.594 | 0.2668 | 0.8084 |
| Qwen/Qwen2.5-7B-Instruct | 0.598 | 0.4136 | 0.7984 |
| tartuNLP/Llammas | 0.529 | 0.2289 | 0.5326 |
| CohereLabs/tiny-aya-global | 0.563 | 0.3221 | 0.8455 |

#### Knowledge and Reasoning (Estonian)

| Model (# parameters ↓) | [Winogrande-et](https://huggingface.co/datasets/tartuNLP/winogrande_et) | [Trivia-et](https://huggingface.co/datasets/TalTechNLP/trivia_et) | [Exam-et](https://huggingface.co/datasets/TalTechNLP/exam_et) | [GlobalPIQA-et](https://huggingface.co/datasets/mrlbenchmarks/global-piqa-nonparallel/viewer/ekk_latn) | [TruthfulQA-et](https://huggingface.co/datasets/LumiOpen/opengpt-x_truthfulqax/viewer/mc_ET) |
|-------|-----------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|-------------------------------------------|
| moonshotai/Kimi-K2-Instruct | **0.8138** | 0.4225 | **0.8414** | **0.79** | **0.7136** |
| deepseek-ai/DeepSeek-V3.2 | 0.4805 | 0.38 | 0.614 | 0.7 | 0.5863 |
| deepseek-ai/DeepSeek-V3-0324 | **_0.8042_** | 0.27 | 0.1221 | 0.04 | 0.2093 |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | 0.7487 | 0.4275 | 0.7931 | **_0.73_** | 0.6854 |
| meta-llama/Llama-3.1-405B-Instruct | 0.7878 | **0.4713** | **_0.8309_** | 0.58 | **_0.7001_** |
| meta-llama/Llama-3.3-70B-Instruct | 0.7397 | 0.3875 | 0.7652 | 0.58 | 0.6255 |
| Qwen/Qwen2.5-72B-Instruct | 0.7227 | 0.315 | 0.7162 | 0.65 | 0.6683 |
| google/gemma-3-27b-it | 0.7510 | 0.325 | 0.7751 | 0.71 | 0.5814 |
| google/gemma-3-12b-it | 0.6712 | 0.3237 | 0.7069 | 0.54 | 0.3158 |
| utter-project/EuroLLM-9B-Instruct-2512 | 0.5195 | 0.375 | 0.6097 | 0.52 | 0.399 |
| utter-project/EuroLLM-9B-Instruct | 0.5846 | 0.3738 | 0.5589 | 0.55 | 0.2889 |
| mistralai/Ministral-3-8B-Instruct-2512 | 0.5812 | 0.3125 | 0.5012 | 0.48 | 0.3525 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.5976 | 0.35 | 0.6022 | 0.64 | 0.4296 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.5467 | 0.3575 | 0.5651 | 0.63 | 0.3696 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.5105 | 0.345 | 0.552 | 0.59 | 0.366 |
| meta-llama/Llama-3.1-8B-Instruct | 0.5399 | 0.2888 | 0.5 | 0.54 | 0.437 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | 0.6440 | **_0.4288_** | 0.6332 | 0.68 | 0.3794 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.5812 | 0.425 | 0.5093 | 0.63 | 0.3525 |
| BSC-LT/salamandra-7b-instruct | 0.2878 | 0.2875 | 0.3556 | 0.55 | 0.3011 |
| Qwen/Qwen2.5-7B-Instruct | 0.5473 | 0.2938 | 0.4913 | 0.57 | 0.4113 |
| tartuNLP/Llammas | 0.5037 | 0.2838 | 0.3649 | 0.01 | 0.2032 |
| CohereLabs/tiny-aya-global | 0.5603 | 0.31 | 0.5638 | 0.52 | 0.3782 |

#### Knowledge and Reasoning (English)

| Model (# parameters ↓) | [Winogrande](https://huggingface.co/datasets/allenai/winogrande) | [GlobalPIQA-en](https://huggingface.co/datasets/mrlbenchmarks/global-piqa-nonparallel/viewer/eng_latn) | [TruthfulQA](https://huggingface.co/datasets/truthfulqa/truthful_qa) | [MMLU-Redux](https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux-2.0) | [GSM8K](https://huggingface.co/datasets/openai/gsm8k) |
|-------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
| utter-project/EuroLLM-9B-Instruct-2512 | 0.5546 | 0.58 | 0.4614 | 0.6334 | 0.4139 |
| utter-project/EuroLLM-9B-Instruct | 0.5059 | 0.58 | 0.2962 | 0.5741 | 0.5944 |
| mistralai/Ministral-3-8B-Instruct-2512 | **_0.6503_** | **_0.77_** | 0.519 | **_0.7418_** | 0.3927 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.5699 | 0.69 | 0.4174 | 0.5946 | 0.5588 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.5348 | 0.56 | 0.3647 | 0.5944 | 0.5277 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.5133 | 0.73 | 0.3831 | 0.6099 | 0.5936 |
| meta-llama/Llama-3.1-8B-Instruct | 0.5625 | 0.76 | **_0.5239_** | 0.6959 | 0.7710 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | 0.6118 | 0.76 | 0.3635 | 0.6606 | **_0.7726_** |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.6084 | 0.71 | 0.366 | 0.6388 | 0.7202 |
| tartuNLP/Llammas | 0.498 | 0 | 0.1971 | 0.3417 | 0.1456 |
| BSC-LT/salamandra-7b-instruct | 0.4029 | 0.63 | 0.2717 | 0.5180 | 0.0076 |
| Qwen/Qwen2.5-7B-Instruct | **0.6627** | **0.83** | **0.5875** | **0.7555** | **0.7862** |

### Translation

#### English to Estonian

| Model | [wmt24pp](https://huggingface.co/datasets/google/wmt24pp) (BLEU ↑) |
|-------|---------|
| BSC-LT/salamandraTA-7b-instruct | 0.2713 |
| **tartuNLP/Apertus-EstLLM-8B-Instruct-0326** | 0.2676 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 | 0.2635 |
| tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 | 0.264 |
| tartuNLP/Apertus-EstLLM-8B-Instruct-1125 | 0.2609 |
| utter-project/EuroLLM-9B-Instruct | 0.2602 |
| utter-project/EuroLLM-9B-Instruct-2512 | 0.2567 |
| swiss-ai/Apertus-8B-Instruct-2509 | 0.2372 |
| tartuNLP/Llammas | 0.1472 |
| meta-llama/Llama-3.1-8B-Instruct | 0.1406 |
| BSC-LT/salamandra-7b-instruct | 0.1201 |
| Qwen/Qwen2.5-7B-Instruct | 0.0476 |

## Citation

```
@misc{dorkin2026estllmenhancingestoniancapabilities,
      title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
      author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
      year={2026},
      eprint={2603.02041},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.02041},
}