A compact 22.9M-parameter Romanian language model distilled from the TF3-50M teacher using logit-based knowledge distillation. Part of the TinyFabulist research project.
Model Details
| Property | Value |
|---|---|
| Parameters | 22.9M (26.45M with untied embeddings) |
| Architecture | LLaMA-style decoder-only Transformer |
| Hidden size | 384 |
| Attention heads | 6 (head dim 64) |
| Layers | 6 |
| MLP intermediate | 1,024 |
| Vocab size | 32,000 (Unigram, Romanian-specific) |
| Context length | 2,048 tokens |
| Tied embeddings | Yes |
| Training | Knowledge distillation from klusai/tf3-50m-base |
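As a sanity check on the tied-embedding figure, the count can be roughly reproduced from the listed dimensions. This is a minimal sketch, not the project's own accounting. It assumes a standard LLaMA-style layout that the card does not spell out: bias-free projections, full multi-head attention (no grouped-query sharing), a three-matrix gated MLP, and two RMSNorm weight vectors per layer plus one final norm. The function name `count_tied_params` is hypothetical.

```python
def count_tied_params(vocab_size, hidden, n_layers, mlp_dim):
    """Estimate parameters of a LLaMA-style decoder with tied embeddings.

    Assumptions (not stated in the model card): no biases, full
    multi-head attention, gated MLP with three weight matrices,
    RMSNorm weights only.
    """
    embedding = vocab_size * hidden          # shared with the output head when tied
    attention = 4 * hidden * hidden          # q, k, v and output projections
    mlp = 3 * hidden * mlp_dim               # gate, up and down projections
    norms = 2 * hidden                       # pre-attention and pre-MLP RMSNorm
    per_layer = attention + mlp + norms
    final_norm = hidden
    return embedding + n_layers * per_layer + final_norm


total = count_tied_params(vocab_size=32_000, hidden=384, n_layers=6, mlp_dim=1_024)
print(total)                     # 22909824
print(f"{total / 1e6:.1f}M")     # 22.9M
```

Under these assumptions the estimate lands on the 22.9M figure in the table, with a little over half of the parameters sitting in the 32,000 x 384 embedding matrix.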
Training
Method: Logit-based knowledge distillation (KL + CE loss, alpha=0.009)
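A combined KL + CE objective of this kind can be sketched for a single token position as follows. This is an illustration under stated assumptions, not the project's training code: the card does not say which term `alpha` weights or whether a softmax temperature was used, so here `alpha` is assumed to scale the cross-entropy term and `temperature` is a hypothetical parameter defaulting to 1.0. The helper names are also hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def kd_loss(student_logits, teacher_logits, target, alpha=0.009, temperature=1.0):
    """Logit-based distillation loss for one token position.

    ASSUMPTION: alpha weights the cross-entropy term and (1 - alpha)
    weights the KL term; the model card does not specify this.
    """
    student_logp = log_softmax(student_logits, temperature)
    teacher_logp = log_softmax(teacher_logits, temperature)
    # KL(teacher || student) over the vocabulary
    kl = sum(math.exp(t) * (t - s) for t, s in zip(teacher_logp, student_logp))
    # Cross-entropy of the student against the ground-truth token
    ce = -log_softmax(student_logits)[target]
    # The temperature**2 factor is the usual gradient-scale correction
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl


# Toy example over a 4-token vocabulary
loss = kd_loss([2.0, 0.5, -1.0, 0.1], [2.5, 0.2, -0.8, 0.0], target=0)
print(loss)
```

With a small `alpha` under this reading, the student is driven almost entirely by matching the teacher's distribution, with the ground-truth labels acting as a light regularizer.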
Intended Use
This model is a research artifact demonstrating knowledge distillation for compact Romanian language models trained on synthetic moral microfiction. It is designed for:
- Research on compact language model compression
- Romanian text generation in the fable/moral story domain
- Downstream fine-tuning for Romanian NLP tasks
Not intended for: Production text generation, factual question answering, or safety-critical applications.
Limitations
- Domain-restricted to moral microfiction (fables)
- Trained exclusively on synthetic data
- May exhibit repetitive patterns and simplified phrasing compared to the teacher
- Gender agreement errors may occur in generated text
Citation
@article{nadas2026tf3,
  title={TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction},
  author={Nada\c{s}, Mihai Dan and Dio\c{s}an, Laura and Tomescu, Andreea and Pi\c{s}coran, Andrei},
  journal={arXiv preprint arXiv:2601.10410},
  year={2026}
}