Initialize project; model provided by the ModelHub XC community

Model: klusai/tf3-26m-student Source: Original Platform
2026-06-16 04:40:17 +08:00
commit 8b9c636491
7 changed files with 128281 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,84 @@
+---
+license: apache-2.0
+language:
+- ro
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- llama
+- romanian
+- synthetic-data
+- distillation
+- tinyfabulist
+- fables
+base_model: klusai/tf3-50m-base
+datasets:
+- klusai/ds-tf2-en-ro-15k
+---
+
+# TF3 Student: Distilled Romanian Language Model
+
+A compact **22.9M-parameter** Romanian language model distilled from the [TF3-50M teacher](https://huggingface.co/klusai/tf3-50m-base) using logit-based knowledge distillation. Part of the [TinyFabulist](https://arxiv.org/abs/2601.10410) research project.
+
+## Model Details
+
+| Property | Value |
+|----------|-------|
+| Parameters | 22.9M (26.45M with untied embeddings) |
+| Architecture | LLaMA-style decoder-only Transformer |
+| Hidden size | 384 |
+| Attention heads | 6 (head dim 64) |
+| Layers | 6 |
+| MLP intermediate | 1,024 |
+| Vocab size | 32,000 (Unigram, Romanian-specific) |
+| Context length | 2,048 tokens |
+| Tied embeddings | Yes |
+| Training | Knowledge distillation from klusai/tf3-50m-base |
+
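The 22.9M figure can be roughly reproduced from the table above. The sketch below is a back-of-the-envelope count, not the authors' accounting: it assumes no bias terms, as many key/value heads as query heads, a gated MLP with three projection matrices, and one weight vector per normalization layer. None of these details are stated in the model card.

```python
# Rough parameter count for the configuration listed in the table above.
# Assumptions (not confirmed by the model card): no biases, standard
# multi-head attention (no grouped-query sharing), a gated MLP with three
# projections, and one weight vector per normalization layer.

HIDDEN = 384
LAYERS = 6
INTERMEDIATE = 1024
VOCAB = 32_000


def count_parameters(hidden, layers, intermediate, vocab):
    """Return the parameter count with tied input/output embeddings."""
    embedding = vocab * hidden            # shared with the output head when tied
    attention = 4 * hidden * hidden       # query, key, value, output projections
    mlp = 3 * hidden * intermediate       # gate, up, down projections
    norms = 2 * hidden                    # pre-attention and pre-MLP norms
    per_layer = attention + mlp + norms
    final_norm = hidden
    return embedding + layers * per_layer + final_norm


total = count_parameters(HIDDEN, LAYERS, INTERMEDIATE, VOCAB)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the count is 22,909,824, which rounds to the 22.9M reported for the tied-embedding configuration.
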
+## Training
+
+- **Method**: Logit-based knowledge distillation (KL + CE loss, alpha=0.009)
+- **Teacher**: [klusai/tf3-50m-base](https://huggingface.co/klusai/tf3-50m-base) (51.65M params, frozen)
+- **Data**: [klusai/ds-tf2-en-ro-15k](https://huggingface.co/datasets/klusai/ds-tf2-en-ro-15k) (15k Romanian fables)
+- **Temperature**: T=1.0
+- **Epochs**: 3
+- **Learning rate**: 3e-4 (cosine schedule, 50-step warmup)
+- **Hardware**: Apple M3 Ultra (96GB unified memory)
+
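To make the "KL + CE" objective concrete, here is a minimal single-position sketch in plain Python. The model card does not say which term `alpha` weights, so the convention below (`alpha` on the cross-entropy term, `1 - alpha` on the KL term, with the common `T^2` scaling) is an assumption, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      alpha=0.009, temperature=1.0):
    """Combine cross-entropy on the true token with KL(teacher || student).

    ASSUMPTION: alpha weights the cross-entropy term and (1 - alpha) the
    KL term; the model card only gives "KL + CE loss, alpha=0.009".
    """
    # Cross-entropy of the student against the ground-truth token.
    ce = -log_softmax(student_logits)[target]

    # KL divergence between temperature-softened teacher and student.
    student_log_probs = log_softmax(student_logits, temperature)
    teacher_log_probs = log_softmax(teacher_logits, temperature)
    kl = sum(
        math.exp(t) * (t - s)
        for t, s in zip(teacher_log_probs, student_log_probs)
    )

    # T^2 keeps gradient magnitudes comparable across temperatures;
    # it is a no-op at the T=1.0 reported above.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

When student and teacher logits agree, the KL term vanishes and only the small cross-entropy contribution remains. In an actual training loop the loss would be averaged over all token positions in a batch.
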
+## Intended Use
+
+This model is a research artifact demonstrating knowledge distillation for compact Romanian language models trained on synthetic moral microfiction. It is designed for:
+
+- Research on compact language model compression
+- Romanian text generation in the fable/moral story domain
+- Downstream fine-tuning for Romanian NLP tasks
+
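For the text-generation use case, a minimal loading sketch with the `transformers` library might look like the following. It assumes the checkpoint is reachable under the ID `klusai/tf3-26m-student` (taken from the commit header) and loads through the standard auto classes. The helper name `generate_fable` and the sampling settings are illustrative choices, not values recommended by the authors.

```python
MODEL_ID = "klusai/tf3-26m-student"  # assumed hub ID, taken from the commit header


def generate_fable(prompt, max_new_tokens=128):
    """Generate a continuation of a Romanian prompt (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require
    # transformers (and a backend such as PyTorch) to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,      # illustrative sampling settings
        top_p=0.9,
        temperature=0.8,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_fable("A fost odată")` downloads the weights on first use and returns the prompt followed by the sampled continuation.
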
+**Not intended for**: Production text generation, factual question answering, or safety-critical applications.
+
+## Limitations
+
+- Domain-restricted to moral microfiction (fables)
+- Trained exclusively on synthetic data
+- May exhibit repetitive patterns and simplified phrasing compared to the teacher
+- Gender agreement errors may occur in generated text
+
+## Citation
+
+```bibtex
+@article{nadas2026tf3,
+  title={TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction},
+  author={Nada\c{s}, Mihai Dan and Dio\c{s}an, Laura and Tomescu, Andreea and Pi\c{s}coran, Andrei},
+  journal={arXiv preprint arXiv:2601.10410},
+  year={2026}
+}
+```
+
+## Related Models and Datasets
+
+| Artifact | Description |
+|----------|-------------|
+| [klusai/tf3-50m-base](https://huggingface.co/klusai/tf3-50m-base) | Teacher model (51.65M) |
+| [klusai/tf3-50m-sft](https://huggingface.co/klusai/tf3-50m-sft) | SFT-tuned teacher |
+| [klusai/tf3-bert](https://huggingface.co/klusai/tf3-bert) | NER model for entity coherence evaluation |
+| [klusai/ds-tf2-en-ro-3m](https://huggingface.co/datasets/klusai/ds-tf2-en-ro-3m) | 3M bilingual fable corpus |
+| [klusai/ds-tf2-en-ro-15k](https://huggingface.co/datasets/klusai/ds-tf2-en-ro-15k) | 15k curated subset for distillation/SFT |