---
license: apache-2.0
datasets:
- LTS-VVE/Teuta-sq
- LTS-VVE/grammar_sq_0.1
- LTS-VVE/linguistic_sq
- LTS-VVE/Math-physics-dataset-sq
- LTS-VVE/albanian-synthetic
- noxneural/lilium_albanicum_eng_alb
- MIND-Lab/Safety-Evaluation
- shb777/simple-math-steps-7M
- RishiKompelli/TherapyDataset
- microsoft/orca-math-word-problems-200k
- Vezora/Tested-143k-Python-Alpaca
- AI4Chem/ChemPref-DPO-for-Chemistry-data-en
- jkhedri/psychology-dataset
- samhog/psychology-10k
- Amod/mental_health_counseling_conversations
- sayhan/strix-philosophy-qa
- Maverfrick/Rust_dataset
- Neloy262/rust_instruction_dataset
- Tesslate/Rust_Dataset
language:
- en
- sq
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
tags:
- al
- math
- philosophy
- chemistry
- code
- biology
- climate
- not-for-all-audiences
---

<p align="center">
  <span style="color:yellow">This model is not suitable for all audiences and may generate inappropriate or explicit content.</span>
</p>

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67b7476deb48853c39ca000b/CzUTg97aTxK283qwD6kEm.png" alt="Teuta Logo" />
</p>

# Teuta (A work in progress!)

Teuta is a bilingual instruction-tuned language model designed for question answering in both Albanian (sq) and English (en). It is fine-tuned on a diverse mix of datasets covering mathematics, philosophy, chemistry, biology, code (especially Rust, plus some Python), psychology, and climate science.

## Model

- **Base model**: meta-llama/Llama-3.2-3B
- **Languages**: Albanian, English
- **Primary task**: Instruction-following and question answering
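The following is a minimal sketch of loading the model with Hugging Face Transformers for question answering. It assumes the model is published under the Hub id `LTS-VVE/Teuta` and loads as a standard causal LM like its Llama-3.2-3B base, and it requires `torch` and `transformers` to be installed. The prompt format used during fine-tuning is not documented here, so the sketch sends plain text; if the repository ships a chat template, `tokenizer.apply_chat_template` is likely the better entry point. `MODEL_ID`, `load_model`, and `generate` are illustrative names, not a published API.

```python
from functools import lru_cache

# Assumed Hub repository id; not confirmed by this card.
MODEL_ID = "LTS-VVE/Teuta"


@lru_cache(maxsize=1)
def load_model():
    """Load tokenizer and model once per process and cache them."""
    # Heavy imports are deferred so this module can be imported
    # without torch/transformers present.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # bfloat16 halves memory on GPU; fall back to float32 on CPU.
    dtype = torch.bfloat16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    model.eval()
    return tokenizer, model, device


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Answer a plain-text prompt (Albanian or English) with greedy decoding."""
    tokenizer, model, device = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # generate() already runs without gradient tracking.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        # Llama tokenizers define no pad token; reuse EOS to avoid a warning.
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("Cili është kryeqyteti i Shqipërisë?")` ("What is the capital of Albania?") downloads the weights on first use and returns the model's answer as a string. Caching `load_model` keeps repeated calls from reloading the 3B-parameter weights.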

## Description

Teuta is built to handle a variety of instructional prompts, from academic and scientific queries to more open-ended tasks. It is intended for bilingual applications and for supporting under-resourced languages, with a strong focus on Albanian.

The training mix combines synthetic and real-world datasets, with the aim of improving generalization across technical and non-technical domains.

## Considerations

- Some datasets include sensitive content (e.g., mental health, therapy, and philosophical questions).
- Outputs are not guaranteed to be factual or safe; use the model with care in sensitive contexts.
- Best suited for research, educational tools, and domain-specific applications.