---
license: apache-2.0
datasets:
- LTS-VVE/Teuta-sq
- LTS-VVE/grammar_sq_0.1
- LTS-VVE/linguistic_sq
- LTS-VVE/Math-physics-dataset-sq
- LTS-VVE/albanian-synthetic
- noxneural/lilium_albanicum_eng_alb
- MIND-Lab/Safety-Evaluation
- shb777/simple-math-steps-7M
- RishiKompelli/TherapyDataset
- microsoft/orca-math-word-problems-200k
- Vezora/Tested-143k-Python-Alpaca
- AI4Chem/ChemPref-DPO-for-Chemistry-data-en
- jkhedri/psychology-dataset
- samhog/psychology-10k
- Amod/mental_health_counseling_conversations
- sayhan/strix-philosophy-qa
- Maverfrick/Rust_dataset
- Neloy262/rust_instruction_dataset
- Tesslate/Rust_Dataset
language:
- en
- sq
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
tags:
- al
- math
- philosophy
- chemistry
- code
- biology
- climate
- not-for-all-audiences
---

<p align="center">
<span style="color:yellow">This model is not suitable for all audiences and may contain inappropriate or explicit content.</span>
</p>

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67b7476deb48853c39ca000b/CzUTg97aTxK283qwD6kEm.png" alt="Teuta Logo" />
</p>

# Teuta (A work in progress!)

Teuta is a bilingual instruction-tuned language model designed for question answering in both Albanian (sq) and English (en). It is fine-tuned on a diverse mix of datasets covering subjects such as mathematics, philosophy, chemistry, biology, code (especially Rust), psychology, and climate science.

## Model

- **Base model**: meta-llama/Llama-3.2-3B
- **Languages**: Albanian, English
- **Primary task**: Instruction-following and question answering
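
As a rough illustration of how a model with this metadata (`pipeline_tag: text-generation`) might be queried, here is a minimal sketch using the Hugging Face `transformers` pipeline. The repository id `your-namespace/teuta` is a placeholder, since this card does not state where the weights are hosted. The decoding settings and the plain-text prompt are also assumptions, because no prompt template is documented here.

```python
# Placeholder repository id (hypothetical): replace it with the actual
# location of the Teuta weights, which this card does not state.
MODEL_ID = "your-namespace/teuta"


def generation_settings(max_new_tokens: int = 128) -> dict:
    """Return decoding settings (assumed values, not taken from the card)."""
    # Greedy decoding keeps the output deterministic for a given prompt.
    return {"max_new_tokens": max_new_tokens, "do_sample": False}


def ask(question: str, model_id: str = MODEL_ID) -> str:
    """Send a plain-text question to the model and return only the generated text."""
    # Imported lazily so generation_settings() works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    # return_full_text=False drops the prompt from the returned string.
    result = generator(question, return_full_text=False, **generation_settings())
    return result[0]["generated_text"]
```

Calling `ask("Cili është kryeqyteti i Shqipërisë?")` ("What is the capital of Albania?") would download the weights on first use, so it needs network access and enough memory for a 3B-parameter model.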

## Description

Teuta is built to handle a variety of instructional prompts, from academic and scientific queries to more open-ended tasks. It is particularly suited to multilingual applications and to supporting under-resourced languages, with a strong focus on Albanian.

The model is trained on both synthetic and real datasets to improve generalization across technical and non-technical domains.

## Considerations

- Some training datasets include sensitive content (e.g., mental health, therapy, and philosophical questions).
- Outputs are not guaranteed to be factual or safe; take care when using the model in sensitive contexts.
- Best suited for research, educational tools, and domain-specific applications.