64 lines
3.1 KiB
Markdown
64 lines
3.1 KiB
Markdown
---
|
|
language:
|
|
- bua
|
|
- en
|
|
- ru
|
|
license: mit
|
|
tags:
|
|
- gpt3
|
|
- transformers
|
|
- mgpt
|
|
---
|
|
# 🌞 Buryat mGPT 1.3B
|
|
|
|
Language model for Buryat. Model has 1.3B parameters as you can guess from it's name.
|
|
|
|
Buryat belongs to Mongolic language family. It's a very distinct language with approximately 0.5 million speakers. Here are some facts about it:
|
|
|
|
1. Primarily spoken in the region around Lake Baikal in Russia.
|
|
2. It has multiple dialects, reflecting the wide spread of Buriat communities.
|
|
3. The Buryats are the largest indigenous group in Siberia.
|
|
|
|
## Technical details
|
|
|
|
It's one of the models derived from the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model (see the list below) which was originally trained on the 61 languages from 25 language families using Wikipedia and C4 corpus.
|
|
|
|
We've found additional data for 23 languages most of which are considered as minor and decided to further tune the base model. **Buryat mGPT 1.3B** was trained for another 1000 steps with batch_size=4 and context window of **2048** tokens on 1 A100.
|
|
|
|
Final perplexity for this model on validation is **17.63**.
|
|
|
|
_Chart of the training loss and perplexity:_
|
|
|
|

|
|
|
|
## Other mGPT-1.3B models
|
|
|
|
- [🇦🇲 mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
|
|
- [🇦🇿 mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
|
|
- [🍯 mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
|
|
- [🇧🇾 mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
|
|
- [🇧🇬 mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian)
|
|
- [🌳 mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
|
|
- [🇬🇪 mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
|
|
- [🌸 mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
|
|
- [🇰🇿 mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
|
|
- [🇰🇬 mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
|
|
- [🐻 mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
|
|
- [🇲🇳 mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
|
|
- [🐆 mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
|
|
- [🇮🇷 mGPT-1.3B Persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian)
|
|
- [🇷🇴 mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
|
|
- [🇹🇯 mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
|
|
- [☕ mGPT-1.3B Tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar)
|
|
- [🇹🇲 mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
|
|
- [🐎 mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
|
|
- [🇺🇦 mGPT-1.3B Ukranian](https://huggingface.co/ai-forever/mGPT-1.3B-ukranian)
|
|
- [🇺🇿 mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
|
|
- [💎 mGPT-1.3B Yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut)
|
|
|
|
## Feedback
|
|
|
|
If you'll found a bug of have additional data to train model on your language — please, give us feedback.
|
|
|
|
Model will be improved over time. Stay tuned!
|