Initialize project; model provided by the ModelHub XC community

Model: fpadovani/tur_indomain_prepretraining_seed577
Source: Original Platform
ModelHub XC
2026-05-30 04:36:21 +08:00
commit 7dbe71f216
97 changed files with 2033223 additions and 0 deletions

.gitattributes (new file, +35 lines)

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
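The rules above send weights, optimizer states and other binary formats through Git LFS. That is why the .safetensors, .pt, .pth and .bin files in this commit appear as three-line pointers, while the JSON files are shown in full. The sketch below approximates that matching. It assumes fnmatch-style globbing is close enough to git's own pattern rules for these simple patterns, and the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Basename patterns copied from the .gitattributes above.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy", "*.npz",
    "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl", "*.pt", "*.pth",
    "*.rar", "*.safetensors", "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm",
    "*.xz", "*.zip", "*.zst", "*tfevents*",
]

def is_lfs_tracked(path: str) -> bool:
    """Approximate git's attribute matching for the rules above (hypothetical helper).

    Patterns without a slash are compared against the basename at any depth.
    The one slash-containing rule, saved_model/**/*, is treated as a prefix test.
    fnmatch only approximates git's wildmatch semantics.
    """
    if path.startswith("saved_model/"):
        return True
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)

for name in ["model.safetensors", "checkpoint-500/optimizer.pt",
             "training_args.bin", "tokenizer.json", "config.json"]:
    print(f"{name}: {'LFS pointer' if is_lfs_tracked(name) else 'stored inline'}")
```

The large tokenizer files (tokenizer.json, tokenizer_config.json) match none of the patterns. They are therefore committed as ordinary text, which is why their diffs are suppressed rather than shown as pointers.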

README.md (new file, +58 lines)

@@ -0,0 +1,58 @@
---
base_model: goldfish-models/tur_latn_10mb
library_name: transformers
model_name: tur_indomain_prepretraining_seed577
tags:
- generated_from_trainer
- trl
- sft
licence: license
---
# Model Card for tur_indomain_prepretraining_seed577
This model is a fine-tuned version of [goldfish-models/tur_latn_10mb](https://huggingface.co/goldfish-models/tur_latn_10mb).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="fpadovani/tur_indomain_prepretraining_seed577", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/f-padovani-university-of-groningen/formal_lang_tur/runs/922zberz)
This model was trained with SFT.
### Framework versions
- TRL: 0.13.0
- Transformers: 4.47.0
- Pytorch: 2.11.0
- Datasets: 4.8.5
- Tokenizers: 0.21.0
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}
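The config above describes a small GPT-2 with these dimensions:

- 4 layers and 8 heads
- 512-dimensional embeddings and a 2,048-unit MLP
- a 2,048-token context and a 51,200-token vocabulary

The worked parameter count below assumes the standard GPT2LMHeadModel layout in transformers. That layout has a fused QKV projection and biases on every projection. It also ties the output head to the token embedding, so the head is not stored separately.

```python
# Hyperparameters copied from config.json above.
CONFIG = {"vocab_size": 51200, "n_positions": 2048, "n_embd": 512,
          "n_inner": 2048, "n_layer": 4}

def gpt2_param_count(cfg: dict) -> int:
    """Count parameters of a GPT-2 LM with a tied output head (assumed layout)."""
    d, ff = cfg["n_embd"], cfg["n_inner"]
    embeddings = cfg["vocab_size"] * d + cfg["n_positions"] * d  # wte + wpe
    per_layer = (
        2 * (2 * d)            # ln_1 and ln_2: weight + bias each
        + d * 3 * d + 3 * d    # attn.c_attn: fused Q, K, V projection
        + d * d + d            # attn.c_proj
        + d * ff + ff          # mlp.c_fc
        + ff * d + d           # mlp.c_proj
    )
    final_ln = 2 * d           # ln_f
    # lm_head shares its weight with wte, so it adds nothing here.
    return embeddings + cfg["n_layer"] * per_layer + final_ln

total = gpt2_param_count(CONFIG)
print(total)       # 39873536
print(2 * total)   # bytes at 2 bytes/param (bfloat16): 79747072
```

torch_dtype is bfloat16, so at 2 bytes per parameter the weights come to 79,747,072 bytes. That is 5,200 bytes short of the 79,752,272-byte weight pointers in this commit; the gap is presumably the safetensors JSON header. The 159,538,443-byte optimizer pointers sit close to 2 × 79,747,072 = 159,494,144 bytes. That would fit the default AdamW keeping two moment buffers in bfloat16, but this is an inference, not something the commit states.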

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}
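generation_config.json carries only the special-token ids. In a batched generate() call, transformers fills sequences that have already emitted eos_token_id with pad_token_id. Decoded batches can therefore end in a run of 50002s. The post-processing sketch below uses a hypothetical helper name.

```python
BOS_ID, EOS_ID, PAD_ID = 50000, 50001, 50002  # from generation_config.json
VOCAB_SIZE = 51200                            # from config.json

def clean_generated_ids(ids: list[int]) -> list[int]:
    """Hypothetical helper: keep tokens before the first EOS, dropping BOS and PAD."""
    out = []
    for tok in ids:
        if tok == EOS_ID:
            break          # everything after EOS is padding or noise
        if tok in (BOS_ID, PAD_ID):
            continue
        out.append(tok)
    return out

print(clean_generated_ids([50000, 12, 7, 50001, 50002, 50002]))  # [12, 7]
```

All three special ids are distinct and fall inside the 51,200-entry vocabulary declared in config.json.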

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5923b2d17656474f19ef5a1d6c25383f18b2dd808009ca7f4711d8102c894c1
size 79752272
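Each binary file in this commit is stored as a Git LFS pointer like the one above. A pointer records the spec version, the SHA-256 of the real file, and its size in bytes. This one records 79,752,272 bytes, the same size as the top-level model.safetensors pointer further down. The sketch below parses such a pointer and checks a downloaded blob against it. Helper names are illustrative, and only sha256 oids are handled.

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def blob_matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check a downloaded blob's size and SHA-256 against its pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c5923b2d17656474f19ef5a1d6c25383f18b2dd808009ca7f4711d8102c894c1
size 79752272
"""
info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 79752272
```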

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5fd73ce72c06ed19c6767fb499e8c31a57b3bdf02e66595f7367a83170d67c74
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0fba20601d61381b671ee6c6081307e86b1a61cc2f15ed99382a14cd90a48b49
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49b5d602a36644e60adc641ff93b54a00173a84964edb693fdb6600fe5162714
size 1465

File diff suppressed because it is too large

checkpoint-1000/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c4f437339781fd82fefd618a189fce8b5dd7c3f4e5f1d3b9fdf25abf380bd23b
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a8a519416618177c392cb29c4bba2e7ed2f6caa1debd3e3195b235664d3975f
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7cff9490d2bc863db8f509a99d4cad43d5efff951d5f92d3136b57f897413c77
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cdf706bbb7bb6e14789b8a64f9da210a0ebd06388777b0fcde0c23247b8fa573
size 1465

File diff suppressed because it is too large

checkpoint-1500/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7b5e88eecf605f59c11f8b238771a528e635a114f754c83e6225ed2fa42abaa0
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3bcb0c3af5ee28ab6daf00332e30be274bb245b694d64cdf105598d895fcfb4f
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25d81c60f48536384b9b7228e1868081f39424d3acccce2c3b5e006364dffed7
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:012c90d7670b2bb998827d13327927d9a7c2889b9259d1469026aac83c30468b
size 1465

File diff suppressed because it is too large

checkpoint-2000/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4deea7f3c0c749a4208cc9ed8eaa8e2a1e6bd59a31521d542f303e1bf57ba01a
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc4c439cd9df835ef5ffe18ff16eb8ada76c6fe044cf580d7d07567eadfee516
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8912d33bcb1b75a1de00524c1a07363e0f641629fed837888da0104fda9afe1f
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b3ad5549d5099e5cee60aa161fa531e77bd60d13a0dc32433f3e6e01a85319
size 1465

File diff suppressed because it is too large

checkpoint-2500/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32192e76b9301fcb2c31e5e055701121f4f4bc1a26a360eee97739a3c083cb66
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4488ba7dee6bd28ffc349d36e588488058e96cb3e489b5ab1af97bcedef3c4f7
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:322270fa5677e7b360603288594d8936399a9ebdacb3728b1806bd3556ff3421
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dc70e9e1c1fef46d69e152f6d8d91fd0c8ce65b2fe905d341de3ee92357d097
size 1465

File diff suppressed because it is too large

checkpoint-3000/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:974e48024c2cdbb664ba5b04e85cf10211aed754fbf7a2a7c9221da54ab3bdc2
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb2e1f8b5c270f6dbee41d0dbaf6e996eca40d2214d650f2a7f252817508a948
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f6554d6aa442b703c404083a94f0101443b3e533a51e7e459cc3d3b616bf723
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6d9a470b97058ccd4f7c214a2a15352732f3be5686d8ebcb2c85f286ab0b593
size 1465

File diff suppressed because it is too large

checkpoint-3500/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3bdae47ec83525e5d6ecaee582605bd2fdfae59f178fbee1f48dbf3210bbaa89
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f45890c9eacccc2fb405ae15ce3cf2202b7f08130b0b382e447e96e4a8354ba2
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ff04ff4764a22f539e251b2b8b63bf0a87ef94877d221940dd0bacce2deaf1c
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a59cc45fece372941e082c6cabb8944243288549677e1e376f4e0b1009f0e7ad
size 1465

File diff suppressed because it is too large

checkpoint-4000/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c9e376ce556a5336310da4308b7abbf5a49d021629da77f8eebad222b6761b1
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c67791844b3ab1e1c118a1440d1be978538e6063429cf1062c9850223bf80bb3
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:678c42a1f81ca322dd86dd40209eda932f87e51895fd1824b1d43ced2a361ea4
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f243d020d2631ea115c3f5d3e6c1e5c9a7f8334b45c5f10056444d8c3da615c
size 1465

File diff suppressed because it is too large

checkpoint-500/tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

@@ -0,0 +1,733 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.0653893938403191,
"eval_steps": 500,
"global_step": 500,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.000653893938403191,
"grad_norm": 7.78125,
"learning_rate": 1e-05,
"loss": 10.9314,
"step": 5
},
{
"epoch": 0.001307787876806382,
"grad_norm": 7.15625,
"learning_rate": 2e-05,
"loss": 10.8618,
"step": 10
},
{
"epoch": 0.001961681815209573,
"grad_norm": 8.625,
"learning_rate": 3e-05,
"loss": 10.6602,
"step": 15
},
{
"epoch": 0.002615575753612764,
"grad_norm": 3.78125,
"learning_rate": 4e-05,
"loss": 10.4672,
"step": 20
},
{
"epoch": 0.003269469692015955,
"grad_norm": 3.21875,
"learning_rate": 5e-05,
"loss": 10.3043,
"step": 25
},
{
"epoch": 0.003923363630419146,
"grad_norm": 2.921875,
"learning_rate": 6e-05,
"loss": 10.2382,
"step": 30
},
{
"epoch": 0.004577257568822337,
"grad_norm": 2.84375,
"learning_rate": 7.000000000000001e-05,
"loss": 10.072,
"step": 35
},
{
"epoch": 0.005231151507225528,
"grad_norm": 2.84375,
"learning_rate": 8e-05,
"loss": 9.9208,
"step": 40
},
{
"epoch": 0.005885045445628719,
"grad_norm": 2.5,
"learning_rate": 8.999999999999999e-05,
"loss": 9.7828,
"step": 45
},
{
"epoch": 0.00653893938403191,
"grad_norm": 2.53125,
"learning_rate": 0.0001,
"loss": 9.5983,
"step": 50
},
{
"epoch": 0.007192833322435101,
"grad_norm": 2.515625,
"learning_rate": 0.00011,
"loss": 9.3536,
"step": 55
},
{
"epoch": 0.007846727260838291,
"grad_norm": 2.265625,
"learning_rate": 0.00012,
"loss": 9.1994,
"step": 60
},
{
"epoch": 0.008500621199241483,
"grad_norm": 1.96875,
"learning_rate": 0.00013000000000000002,
"loss": 9.0285,
"step": 65
},
{
"epoch": 0.009154515137644674,
"grad_norm": 1.734375,
"learning_rate": 0.00014000000000000001,
"loss": 8.8309,
"step": 70
},
{
"epoch": 0.009808409076047865,
"grad_norm": 1.6484375,
"learning_rate": 0.00015,
"loss": 8.7217,
"step": 75
},
{
"epoch": 0.010462303014451056,
"grad_norm": 1.296875,
"learning_rate": 0.00016,
"loss": 8.5904,
"step": 80
},
{
"epoch": 0.011116196952854247,
"grad_norm": 1.140625,
"learning_rate": 0.00017,
"loss": 8.5347,
"step": 85
},
{
"epoch": 0.011770090891257438,
"grad_norm": 1.1484375,
"learning_rate": 0.00017999999999999998,
"loss": 8.5,
"step": 90
},
{
"epoch": 0.01242398482966063,
"grad_norm": 1.265625,
"learning_rate": 0.00019,
"loss": 8.4191,
"step": 95
},
{
"epoch": 0.01307787876806382,
"grad_norm": 1.296875,
"learning_rate": 0.0002,
"loss": 8.4152,
"step": 100
},
{
"epoch": 0.013731772706467011,
"grad_norm": 1.25,
"learning_rate": 0.00021,
"loss": 8.4117,
"step": 105
},
{
"epoch": 0.014385666644870202,
"grad_norm": 1.3984375,
"learning_rate": 0.00022,
"loss": 8.3862,
"step": 110
},
{
"epoch": 0.015039560583273394,
"grad_norm": 1.6484375,
"learning_rate": 0.00023,
"loss": 8.3845,
"step": 115
},
{
"epoch": 0.015693454521676583,
"grad_norm": 1.6015625,
"learning_rate": 0.00024,
"loss": 8.3856,
"step": 120
},
{
"epoch": 0.016347348460079774,
"grad_norm": 2.140625,
"learning_rate": 0.00025,
"loss": 8.3267,
"step": 125
},
{
"epoch": 0.017001242398482965,
"grad_norm": 1.5625,
"learning_rate": 0.00026000000000000003,
"loss": 8.3117,
"step": 130
},
{
"epoch": 0.017655136336886156,
"grad_norm": 1.46875,
"learning_rate": 0.00027,
"loss": 8.2819,
"step": 135
},
{
"epoch": 0.018309030275289347,
"grad_norm": 1.890625,
"learning_rate": 0.00028000000000000003,
"loss": 8.3206,
"step": 140
},
{
"epoch": 0.01896292421369254,
"grad_norm": 1.390625,
"learning_rate": 0.00029,
"loss": 8.2799,
"step": 145
},
{
"epoch": 0.01961681815209573,
"grad_norm": 1.609375,
"learning_rate": 0.0003,
"loss": 8.2797,
"step": 150
},
{
"epoch": 0.02027071209049892,
"grad_norm": 1.65625,
"learning_rate": 0.00031,
"loss": 8.2185,
"step": 155
},
{
"epoch": 0.02092460602890211,
"grad_norm": 1.734375,
"learning_rate": 0.00032,
"loss": 8.1803,
"step": 160
},
{
"epoch": 0.021578499967305303,
"grad_norm": 2.109375,
"learning_rate": 0.00033,
"loss": 8.1965,
"step": 165
},
{
"epoch": 0.022232393905708494,
"grad_norm": 1.3984375,
"learning_rate": 0.00034,
"loss": 8.1966,
"step": 170
},
{
"epoch": 0.022886287844111685,
"grad_norm": 2.3125,
"learning_rate": 0.00035,
"loss": 8.1858,
"step": 175
},
{
"epoch": 0.023540181782514876,
"grad_norm": 1.8359375,
"learning_rate": 0.00035999999999999997,
"loss": 8.1155,
"step": 180
},
{
"epoch": 0.024194075720918067,
"grad_norm": 1.703125,
"learning_rate": 0.00037,
"loss": 8.0795,
"step": 185
},
{
"epoch": 0.02484796965932126,
"grad_norm": 1.6953125,
"learning_rate": 0.00038,
"loss": 8.0587,
"step": 190
},
{
"epoch": 0.02550186359772445,
"grad_norm": 1.4453125,
"learning_rate": 0.00039000000000000005,
"loss": 8.0238,
"step": 195
},
{
"epoch": 0.02615575753612764,
"grad_norm": 1.421875,
"learning_rate": 0.0004,
"loss": 8.0203,
"step": 200
},
{
"epoch": 0.02680965147453083,
"grad_norm": 1.65625,
"learning_rate": 0.00041,
"loss": 8.0557,
"step": 205
},
{
"epoch": 0.027463545412934023,
"grad_norm": 1.5546875,
"learning_rate": 0.00042,
"loss": 8.0153,
"step": 210
},
{
"epoch": 0.028117439351337214,
"grad_norm": 1.4453125,
"learning_rate": 0.00043,
"loss": 8.0324,
"step": 215
},
{
"epoch": 0.028771333289740405,
"grad_norm": 1.6640625,
"learning_rate": 0.00044,
"loss": 7.943,
"step": 220
},
{
"epoch": 0.029425227228143596,
"grad_norm": 1.765625,
"learning_rate": 0.00045000000000000004,
"loss": 7.9174,
"step": 225
},
{
"epoch": 0.030079121166546787,
"grad_norm": 1.484375,
"learning_rate": 0.00046,
"loss": 7.9345,
"step": 230
},
{
"epoch": 0.030733015104949978,
"grad_norm": 1.625,
"learning_rate": 0.00047,
"loss": 7.8904,
"step": 235
},
{
"epoch": 0.031386909043353166,
"grad_norm": 1.6015625,
"learning_rate": 0.00048,
"loss": 7.9578,
"step": 240
},
{
"epoch": 0.03204080298175636,
"grad_norm": 1.6328125,
"learning_rate": 0.00049,
"loss": 7.8584,
"step": 245
},
{
"epoch": 0.03269469692015955,
"grad_norm": 1.953125,
"learning_rate": 0.0005,
"loss": 7.8777,
"step": 250
},
{
"epoch": 0.03334859085856274,
"grad_norm": 1.5234375,
"learning_rate": 0.00051,
"loss": 7.8879,
"step": 255
},
{
"epoch": 0.03400248479696593,
"grad_norm": 1.5859375,
"learning_rate": 0.0005200000000000001,
"loss": 7.8749,
"step": 260
},
{
"epoch": 0.03465637873536912,
"grad_norm": 1.5078125,
"learning_rate": 0.0005300000000000001,
"loss": 7.8588,
"step": 265
},
{
"epoch": 0.03531027267377231,
"grad_norm": 1.8125,
"learning_rate": 0.00054,
"loss": 7.8164,
"step": 270
},
{
"epoch": 0.035964166612175504,
"grad_norm": 1.890625,
"learning_rate": 0.00055,
"loss": 7.8153,
"step": 275
},
{
"epoch": 0.036618060550578695,
"grad_norm": 1.828125,
"learning_rate": 0.0005600000000000001,
"loss": 7.8396,
"step": 280
},
{
"epoch": 0.037271954488981886,
"grad_norm": 1.6171875,
"learning_rate": 0.00057,
"loss": 7.8175,
"step": 285
},
{
"epoch": 0.03792584842738508,
"grad_norm": 1.609375,
"learning_rate": 0.00058,
"loss": 7.8173,
"step": 290
},
{
"epoch": 0.03857974236578827,
"grad_norm": 1.765625,
"learning_rate": 0.00059,
"loss": 7.7171,
"step": 295
},
{
"epoch": 0.03923363630419146,
"grad_norm": 1.46875,
"learning_rate": 0.0006,
"loss": 7.7434,
"step": 300
},
{
"epoch": 0.03988753024259465,
"grad_norm": 1.75,
"learning_rate": 0.00061,
"loss": 7.7722,
"step": 305
},
{
"epoch": 0.04054142418099784,
"grad_norm": 1.5859375,
"learning_rate": 0.00062,
"loss": 7.7211,
"step": 310
},
{
"epoch": 0.04119531811940103,
"grad_norm": 1.390625,
"learning_rate": 0.00063,
"loss": 7.7748,
"step": 315
},
{
"epoch": 0.04184921205780422,
"grad_norm": 1.59375,
"learning_rate": 0.00064,
"loss": 7.6554,
"step": 320
},
{
"epoch": 0.042503105996207415,
"grad_norm": 1.5,
"learning_rate": 0.0006500000000000001,
"loss": 7.6331,
"step": 325
},
{
"epoch": 0.043156999934610606,
"grad_norm": 1.484375,
"learning_rate": 0.00066,
"loss": 7.7316,
"step": 330
},
{
"epoch": 0.0438108938730138,
"grad_norm": 1.65625,
"learning_rate": 0.00067,
"loss": 7.6819,
"step": 335
},
{
"epoch": 0.04446478781141699,
"grad_norm": 1.65625,
"learning_rate": 0.00068,
"loss": 7.6488,
"step": 340
},
{
"epoch": 0.04511868174982018,
"grad_norm": 1.4921875,
"learning_rate": 0.00069,
"loss": 7.5754,
"step": 345
},
{
"epoch": 0.04577257568822337,
"grad_norm": 1.4609375,
"learning_rate": 0.0007,
"loss": 7.6011,
"step": 350
},
{
"epoch": 0.04642646962662656,
"grad_norm": 1.703125,
"learning_rate": 0.00071,
"loss": 7.5742,
"step": 355
},
{
"epoch": 0.04708036356502975,
"grad_norm": 1.5,
"learning_rate": 0.0007199999999999999,
"loss": 7.5733,
"step": 360
},
{
"epoch": 0.04773425750343294,
"grad_norm": 1.390625,
"learning_rate": 0.00073,
"loss": 7.5406,
"step": 365
},
{
"epoch": 0.048388151441836134,
"grad_norm": 1.5390625,
"learning_rate": 0.00074,
"loss": 7.5902,
"step": 370
},
{
"epoch": 0.049042045380239326,
"grad_norm": 1.4296875,
"learning_rate": 0.00075,
"loss": 7.5615,
"step": 375
},
{
"epoch": 0.04969593931864252,
"grad_norm": 1.5625,
"learning_rate": 0.00076,
"loss": 7.5224,
"step": 380
},
{
"epoch": 0.05034983325704571,
"grad_norm": 1.546875,
"learning_rate": 0.0007700000000000001,
"loss": 7.5185,
"step": 385
},
{
"epoch": 0.0510037271954489,
"grad_norm": 1.65625,
"learning_rate": 0.0007800000000000001,
"loss": 7.4919,
"step": 390
},
{
"epoch": 0.05165762113385209,
"grad_norm": 1.59375,
"learning_rate": 0.00079,
"loss": 7.4876,
"step": 395
},
{
"epoch": 0.05231151507225528,
"grad_norm": 1.578125,
"learning_rate": 0.0008,
"loss": 7.4722,
"step": 400
},
{
"epoch": 0.05296540901065847,
"grad_norm": 1.6640625,
"learning_rate": 0.0008100000000000001,
"loss": 7.498,
"step": 405
},
{
"epoch": 0.05361930294906166,
"grad_norm": 1.5390625,
"learning_rate": 0.00082,
"loss": 7.4047,
"step": 410
},
{
"epoch": 0.054273196887464854,
"grad_norm": 1.5078125,
"learning_rate": 0.00083,
"loss": 7.4376,
"step": 415
},
{
"epoch": 0.054927090825868045,
"grad_norm": 1.734375,
"learning_rate": 0.00084,
"loss": 7.4518,
"step": 420
},
{
"epoch": 0.055580984764271237,
"grad_norm": 1.453125,
"learning_rate": 0.00085,
"loss": 7.4957,
"step": 425
},
{
"epoch": 0.05623487870267443,
"grad_norm": 1.4453125,
"learning_rate": 0.00086,
"loss": 7.4782,
"step": 430
},
{
"epoch": 0.05688877264107762,
"grad_norm": 1.4140625,
"learning_rate": 0.00087,
"loss": 7.4289,
"step": 435
},
{
"epoch": 0.05754266657948081,
"grad_norm": 3.09375,
"learning_rate": 0.00088,
"loss": 7.4412,
"step": 440
},
{
"epoch": 0.058196560517884,
"grad_norm": 1.625,
"learning_rate": 0.0008900000000000001,
"loss": 7.413,
"step": 445
},
{
"epoch": 0.05885045445628719,
"grad_norm": 1.5546875,
"learning_rate": 0.0009000000000000001,
"loss": 7.4372,
"step": 450
},
{
"epoch": 0.05950434839469038,
"grad_norm": 1.5546875,
"learning_rate": 0.00091,
"loss": 7.3155,
"step": 455
},
{
"epoch": 0.060158242333093574,
"grad_norm": 1.3828125,
"learning_rate": 0.00092,
"loss": 7.4253,
"step": 460
},
{
"epoch": 0.060812136271496765,
"grad_norm": 1.4296875,
"learning_rate": 0.00093,
"loss": 7.3755,
"step": 465
},
{
"epoch": 0.061466030209899956,
"grad_norm": 1.5546875,
"learning_rate": 0.00094,
"loss": 7.391,
"step": 470
},
{
"epoch": 0.06211992414830315,
"grad_norm": 1.5703125,
"learning_rate": 0.00095,
"loss": 7.3909,
"step": 475
},
{
"epoch": 0.06277381808670633,
"grad_norm": 1.5,
"learning_rate": 0.00096,
"loss": 7.3484,
"step": 480
},
{
"epoch": 0.06342771202510952,
"grad_norm": 1.609375,
"learning_rate": 0.0009699999999999999,
"loss": 7.3424,
"step": 485
},
{
"epoch": 0.06408160596351271,
"grad_norm": 1.6953125,
"learning_rate": 0.00098,
"loss": 7.3676,
"step": 490
},
{
"epoch": 0.0647354999019159,
"grad_norm": 1.53125,
"learning_rate": 0.00099,
"loss": 7.2811,
"step": 495
},
{
"epoch": 0.0653893938403191,
"grad_norm": 1.53125,
"learning_rate": 0.001,
"loss": 7.2938,
"step": 500
}
],
"logging_steps": 5,
"max_steps": 4000,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": false
},
"attributes": {}
}
},
"total_flos": 683343042969600.0,
"train_batch_size": 32,
"trial_name": null,
"trial_params": null
}
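The checkpoint-500 trainer state is enough to sanity-check the learning-rate schedule and the loss scale. The sketch below works over three log_history entries copied from above and shows three things:

- **Learning rate.** Every logged value fits lr = 2e-6 × step, a linear ramp reaching 1e-3 at step 500. What happens after step 500 is not visible in this file.
- **Starting loss.** The first loss, 10.93, is close to ln(51,200) ≈ 10.84, the cross-entropy of a uniform guess over the vocabulary.
- **Checkpoints.** save_steps = 500 with max_steps = 4000 implies exactly the eight checkpoint-* directories in this commit.

Loss values are assumed to be mean token cross-entropy in nats, as PyTorch computes it.

```python
import json
import math

# Trimmed copy of checkpoint-500/trainer_state.json (3 of its 100 log_history entries).
STATE = json.loads("""
{
  "max_steps": 4000,
  "save_steps": 500,
  "log_history": [
    {"learning_rate": 1e-05, "loss": 10.9314, "step": 5},
    {"learning_rate": 0.0005, "loss": 7.8777, "step": 250},
    {"learning_rate": 0.001, "loss": 7.2938, "step": 500}
  ]
}
""")
VOCAB_SIZE = 51200  # from config.json

def lr_fits_linear_ramp(entry: dict, slope: float = 2e-6) -> bool:
    """True if the logged learning rate equals slope * step, up to float rounding."""
    return math.isclose(entry["learning_rate"], slope * entry["step"], rel_tol=1e-9)

# With loss in nats, exp(loss) is a perplexity and ln(V) is the uniform-guess loss.
uniform_loss = math.log(VOCAB_SIZE)
for entry in STATE["log_history"]:
    print(f"step {entry['step']:>3}: loss {entry['loss']:.4f}, "
          f"perplexity {math.exp(entry['loss']):,.0f}, "
          f"linear ramp: {lr_fits_linear_ramp(entry)}")
print(f"uniform baseline: ln({VOCAB_SIZE}) = {uniform_loss:.4f}")

# Checkpoint directories implied by save_steps and max_steps.
every = STATE["save_steps"]
checkpoints = [f"checkpoint-{s}" for s in range(every, STATE["max_steps"] + 1, every)]
print(checkpoints)
```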

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161

config.json (new file, +35 lines)

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

generation_config.json (new file, +7 lines)

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

model.safetensors (new file, +3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3bdae47ec83525e5d6ecaee582605bd2fdfae59f178fbee1f48dbf3210bbaa89
size 79752272
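The pointer oids show which checkpoint the published weights come from. Among the 79,752,272-byte pointers earlier in this commit, exactly one has the same oid as the top-level model.safetensors: the one in the group that ends with checkpoint-4000/tokenizer.json. The checkpoint labels below are an assumption: the unlabeled pointers are taken to follow the same per-checkpoint file order as the labeled tokenizer.json entries. Separately, every training_args.bin pointer, in each checkpoint and at the top level, carries the same oid (5d844e9b…).

```python
# 12-character oid prefixes of the 79,752,272-byte pointers, labeled by checkpoint
# group (assumption: unlabeled pointers follow the per-checkpoint file order).
CHECKPOINT_WEIGHT_OIDS = {
    "checkpoint-500": "7c9e376ce556",
    "checkpoint-1000": "c5923b2d1765",
    "checkpoint-1500": "c4f437339781",
    "checkpoint-2000": "7b5e88eecf60",
    "checkpoint-2500": "4deea7f3c0c7",
    "checkpoint-3000": "32192e76b930",
    "checkpoint-3500": "974e48024c2c",
    "checkpoint-4000": "3bdae47ec835",
}
FINAL_OID = "3bdae47ec83525e5d6ecaee582605bd2fdfae59f178fbee1f48dbf3210bbaa89"

def checkpoints_matching(final_oid: str, candidates: dict) -> list:
    """Return the checkpoint names whose oid prefix matches the final weights."""
    return [name for name, prefix in candidates.items() if final_oid.startswith(prefix)]

print(checkpoints_matching(FINAL_OID, CHECKPOINT_WEIGHT_OIDS))  # ['checkpoint-4000']
```

This fits max_steps = 4000 in the trainer state: the released model is the last saved checkpoint, not an earlier "best" one. Consistently, best_model_checkpoint is null in the checkpoint-500 trainer state.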

special_tokens_map.json (new file, +1249 lines)

File diff suppressed because it is too large

tokenizer.json (new file, +210940 lines)

File diff suppressed because one or more lines are too long

tokenizer_config.json (new file, +10829 lines)

File diff suppressed because it is too large

training_args.bin (new file, +3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d844e9bd7269e969751057307f9b24ce1dc61943d4ef008f2ddc415e2d7a0e8
size 6161