Initialize project; model provided by the ModelHub XC community

Model: fpadovani/tur_indomain_prepretraining_seed21
Source: Original Platform
ModelHub XC
2026-05-23 01:25:36 +08:00
commit d181db07e8
97 changed files with 2033223 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

58
README.md Normal file

@@ -0,0 +1,58 @@
---
base_model: goldfish-models/tur_latn_10mb
library_name: transformers
model_name: tur_indomain_prepretraining_seed21
tags:
- generated_from_trainer
- trl
- sft
licence: license
---
# Model Card for tur_indomain_prepretraining_seed21

This model is a fine-tuned version of [goldfish-models/tur_latn_10mb](https://huggingface.co/goldfish-models/tur_latn_10mb).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="fpadovani/tur_indomain_prepretraining_seed21", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/f-padovani-university-of-groningen/formal_lang_tur/runs/it4axese)

This model was trained with SFT.

### Framework versions

- TRL: 0.13.0
- Transformers: 4.47.0
- Pytorch: 2.11.0
- Datasets: 4.8.5
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e846a54cad6c6d409db9a62556eee55a43aa964ab4bbdf47abfa1764e3a05bce
size 79752272
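
The three-line hunks in this commit are Git LFS pointer files: the repository stores only a spec version, a SHA-256 object ID, and a byte size, while the binary itself lives in LFS storage. As a minimal sketch (the `LfsPointer` class and `parse_lfs_pointer` function are hypothetical names, not part of any Git LFS tooling), such a pointer could be read like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    """Fields of a Git LFS pointer file (hypothetical helper type)."""
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a pointer file like the one above."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer text copied from the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e846a54cad6c6d409db9a62556eee55a43aa964ab4bbdf47abfa1764e3a05bce
size 79752272
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer.hash_algo, pointer.size)
```

Comparing `pointer.size` and `pointer.digest` against a downloaded file is one way to confirm that the LFS object was fetched intact.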

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7f0595a45072a48ef2a39e9c48285f4baa60e005c41824a9e18e68b8e9c17516
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65f86b32ed3b7242adcee3082a91ed574b2eeab859134498dc88ec53b2b1fab2
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49b5d602a36644e60adc641ff93b54a00173a84964edb693fdb6600fe5162714
size 1465

File diff suppressed because it is too large

210940
checkpoint-1000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8dd58d0f2a24a8ff3f70c6c8bc224cb52c6fdb2fa33572bf5127aeecd9960a1e
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dfd3aba761eb1d476518b2beb818b4e25149b69d068f2ccfcd960b8f2d688d9
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:365f8a423da31f8c20d9364ca3036c4f82671bfafe365ac6078584dc9be85411
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cdf706bbb7bb6e14789b8a64f9da210a0ebd06388777b0fcde0c23247b8fa573
size 1465

File diff suppressed because it is too large

210940
checkpoint-1500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f133496b52355293fa9ac29ee5ec75b96582e0f321fbe7878389015231bc268
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b7f0163cec4555960541c1ff769ae27849885bc02a044f3ba0af0516310eb54e
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b545d50dbe15e8f9faef72ff81a2ef81dc60f3d447457f55eb7991daec2c206
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:012c90d7670b2bb998827d13327927d9a7c2889b9259d1469026aac83c30468b
size 1465

File diff suppressed because it is too large

210940
checkpoint-2000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7138abf68edeb0be9a616bf0b4e5a0ffc7aa4205702b071be7df9cee154d2b19
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62925f3e35028da16f9002c585b1cb83031ccda2281e47974c45f7dd55127eec
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b61fbd5e3a9fc3dc328e568e34a3b8450f011c88345e8c0b5af1485cebb40f65
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b3ad5549d5099e5cee60aa161fa531e77bd60d13a0dc32433f3e6e01a85319
size 1465

File diff suppressed because it is too large

210940
checkpoint-2500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a98e7217dc2306d6438a2dd8970c511a3b2b30a55db7de559e85c4c52d7a6c9
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74a3a1ae0710dd361e72957fdef073c78077cb389f8ee73aace415faa66ca8d0
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:904585a415c522ce7da727f24bcf83a9446e743c2636c1fe1b7c25d6c96b19a7
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dc70e9e1c1fef46d69e152f6d8d91fd0c8ce65b2fe905d341de3ee92357d097
size 1465

File diff suppressed because it is too large

210940
checkpoint-3000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3fbb05651b50d9a190bf8da6f34c5ef3d5ed5ae2f3304db65e34a605d638e085
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0f0447cfd9f93707df687824e90c5496e85cf3242fa92cdba3d2d78131dd236
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:99abb9fc90b0ec0776502b0134f6c3905a60e5063686890822eb2b78d57cfdbf
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6d9a470b97058ccd4f7c214a2a15352732f3be5686d8ebcb2c85f286ab0b593
size 1465

File diff suppressed because it is too large

210940
checkpoint-3500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73ed2d0f7766930212dd07dc1bf66780c5c73c11df9cad56e94377f3ddb76442
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c8735afbad6ecf7df610c7556a16a2bf0a513a46ddc32cb5f375a1f0abff0caa
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abae1b137be0185f5b8dbe86701f6f843658b467348e5b0e5274b7bc7e8aa957
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a59cc45fece372941e082c6cabb8944243288549677e1e376f4e0b1009f0e7ad
size 1465

File diff suppressed because it is too large

210940
checkpoint-4000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:acbbe6277052c4026e6a1969607c730ffba6fb43d85f7aa381cdd9b8f5589236
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c0fd7e9d9ae16244487dab68433cc9603944d62a966f4eb92fd608800d49a194
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:23daa357b081f02cc70a0ca4eeb47d9429da53ff82ac5ad4489d9ae8a924e806
size 14709

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f243d020d2631ea115c3f5d3e6c1e5c9a7f8334b45c5f10056444d8c3da615c
size 1465

File diff suppressed because it is too large

210940
checkpoint-500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

@@ -0,0 +1,733 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.0653893938403191,
"eval_steps": 500,
"global_step": 500,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.000653893938403191,
"grad_norm": 8.4375,
"learning_rate": 1e-05,
"loss": 10.9218,
"step": 5
},
{
"epoch": 0.001307787876806382,
"grad_norm": 8.5,
"learning_rate": 2e-05,
"loss": 10.86,
"step": 10
},
{
"epoch": 0.001961681815209573,
"grad_norm": 5.0625,
"learning_rate": 3e-05,
"loss": 10.6568,
"step": 15
},
{
"epoch": 0.002615575753612764,
"grad_norm": 3.4375,
"learning_rate": 4e-05,
"loss": 10.4622,
"step": 20
},
{
"epoch": 0.003269469692015955,
"grad_norm": 3.0625,
"learning_rate": 5e-05,
"loss": 10.2882,
"step": 25
},
{
"epoch": 0.003923363630419146,
"grad_norm": 3.09375,
"learning_rate": 6e-05,
"loss": 10.2014,
"step": 30
},
{
"epoch": 0.004577257568822337,
"grad_norm": 2.703125,
"learning_rate": 7.000000000000001e-05,
"loss": 10.0942,
"step": 35
},
{
"epoch": 0.005231151507225528,
"grad_norm": 2.984375,
"learning_rate": 8e-05,
"loss": 9.9471,
"step": 40
},
{
"epoch": 0.005885045445628719,
"grad_norm": 2.703125,
"learning_rate": 8.999999999999999e-05,
"loss": 9.7595,
"step": 45
},
{
"epoch": 0.00653893938403191,
"grad_norm": 2.46875,
"learning_rate": 0.0001,
"loss": 9.6038,
"step": 50
},
{
"epoch": 0.007192833322435101,
"grad_norm": 2.53125,
"learning_rate": 0.00011,
"loss": 9.3898,
"step": 55
},
{
"epoch": 0.007846727260838291,
"grad_norm": 2.125,
"learning_rate": 0.00012,
"loss": 9.2079,
"step": 60
},
{
"epoch": 0.008500621199241483,
"grad_norm": 1.8046875,
"learning_rate": 0.00013000000000000002,
"loss": 9.0458,
"step": 65
},
{
"epoch": 0.009154515137644674,
"grad_norm": 1.734375,
"learning_rate": 0.00014000000000000001,
"loss": 8.8588,
"step": 70
},
{
"epoch": 0.009808409076047865,
"grad_norm": 1.34375,
"learning_rate": 0.00015,
"loss": 8.7008,
"step": 75
},
{
"epoch": 0.010462303014451056,
"grad_norm": 1.109375,
"learning_rate": 0.00016,
"loss": 8.5939,
"step": 80
},
{
"epoch": 0.011116196952854247,
"grad_norm": 1.1640625,
"learning_rate": 0.00017,
"loss": 8.5616,
"step": 85
},
{
"epoch": 0.011770090891257438,
"grad_norm": 1.3984375,
"learning_rate": 0.00017999999999999998,
"loss": 8.4525,
"step": 90
},
{
"epoch": 0.01242398482966063,
"grad_norm": 1.109375,
"learning_rate": 0.00019,
"loss": 8.4594,
"step": 95
},
{
"epoch": 0.01307787876806382,
"grad_norm": 1.1640625,
"learning_rate": 0.0002,
"loss": 8.4348,
"step": 100
},
{
"epoch": 0.013731772706467011,
"grad_norm": 1.4375,
"learning_rate": 0.00021,
"loss": 8.4187,
"step": 105
},
{
"epoch": 0.014385666644870202,
"grad_norm": 1.78125,
"learning_rate": 0.00022,
"loss": 8.3887,
"step": 110
},
{
"epoch": 0.015039560583273394,
"grad_norm": 1.375,
"learning_rate": 0.00023,
"loss": 8.3935,
"step": 115
},
{
"epoch": 0.015693454521676583,
"grad_norm": 1.4453125,
"learning_rate": 0.00024,
"loss": 8.3468,
"step": 120
},
{
"epoch": 0.016347348460079774,
"grad_norm": 1.5859375,
"learning_rate": 0.00025,
"loss": 8.3648,
"step": 125
},
{
"epoch": 0.017001242398482965,
"grad_norm": 1.8046875,
"learning_rate": 0.00026000000000000003,
"loss": 8.3006,
"step": 130
},
{
"epoch": 0.017655136336886156,
"grad_norm": 1.421875,
"learning_rate": 0.00027,
"loss": 8.2614,
"step": 135
},
{
"epoch": 0.018309030275289347,
"grad_norm": 1.3828125,
"learning_rate": 0.00028000000000000003,
"loss": 8.25,
"step": 140
},
{
"epoch": 0.01896292421369254,
"grad_norm": 1.90625,
"learning_rate": 0.00029,
"loss": 8.2765,
"step": 145
},
{
"epoch": 0.01961681815209573,
"grad_norm": 1.4296875,
"learning_rate": 0.0003,
"loss": 8.2609,
"step": 150
},
{
"epoch": 0.02027071209049892,
"grad_norm": 1.703125,
"learning_rate": 0.00031,
"loss": 8.1702,
"step": 155
},
{
"epoch": 0.02092460602890211,
"grad_norm": 1.53125,
"learning_rate": 0.00032,
"loss": 8.1875,
"step": 160
},
{
"epoch": 0.021578499967305303,
"grad_norm": 1.5,
"learning_rate": 0.00033,
"loss": 8.1725,
"step": 165
},
{
"epoch": 0.022232393905708494,
"grad_norm": 1.9140625,
"learning_rate": 0.00034,
"loss": 8.1809,
"step": 170
},
{
"epoch": 0.022886287844111685,
"grad_norm": 1.5703125,
"learning_rate": 0.00035,
"loss": 8.1357,
"step": 175
},
{
"epoch": 0.023540181782514876,
"grad_norm": 1.453125,
"learning_rate": 0.00035999999999999997,
"loss": 8.1805,
"step": 180
},
{
"epoch": 0.024194075720918067,
"grad_norm": 1.84375,
"learning_rate": 0.00037,
"loss": 8.0948,
"step": 185
},
{
"epoch": 0.02484796965932126,
"grad_norm": 1.65625,
"learning_rate": 0.00038,
"loss": 8.0769,
"step": 190
},
{
"epoch": 0.02550186359772445,
"grad_norm": 1.4375,
"learning_rate": 0.00039000000000000005,
"loss": 8.0623,
"step": 195
},
{
"epoch": 0.02615575753612764,
"grad_norm": 1.375,
"learning_rate": 0.0004,
"loss": 8.0233,
"step": 200
},
{
"epoch": 0.02680965147453083,
"grad_norm": 1.65625,
"learning_rate": 0.00041,
"loss": 8.0448,
"step": 205
},
{
"epoch": 0.027463545412934023,
"grad_norm": 1.421875,
"learning_rate": 0.00042,
"loss": 7.9827,
"step": 210
},
{
"epoch": 0.028117439351337214,
"grad_norm": 1.8125,
"learning_rate": 0.00043,
"loss": 7.9879,
"step": 215
},
{
"epoch": 0.028771333289740405,
"grad_norm": 1.484375,
"learning_rate": 0.00044,
"loss": 7.9859,
"step": 220
},
{
"epoch": 0.029425227228143596,
"grad_norm": 1.40625,
"learning_rate": 0.00045000000000000004,
"loss": 7.9821,
"step": 225
},
{
"epoch": 0.030079121166546787,
"grad_norm": 1.421875,
"learning_rate": 0.00046,
"loss": 7.9357,
"step": 230
},
{
"epoch": 0.030733015104949978,
"grad_norm": 1.375,
"learning_rate": 0.00047,
"loss": 7.9613,
"step": 235
},
{
"epoch": 0.031386909043353166,
"grad_norm": 1.96875,
"learning_rate": 0.00048,
"loss": 7.8928,
"step": 240
},
{
"epoch": 0.03204080298175636,
"grad_norm": 1.5859375,
"learning_rate": 0.00049,
"loss": 7.9647,
"step": 245
},
{
"epoch": 0.03269469692015955,
"grad_norm": 1.6875,
"learning_rate": 0.0005,
"loss": 7.8866,
"step": 250
},
{
"epoch": 0.03334859085856274,
"grad_norm": 1.5703125,
"learning_rate": 0.00051,
"loss": 7.8719,
"step": 255
},
{
"epoch": 0.03400248479696593,
"grad_norm": 1.4609375,
"learning_rate": 0.0005200000000000001,
"loss": 7.8036,
"step": 260
},
{
"epoch": 0.03465637873536912,
"grad_norm": 1.53125,
"learning_rate": 0.0005300000000000001,
"loss": 7.8523,
"step": 265
},
{
"epoch": 0.03531027267377231,
"grad_norm": 1.46875,
"learning_rate": 0.00054,
"loss": 7.8359,
"step": 270
},
{
"epoch": 0.035964166612175504,
"grad_norm": 1.65625,
"learning_rate": 0.00055,
"loss": 7.8399,
"step": 275
},
{
"epoch": 0.036618060550578695,
"grad_norm": 1.65625,
"learning_rate": 0.0005600000000000001,
"loss": 7.8428,
"step": 280
},
{
"epoch": 0.037271954488981886,
"grad_norm": 1.7421875,
"learning_rate": 0.00057,
"loss": 7.7551,
"step": 285
},
{
"epoch": 0.03792584842738508,
"grad_norm": 1.4765625,
"learning_rate": 0.00058,
"loss": 7.7357,
"step": 290
},
{
"epoch": 0.03857974236578827,
"grad_norm": 1.375,
"learning_rate": 0.00059,
"loss": 7.7327,
"step": 295
},
{
"epoch": 0.03923363630419146,
"grad_norm": 1.5859375,
"learning_rate": 0.0006,
"loss": 7.6911,
"step": 300
},
{
"epoch": 0.03988753024259465,
"grad_norm": 1.3359375,
"learning_rate": 0.00061,
"loss": 7.6854,
"step": 305
},
{
"epoch": 0.04054142418099784,
"grad_norm": 1.4921875,
"learning_rate": 0.00062,
"loss": 7.7088,
"step": 310
},
{
"epoch": 0.04119531811940103,
"grad_norm": 1.59375,
"learning_rate": 0.00063,
"loss": 7.6666,
"step": 315
},
{
"epoch": 0.04184921205780422,
"grad_norm": 1.4921875,
"learning_rate": 0.00064,
"loss": 7.6758,
"step": 320
},
{
"epoch": 0.042503105996207415,
"grad_norm": 1.46875,
"learning_rate": 0.0006500000000000001,
"loss": 7.6554,
"step": 325
},
{
"epoch": 0.043156999934610606,
"grad_norm": 1.6328125,
"learning_rate": 0.00066,
"loss": 7.6471,
"step": 330
},
{
"epoch": 0.0438108938730138,
"grad_norm": 1.546875,
"learning_rate": 0.00067,
"loss": 7.612,
"step": 335
},
{
"epoch": 0.04446478781141699,
"grad_norm": 1.546875,
"learning_rate": 0.00068,
"loss": 7.6353,
"step": 340
},
{
"epoch": 0.04511868174982018,
"grad_norm": 1.5859375,
"learning_rate": 0.00069,
"loss": 7.6068,
"step": 345
},
{
"epoch": 0.04577257568822337,
"grad_norm": 1.640625,
"learning_rate": 0.0007,
"loss": 7.6507,
"step": 350
},
{
"epoch": 0.04642646962662656,
"grad_norm": 1.6328125,
"learning_rate": 0.00071,
"loss": 7.5806,
"step": 355
},
{
"epoch": 0.04708036356502975,
"grad_norm": 1.6328125,
"learning_rate": 0.0007199999999999999,
"loss": 7.5964,
"step": 360
},
{
"epoch": 0.04773425750343294,
"grad_norm": 1.484375,
"learning_rate": 0.00073,
"loss": 7.5817,
"step": 365
},
{
"epoch": 0.048388151441836134,
"grad_norm": 1.4375,
"learning_rate": 0.00074,
"loss": 7.5673,
"step": 370
},
{
"epoch": 0.049042045380239326,
"grad_norm": 1.6015625,
"learning_rate": 0.00075,
"loss": 7.5662,
"step": 375
},
{
"epoch": 0.04969593931864252,
"grad_norm": 1.5234375,
"learning_rate": 0.00076,
"loss": 7.5739,
"step": 380
},
{
"epoch": 0.05034983325704571,
"grad_norm": 1.6328125,
"learning_rate": 0.0007700000000000001,
"loss": 7.5809,
"step": 385
},
{
"epoch": 0.0510037271954489,
"grad_norm": 1.6328125,
"learning_rate": 0.0007800000000000001,
"loss": 7.584,
"step": 390
},
{
"epoch": 0.05165762113385209,
"grad_norm": 1.8671875,
"learning_rate": 0.00079,
"loss": 7.4952,
"step": 395
},
{
"epoch": 0.05231151507225528,
"grad_norm": 1.6328125,
"learning_rate": 0.0008,
"loss": 7.5619,
"step": 400
},
{
"epoch": 0.05296540901065847,
"grad_norm": 1.4609375,
"learning_rate": 0.0008100000000000001,
"loss": 7.5143,
"step": 405
},
{
"epoch": 0.05361930294906166,
"grad_norm": 1.59375,
"learning_rate": 0.00082,
"loss": 7.429,
"step": 410
},
{
"epoch": 0.054273196887464854,
"grad_norm": 1.5625,
"learning_rate": 0.00083,
"loss": 7.5304,
"step": 415
},
{
"epoch": 0.054927090825868045,
"grad_norm": 1.515625,
"learning_rate": 0.00084,
"loss": 7.4544,
"step": 420
},
{
"epoch": 0.055580984764271237,
"grad_norm": 1.5546875,
"learning_rate": 0.00085,
"loss": 7.4307,
"step": 425
},
{
"epoch": 0.05623487870267443,
"grad_norm": 1.5078125,
"learning_rate": 0.00086,
"loss": 7.4134,
"step": 430
},
{
"epoch": 0.05688877264107762,
"grad_norm": 1.4765625,
"learning_rate": 0.00087,
"loss": 7.5005,
"step": 435
},
{
"epoch": 0.05754266657948081,
"grad_norm": 1.421875,
"learning_rate": 0.00088,
"loss": 7.4401,
"step": 440
},
{
"epoch": 0.058196560517884,
"grad_norm": 1.671875,
"learning_rate": 0.0008900000000000001,
"loss": 7.4792,
"step": 445
},
{
"epoch": 0.05885045445628719,
"grad_norm": 1.5,
"learning_rate": 0.0009000000000000001,
"loss": 7.405,
"step": 450
},
{
"epoch": 0.05950434839469038,
"grad_norm": 1.5625,
"learning_rate": 0.00091,
"loss": 7.4043,
"step": 455
},
{
"epoch": 0.060158242333093574,
"grad_norm": 1.5,
"learning_rate": 0.00092,
"loss": 7.4356,
"step": 460
},
{
"epoch": 0.060812136271496765,
"grad_norm": 1.4453125,
"learning_rate": 0.00093,
"loss": 7.2855,
"step": 465
},
{
"epoch": 0.061466030209899956,
"grad_norm": 1.6015625,
"learning_rate": 0.00094,
"loss": 7.3614,
"step": 470
},
{
"epoch": 0.06211992414830315,
"grad_norm": 1.6171875,
"learning_rate": 0.00095,
"loss": 7.4136,
"step": 475
},
{
"epoch": 0.06277381808670633,
"grad_norm": 1.5546875,
"learning_rate": 0.00096,
"loss": 7.3571,
"step": 480
},
{
"epoch": 0.06342771202510952,
"grad_norm": 1.6015625,
"learning_rate": 0.0009699999999999999,
"loss": 7.3855,
"step": 485
},
{
"epoch": 0.06408160596351271,
"grad_norm": 1.6875,
"learning_rate": 0.00098,
"loss": 7.5011,
"step": 490
},
{
"epoch": 0.0647354999019159,
"grad_norm": 1.578125,
"learning_rate": 0.00099,
"loss": 7.3837,
"step": 495
},
{
"epoch": 0.0653893938403191,
"grad_norm": 1.5859375,
"learning_rate": 0.001,
"loss": 7.2856,
"step": 500
}
],
"logging_steps": 5,
"max_steps": 4000,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": false
},
"attributes": {}
}
},
"total_flos": 675529741762560.0,
"train_batch_size": 32,
"trial_name": null,
"trial_params": null
}
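
The `log_history` entries above look consistent with a linear learning-rate warmup that reaches 1e-3 at step 500, and the `epoch` values imply how many optimizer steps make up one epoch. The sketch below checks both against a few logged entries. `PEAK_LR`, `WARMUP_STEPS`, and `linear_warmup_lr` are assumptions inferred from the log, not values read from `training_args.bin`, and the schedule after step 500 is not visible in this file.

```python
import math

# Inferred from the log above (assumptions, not read from training_args.bin).
PEAK_LR = 1e-3       # learning_rate logged at step 500
WARMUP_STEPS = 500   # assumed warmup length

# (step, learning_rate, epoch) triples copied from log_history.
LOGGED = [
    (5, 1e-05, 0.000653893938403191),
    (35, 7.000000000000001e-05, 0.004577257568822337),
    (250, 0.0005, 0.03269469692015955),
    (500, 0.001, 0.0653893938403191),
]


def linear_warmup_lr(step: int) -> float:
    """Learning rate under a plain linear warmup from 0 to PEAK_LR."""
    return PEAK_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


# Every sampled entry should match the linear-warmup formula.
for step, lr, _ in LOGGED:
    assert math.isclose(linear_warmup_lr(step), lr, rel_tol=1e-9)

# Steps per epoch implied by the last entry, and the share of one epoch
# that max_steps = 4000 would cover.
last_step, _, last_epoch = LOGGED[-1]
steps_per_epoch = last_step / last_epoch
print(f"{steps_per_epoch:.1f} steps per epoch")
print(f"{4000 / steps_per_epoch:.3f} epochs at max_steps")

# Cross-entropy of a uniform guess over the 51200-token vocabulary,
# for comparison with the first logged loss of 10.9218.
print(f"{math.log(51200):.4f}")
```

The uniform-guess figure is only a reference point for reading the first logged loss; the log alone does not say why the run starts near it.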

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097

35
config.json Normal file

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}
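
As a rough cross-check, the hyperparameters in this config can be turned into a parameter count and compared with the 79752272-byte `model.safetensors` pointers in this commit. The sketch assumes the standard GPT-2 layout (fused QKV projection, biases on every linear layer, two layer norms per block plus a final one) and that the output head shares the token-embedding matrix; `gpt2_param_count` is a hypothetical helper, not a Transformers API.

```python
# Hyperparameters copied from config.json above.
N_EMBD, N_LAYER, N_INNER = 512, 4, 2048
N_POSITIONS, VOCAB_SIZE = 2048, 51200
BYTES_PER_PARAM = 2  # torch_dtype is bfloat16


def gpt2_param_count() -> int:
    """Count GPT-2 parameters, assuming lm_head is tied to the token embedding."""
    embeddings = VOCAB_SIZE * N_EMBD + N_POSITIONS * N_EMBD
    layer_norm = 2 * N_EMBD  # weight + bias
    # Fused QKV projection plus the attention output projection.
    attention = (N_EMBD * 3 * N_EMBD + 3 * N_EMBD) + (N_EMBD * N_EMBD + N_EMBD)
    # Two-layer feed-forward network with inner width N_INNER.
    mlp = (N_EMBD * N_INNER + N_INNER) + (N_INNER * N_EMBD + N_EMBD)
    block = 2 * layer_norm + attention + mlp
    return embeddings + N_LAYER * block + layer_norm  # plus the final layer norm


params = gpt2_param_count()
tensor_bytes = params * BYTES_PER_PARAM
print(params, tensor_bytes, 79752272 - tensor_bytes)
# → 39873536 79747072 5200
```

Under these assumptions the tensors account for all but 5200 bytes of the file, a remainder that would be consistent with the safetensors metadata header.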

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73ed2d0f7766930212dd07dc1bf66780c5c73c11df9cad56e94377f3ddb76442
size 79752272

1249
special_tokens_map.json Normal file

File diff suppressed because it is too large

210940
tokenizer.json Normal file

File diff suppressed because one or more lines are too long

10829
tokenizer_config.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0c10af52714c7e589ca25abeb2286ad0dceeee1fb04cbc1c30570dddc7e401
size 6097