Initialize project; model provided by the ModelHub XC community

Model: fpadovani/tur_indomain_prepretraining_seed3407
Source: Original Platform
ModelHub XC
2026-05-23 01:25:27 +08:00
commit 1d9490986c
97 changed files with 2033223 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
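
The patterns above route matching files through Git LFS instead of storing them as regular blobs. As a rough illustration, the sketch below checks a few file names from this commit against a subset of those patterns. It uses Python's `fnmatch` on the base name, which only approximates gitattributes matching; the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.pt", "*.safetensors", "*.model", "*tfevents*"]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if the base name matches any LFS pattern (fnmatch approximation)."""
    base = filename.rsplit("/", 1)[-1]
    return any(fnmatch(base, pattern) for pattern in LFS_PATTERNS)


for name in ["model.safetensors", "training_args.bin", "config.json",
             "checkpoint-1000/tokenizer.json"]:
    print(name, is_lfs_tracked(name))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.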

58
README.md Normal file

@@ -0,0 +1,58 @@
---
base_model: goldfish-models/tur_latn_10mb
library_name: transformers
model_name: tur_indomain_prepretraining_seed3407
tags:
- generated_from_trainer
- trl
- sft
licence: license
---
# Model Card for tur_indomain_prepretraining_seed3407
This model is a fine-tuned version of [goldfish-models/tur_latn_10mb](https://huggingface.co/goldfish-models/tur_latn_10mb).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="fpadovani/tur_indomain_prepretraining_seed3407", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/f-padovani-university-of-groningen/formal_lang_tur/runs/nyk5nhek)
This model was trained with SFT.
### Framework versions
- TRL: 0.13.0
- Transformers: 4.47.0
- Pytorch: 2.11.0
- Datasets: 4.8.5
- Tokenizers: 0.21.0
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:28071939992f32f70cb911bc88dcf731d862f9f77b6c1d8e456e57d66f9df657
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:898a9a9713283a1b2a84725a94c18eeba429ddc17c0b08ae90f576b6b31ff8e6
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:39828d7aa6387407dd27da37f4ef2dd8fa0bd729e3310ad35ef80ba99332ee98
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49b5d602a36644e60adc641ff93b54a00173a84964edb693fdb6600fe5162714
size 1465

File diff suppressed because it is too large

210940
checkpoint-1000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:148583d3e34316986df259bc7da6f97a87ba4d5736ddcbd58823592cede1a53a
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:92359054452a3b76f2653858d44ae654c6a0b1b2ea578de2ea10f533a5c2bf1f
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cf6d8f94653e93dac341ab8e68e0a7bd9aa8ccd631fcf69dd0ab638e40b1484b
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cdf706bbb7bb6e14789b8a64f9da210a0ebd06388777b0fcde0c23247b8fa573
size 1465

File diff suppressed because it is too large

210940
checkpoint-1500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bebbfc48bb6182518209fe2988fe25055c6200b7ae7ed8a25410b3ff1a0be959
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc9304070199dc71fc75233ca175cc82ebb9654a6d97b3ef44b905d13c265d63
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c83711fa9302bf652742b38786bcb9ef5f57dd99e47541d8d80ce8cc0403d7c
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:012c90d7670b2bb998827d13327927d9a7c2889b9259d1469026aac83c30468b
size 1465

File diff suppressed because it is too large

210940
checkpoint-2000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f55b3c7560285eab88bbf6090d348f035f01f784779ee5de2c7f22f0f6b0705f
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a063db2241584235b89385f9be205a5bc929c5b1d3aa2e9fde064593f193a944
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51beeb27292efe4853f1ab66fcbc42c3c1dc5474e3a3aa14e5c380ee2239615f
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b3ad5549d5099e5cee60aa161fa531e77bd60d13a0dc32433f3e6e01a85319
size 1465

File diff suppressed because it is too large

210940
checkpoint-2500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba6f9ab198e2ca31f23b0b501bd332c437aa0e61ff0b9c67703ffaeb8200a2c2
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb543e56f4872432ef867b574ac600585340864f0ff6966339326c129ac94bcf
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0af1d055b69cb9fe3b1afd9d609b6ee2bf8fd22961d344388d4359e48ae4b38c
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dc70e9e1c1fef46d69e152f6d8d91fd0c8ce65b2fe905d341de3ee92357d097
size 1465

File diff suppressed because it is too large

210940
checkpoint-3000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:11f654435e3d2203a447797f361c5938b4c6fd9b1f3a83808daa842d12086ed1
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7db3440d4acd2f064360f902ea415a09e44751a6b9e53938dab47f8ff27eb7e9
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1690dd7690242ee3b629cfb3534e2e9b4c5f25cca73a0c73a64d442db9459d52
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6d9a470b97058ccd4f7c214a2a15352732f3be5686d8ebcb2c85f286ab0b593
size 1465

File diff suppressed because it is too large

210940
checkpoint-3500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74374b5c8c69325ec98239c451d3429d54d01a5b32287ef8a8e97b4f248a63a1
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e35187bc21a4199c92b7cde7ad7d2c89b81be3f709d3709a806fcefbcadee7b
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:83bd91bff278654cdf9c13bb4a8596b81f4dad1de550ccb15d727a5f7755558d
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a59cc45fece372941e082c6cabb8944243288549677e1e376f4e0b1009f0e7ad
size 1465

File diff suppressed because it is too large

210940
checkpoint-4000/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8742678d0dcc621e0e8c9ae7edbf26903bc21e1e23f9edeea1aecbfd540953f2
size 79752272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d748618103a32299597458c6aa40cd36ef63e5d440da16b1209a45ab7f74bcf9
size 159538443

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:194eb78f36fb5ce05028e7ddd5f750735454226bad2f2bdd0a64360e9883a03c
size 14645

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f243d020d2631ea115c3f5d3e6c1e5c9a7f8334b45c5f10056444d8c3da615c
size 1465

File diff suppressed because it is too large

210940
checkpoint-500/tokenizer.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

@@ -0,0 +1,733 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.0653893938403191,
"eval_steps": 500,
"global_step": 500,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.000653893938403191,
"grad_norm": 8.1875,
"learning_rate": 1e-05,
"loss": 10.9496,
"step": 5
},
{
"epoch": 0.001307787876806382,
"grad_norm": 7.6875,
"learning_rate": 2e-05,
"loss": 10.8725,
"step": 10
},
{
"epoch": 0.001961681815209573,
"grad_norm": 5.09375,
"learning_rate": 3e-05,
"loss": 10.6675,
"step": 15
},
{
"epoch": 0.002615575753612764,
"grad_norm": 3.609375,
"learning_rate": 4e-05,
"loss": 10.4669,
"step": 20
},
{
"epoch": 0.003269469692015955,
"grad_norm": 3.296875,
"learning_rate": 5e-05,
"loss": 10.315,
"step": 25
},
{
"epoch": 0.003923363630419146,
"grad_norm": 3.21875,
"learning_rate": 6e-05,
"loss": 10.1836,
"step": 30
},
{
"epoch": 0.004577257568822337,
"grad_norm": 2.875,
"learning_rate": 7.000000000000001e-05,
"loss": 10.0898,
"step": 35
},
{
"epoch": 0.005231151507225528,
"grad_norm": 2.859375,
"learning_rate": 8e-05,
"loss": 9.9304,
"step": 40
},
{
"epoch": 0.005885045445628719,
"grad_norm": 2.53125,
"learning_rate": 8.999999999999999e-05,
"loss": 9.7707,
"step": 45
},
{
"epoch": 0.00653893938403191,
"grad_norm": 2.65625,
"learning_rate": 0.0001,
"loss": 9.5651,
"step": 50
},
{
"epoch": 0.007192833322435101,
"grad_norm": 2.21875,
"learning_rate": 0.00011,
"loss": 9.3713,
"step": 55
},
{
"epoch": 0.007846727260838291,
"grad_norm": 2.15625,
"learning_rate": 0.00012,
"loss": 9.1757,
"step": 60
},
{
"epoch": 0.008500621199241483,
"grad_norm": 1.890625,
"learning_rate": 0.00013000000000000002,
"loss": 9.0253,
"step": 65
},
{
"epoch": 0.009154515137644674,
"grad_norm": 1.515625,
"learning_rate": 0.00014000000000000001,
"loss": 8.8424,
"step": 70
},
{
"epoch": 0.009808409076047865,
"grad_norm": 1.4375,
"learning_rate": 0.00015,
"loss": 8.686,
"step": 75
},
{
"epoch": 0.010462303014451056,
"grad_norm": 1.265625,
"learning_rate": 0.00016,
"loss": 8.5556,
"step": 80
},
{
"epoch": 0.011116196952854247,
"grad_norm": 1.3046875,
"learning_rate": 0.00017,
"loss": 8.5653,
"step": 85
},
{
"epoch": 0.011770090891257438,
"grad_norm": 1.21875,
"learning_rate": 0.00017999999999999998,
"loss": 8.4762,
"step": 90
},
{
"epoch": 0.01242398482966063,
"grad_norm": 1.234375,
"learning_rate": 0.00019,
"loss": 8.4341,
"step": 95
},
{
"epoch": 0.01307787876806382,
"grad_norm": 1.4375,
"learning_rate": 0.0002,
"loss": 8.4259,
"step": 100
},
{
"epoch": 0.013731772706467011,
"grad_norm": 1.234375,
"learning_rate": 0.00021,
"loss": 8.4247,
"step": 105
},
{
"epoch": 0.014385666644870202,
"grad_norm": 1.7578125,
"learning_rate": 0.00022,
"loss": 8.4139,
"step": 110
},
{
"epoch": 0.015039560583273394,
"grad_norm": 1.6796875,
"learning_rate": 0.00023,
"loss": 8.3534,
"step": 115
},
{
"epoch": 0.015693454521676583,
"grad_norm": 1.9140625,
"learning_rate": 0.00024,
"loss": 8.3723,
"step": 120
},
{
"epoch": 0.016347348460079774,
"grad_norm": 1.6484375,
"learning_rate": 0.00025,
"loss": 8.266,
"step": 125
},
{
"epoch": 0.017001242398482965,
"grad_norm": 1.3125,
"learning_rate": 0.00026000000000000003,
"loss": 8.3037,
"step": 130
},
{
"epoch": 0.017655136336886156,
"grad_norm": 1.734375,
"learning_rate": 0.00027,
"loss": 8.2941,
"step": 135
},
{
"epoch": 0.018309030275289347,
"grad_norm": 1.5546875,
"learning_rate": 0.00028000000000000003,
"loss": 8.3108,
"step": 140
},
{
"epoch": 0.01896292421369254,
"grad_norm": 1.53125,
"learning_rate": 0.00029,
"loss": 8.2322,
"step": 145
},
{
"epoch": 0.01961681815209573,
"grad_norm": 1.703125,
"learning_rate": 0.0003,
"loss": 8.2601,
"step": 150
},
{
"epoch": 0.02027071209049892,
"grad_norm": 1.75,
"learning_rate": 0.00031,
"loss": 8.2087,
"step": 155
},
{
"epoch": 0.02092460602890211,
"grad_norm": 1.46875,
"learning_rate": 0.00032,
"loss": 8.3123,
"step": 160
},
{
"epoch": 0.021578499967305303,
"grad_norm": 1.5625,
"learning_rate": 0.00033,
"loss": 8.1831,
"step": 165
},
{
"epoch": 0.022232393905708494,
"grad_norm": 1.421875,
"learning_rate": 0.00034,
"loss": 8.1369,
"step": 170
},
{
"epoch": 0.022886287844111685,
"grad_norm": 1.421875,
"learning_rate": 0.00035,
"loss": 8.1271,
"step": 175
},
{
"epoch": 0.023540181782514876,
"grad_norm": 1.328125,
"learning_rate": 0.00035999999999999997,
"loss": 8.1362,
"step": 180
},
{
"epoch": 0.024194075720918067,
"grad_norm": 1.4375,
"learning_rate": 0.00037,
"loss": 8.1593,
"step": 185
},
{
"epoch": 0.02484796965932126,
"grad_norm": 1.421875,
"learning_rate": 0.00038,
"loss": 8.0507,
"step": 190
},
{
"epoch": 0.02550186359772445,
"grad_norm": 1.7890625,
"learning_rate": 0.00039000000000000005,
"loss": 8.0434,
"step": 195
},
{
"epoch": 0.02615575753612764,
"grad_norm": 1.4140625,
"learning_rate": 0.0004,
"loss": 8.0434,
"step": 200
},
{
"epoch": 0.02680965147453083,
"grad_norm": 1.515625,
"learning_rate": 0.00041,
"loss": 8.0618,
"step": 205
},
{
"epoch": 0.027463545412934023,
"grad_norm": 1.3515625,
"learning_rate": 0.00042,
"loss": 8.0447,
"step": 210
},
{
"epoch": 0.028117439351337214,
"grad_norm": 1.59375,
"learning_rate": 0.00043,
"loss": 8.0226,
"step": 215
},
{
"epoch": 0.028771333289740405,
"grad_norm": 1.40625,
"learning_rate": 0.00044,
"loss": 7.9504,
"step": 220
},
{
"epoch": 0.029425227228143596,
"grad_norm": 1.359375,
"learning_rate": 0.00045000000000000004,
"loss": 7.9587,
"step": 225
},
{
"epoch": 0.030079121166546787,
"grad_norm": 1.4453125,
"learning_rate": 0.00046,
"loss": 7.9141,
"step": 230
},
{
"epoch": 0.030733015104949978,
"grad_norm": 1.59375,
"learning_rate": 0.00047,
"loss": 7.923,
"step": 235
},
{
"epoch": 0.031386909043353166,
"grad_norm": 1.4140625,
"learning_rate": 0.00048,
"loss": 7.9591,
"step": 240
},
{
"epoch": 0.03204080298175636,
"grad_norm": 1.6640625,
"learning_rate": 0.00049,
"loss": 7.9369,
"step": 245
},
{
"epoch": 0.03269469692015955,
"grad_norm": 1.390625,
"learning_rate": 0.0005,
"loss": 7.9401,
"step": 250
},
{
"epoch": 0.03334859085856274,
"grad_norm": 1.7109375,
"learning_rate": 0.00051,
"loss": 7.9218,
"step": 255
},
{
"epoch": 0.03400248479696593,
"grad_norm": 1.5703125,
"learning_rate": 0.0005200000000000001,
"loss": 7.9144,
"step": 260
},
{
"epoch": 0.03465637873536912,
"grad_norm": 1.421875,
"learning_rate": 0.0005300000000000001,
"loss": 7.8008,
"step": 265
},
{
"epoch": 0.03531027267377231,
"grad_norm": 1.7578125,
"learning_rate": 0.00054,
"loss": 7.8906,
"step": 270
},
{
"epoch": 0.035964166612175504,
"grad_norm": 1.6171875,
"learning_rate": 0.00055,
"loss": 7.8308,
"step": 275
},
{
"epoch": 0.036618060550578695,
"grad_norm": 1.5859375,
"learning_rate": 0.0005600000000000001,
"loss": 7.8048,
"step": 280
},
{
"epoch": 0.037271954488981886,
"grad_norm": 1.7578125,
"learning_rate": 0.00057,
"loss": 7.8612,
"step": 285
},
{
"epoch": 0.03792584842738508,
"grad_norm": 1.875,
"learning_rate": 0.00058,
"loss": 7.7612,
"step": 290
},
{
"epoch": 0.03857974236578827,
"grad_norm": 1.546875,
"learning_rate": 0.00059,
"loss": 7.8016,
"step": 295
},
{
"epoch": 0.03923363630419146,
"grad_norm": 1.578125,
"learning_rate": 0.0006,
"loss": 7.7855,
"step": 300
},
{
"epoch": 0.03988753024259465,
"grad_norm": 1.4765625,
"learning_rate": 0.00061,
"loss": 7.7743,
"step": 305
},
{
"epoch": 0.04054142418099784,
"grad_norm": 1.5546875,
"learning_rate": 0.00062,
"loss": 7.7267,
"step": 310
},
{
"epoch": 0.04119531811940103,
"grad_norm": 1.8515625,
"learning_rate": 0.00063,
"loss": 7.6867,
"step": 315
},
{
"epoch": 0.04184921205780422,
"grad_norm": 1.7265625,
"learning_rate": 0.00064,
"loss": 7.7052,
"step": 320
},
{
"epoch": 0.042503105996207415,
"grad_norm": 1.484375,
"learning_rate": 0.0006500000000000001,
"loss": 7.6547,
"step": 325
},
{
"epoch": 0.043156999934610606,
"grad_norm": 1.5078125,
"learning_rate": 0.00066,
"loss": 7.5959,
"step": 330
},
{
"epoch": 0.0438108938730138,
"grad_norm": 1.4765625,
"learning_rate": 0.00067,
"loss": 7.6249,
"step": 335
},
{
"epoch": 0.04446478781141699,
"grad_norm": 1.5390625,
"learning_rate": 0.00068,
"loss": 7.6151,
"step": 340
},
{
"epoch": 0.04511868174982018,
"grad_norm": 1.515625,
"learning_rate": 0.00069,
"loss": 7.6108,
"step": 345
},
{
"epoch": 0.04577257568822337,
"grad_norm": 1.46875,
"learning_rate": 0.0007,
"loss": 7.6138,
"step": 350
},
{
"epoch": 0.04642646962662656,
"grad_norm": 1.5234375,
"learning_rate": 0.00071,
"loss": 7.6191,
"step": 355
},
{
"epoch": 0.04708036356502975,
"grad_norm": 1.515625,
"learning_rate": 0.0007199999999999999,
"loss": 7.6052,
"step": 360
},
{
"epoch": 0.04773425750343294,
"grad_norm": 1.5234375,
"learning_rate": 0.00073,
"loss": 7.5601,
"step": 365
},
{
"epoch": 0.048388151441836134,
"grad_norm": 1.5625,
"learning_rate": 0.00074,
"loss": 7.544,
"step": 370
},
{
"epoch": 0.049042045380239326,
"grad_norm": 1.65625,
"learning_rate": 0.00075,
"loss": 7.6637,
"step": 375
},
{
"epoch": 0.04969593931864252,
"grad_norm": 1.59375,
"learning_rate": 0.00076,
"loss": 7.5337,
"step": 380
},
{
"epoch": 0.05034983325704571,
"grad_norm": 1.4921875,
"learning_rate": 0.0007700000000000001,
"loss": 7.5765,
"step": 385
},
{
"epoch": 0.0510037271954489,
"grad_norm": 1.5234375,
"learning_rate": 0.0007800000000000001,
"loss": 7.5058,
"step": 390
},
{
"epoch": 0.05165762113385209,
"grad_norm": 1.546875,
"learning_rate": 0.00079,
"loss": 7.4771,
"step": 395
},
{
"epoch": 0.05231151507225528,
"grad_norm": 1.5625,
"learning_rate": 0.0008,
"loss": 7.5002,
"step": 400
},
{
"epoch": 0.05296540901065847,
"grad_norm": 1.4609375,
"learning_rate": 0.0008100000000000001,
"loss": 7.4754,
"step": 405
},
{
"epoch": 0.05361930294906166,
"grad_norm": 1.46875,
"learning_rate": 0.00082,
"loss": 7.4979,
"step": 410
},
{
"epoch": 0.054273196887464854,
"grad_norm": 1.3671875,
"learning_rate": 0.00083,
"loss": 7.4688,
"step": 415
},
{
"epoch": 0.054927090825868045,
"grad_norm": 1.6640625,
"learning_rate": 0.00084,
"loss": 7.4894,
"step": 420
},
{
"epoch": 0.055580984764271237,
"grad_norm": 1.34375,
"learning_rate": 0.00085,
"loss": 7.4744,
"step": 425
},
{
"epoch": 0.05623487870267443,
"grad_norm": 1.515625,
"learning_rate": 0.00086,
"loss": 7.4532,
"step": 430
},
{
"epoch": 0.05688877264107762,
"grad_norm": 1.4921875,
"learning_rate": 0.00087,
"loss": 7.4408,
"step": 435
},
{
"epoch": 0.05754266657948081,
"grad_norm": 1.5625,
"learning_rate": 0.00088,
"loss": 7.3857,
"step": 440
},
{
"epoch": 0.058196560517884,
"grad_norm": 1.6171875,
"learning_rate": 0.0008900000000000001,
"loss": 7.3989,
"step": 445
},
{
"epoch": 0.05885045445628719,
"grad_norm": 1.390625,
"learning_rate": 0.0009000000000000001,
"loss": 7.4182,
"step": 450
},
{
"epoch": 0.05950434839469038,
"grad_norm": 1.6328125,
"learning_rate": 0.00091,
"loss": 7.4086,
"step": 455
},
{
"epoch": 0.060158242333093574,
"grad_norm": 1.53125,
"learning_rate": 0.00092,
"loss": 7.5019,
"step": 460
},
{
"epoch": 0.060812136271496765,
"grad_norm": 1.7109375,
"learning_rate": 0.00093,
"loss": 7.3263,
"step": 465
},
{
"epoch": 0.061466030209899956,
"grad_norm": 1.359375,
"learning_rate": 0.00094,
"loss": 7.3745,
"step": 470
},
{
"epoch": 0.06211992414830315,
"grad_norm": 1.390625,
"learning_rate": 0.00095,
"loss": 7.4221,
"step": 475
},
{
"epoch": 0.06277381808670633,
"grad_norm": 1.375,
"learning_rate": 0.00096,
"loss": 7.4204,
"step": 480
},
{
"epoch": 0.06342771202510952,
"grad_norm": 1.671875,
"learning_rate": 0.0009699999999999999,
"loss": 7.3489,
"step": 485
},
{
"epoch": 0.06408160596351271,
"grad_norm": 1.5078125,
"learning_rate": 0.00098,
"loss": 7.3887,
"step": 490
},
{
"epoch": 0.0647354999019159,
"grad_norm": 1.6328125,
"learning_rate": 0.00099,
"loss": 7.3143,
"step": 495
},
{
"epoch": 0.0653893938403191,
"grad_norm": 1.515625,
"learning_rate": 0.001,
"loss": 7.271,
"step": 500
}
],
"logging_steps": 5,
"max_steps": 4000,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": false
},
"attributes": {}
}
},
"total_flos": 660917633679360.0,
"train_batch_size": 32,
"trial_name": null,
"trial_params": null
}
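
The logged values above allow a few quick sanity checks. The sketch below copies three entries from `log_history` and derives the implied steps per epoch, the fraction of an epoch that `max_steps` covers, and the per-step learning-rate increment. It also computes ln(vocab_size), the loss of a uniform prediction over the vocabulary, for comparison with the first logged loss of 10.9496. The variable names are illustrative; `vocab_size` is taken from the config.json files in this commit.

```python
import math

# A few entries copied from log_history above.
log_history = [
    {"step": 5, "epoch": 0.000653893938403191, "learning_rate": 1e-05, "loss": 10.9496},
    {"step": 250, "epoch": 0.03269469692015955, "learning_rate": 0.0005, "loss": 7.9401},
    {"step": 500, "epoch": 0.0653893938403191, "learning_rate": 0.001, "loss": 7.271},
]
max_steps = 4000      # from trainer_state.json
vocab_size = 51200    # from config.json

last = log_history[-1]
steps_per_epoch = last["step"] / last["epoch"]
epochs_covered = max_steps / steps_per_epoch
lr_per_step = last["learning_rate"] / last["step"]
uniform_loss = math.log(vocab_size)

print("steps per epoch:", round(steps_per_epoch, 1))
print("epochs covered by max_steps:", round(epochs_covered, 3))
print("learning-rate increment per step:", lr_per_step)
print("ln(vocab_size):", round(uniform_loss, 4))
```

The learning rate grows by the same amount at every logged step, which is consistent with a linear warmup. This log ends at step 500, so the peak rate and the rest of the schedule cannot be read from it.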

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161

35
config.json Normal file

@@ -0,0 +1,35 @@
{
"_name_or_path": "goldfish-models/tur_latn_10mb",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50000,
"embd_pdrop": 0.1,
"eos_token_id": 50001,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 4,
"n_positions": 2048,
"pad_token_id": 50002,
"prefix": "[CLS]",
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.0",
"use_cache": true,
"vocab_size": 51200
}
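
The architecture fields above are enough to estimate the parameter count. The sketch below does so for a standard GPT-2 layout, assuming the language-model head shares its weights with the token embedding (the usual GPT-2 arrangement, not something this config states). The function name `gpt2_param_count` is hypothetical.

```python
# Values copied from config.json above.
vocab_size, n_positions = 51200, 2048
n_embd, n_inner, n_layer = 512, 2048, 4


def gpt2_param_count() -> int:
    """Count GPT-2 parameters, assuming the LM head is tied to the token embedding."""
    embeddings = vocab_size * n_embd + n_positions * n_embd
    layer_norm = 2 * n_embd  # weight + bias
    # Fused QKV projection plus output projection, each with a bias.
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two feed-forward projections, each with a bias.
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    per_layer = 2 * layer_norm + attention + mlp
    # Embeddings, all transformer blocks, and the final layer norm.
    return embeddings + n_layer * per_layer + layer_norm


params = gpt2_param_count()
bf16_bytes = 2 * params  # torch_dtype is bfloat16: 2 bytes per parameter
print(params, bf16_bytes)
```

Under these assumptions the weights come to roughly 39.9 million parameters, or about 79.7 MB in bfloat16. That is within a few kilobytes of the 79752272-byte size recorded for model.safetensors in this commit. The small remainder would be consistent with file header metadata, though that is an inference.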

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 50000,
"eos_token_id": 50001,
"pad_token_id": 50002,
"transformers_version": "4.47.0"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74374b5c8c69325ec98239c451d3429d54d01a5b32287ef8a8e97b4f248a63a1
size 79752272
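
The three lines above are a Git LFS pointer: the repository stores this small text stub, while the actual weights live in LFS storage addressed by the hash. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    # Each line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:74374b5c8c69325ec98239c451d3429d54d01a5b32287ef8a8e97b4f248a63a1
size 79752272
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Because the oid is a content hash, two pointers with the same oid refer to byte-identical files. The oid shown here also appears in an earlier pointer in this commit, so those two files have the same contents.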

1249
special_tokens_map.json Normal file

File diff suppressed because it is too large

210940
tokenizer.json Normal file

File diff suppressed because one or more lines are too long

10829
tokenizer_config.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:120b7610d3d1345042d77ff76b16dab74d2f02cdf6ceb125b23e2f467511a29e
size 6161