Initialize project; model provided by the ModelHub XC community

Model: mlfoundations-dev/oh_v1.3_gpt_4o_mini
Source: Original Platform
ModelHub XC
2026-05-22 20:13:12 +08:00
commit 1db71364d8
22 changed files with 3351 additions and 0 deletions

.gitattributes (new file, +54 lines)

@@ -0,0 +1,54 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
training_args.bin filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
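Git routes any path whose attributes match one of these patterns through the LFS filter. Note that `*.safetensors` is not among the patterns: the four weight shards are listed by name. As a rough illustration, the sketch below checks filenames against an excerpt of the list using Python's `fnmatch`. It assumes that slash-free patterns match the basename at any depth (as gitattributes treats them), and it ignores path patterns such as `saved_model/**/*`, so it is not a full gitattributes implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Excerpt of the slash-free patterns from the .gitattributes above (not the full list).
LFS_PATTERNS = [
    "*.pt",
    "*.gguf*",
    "*.tar.*",
    "*tfevents*",
    "training_args.bin",
    "model-00001-of-00004.safetensors",
    "tokenizer.json",
]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches any LFS pattern.

    Simplification: patterns containing '/' are not handled here.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for p in ["tokenizer.json", "config.json", "runs/events.out.tfevents.1", "ckpt/model.pt"]:
    print(p, is_lfs_tracked(p))
```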

README.md (new file, +67 lines)

@@ -0,0 +1,67 @@
---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: oh_v1.3_gpt_4o_mini
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# oh_v1.3_gpt_4o_mini
This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the mlfoundations-dev/oh_v1.3_gpt_4o_mini dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7332
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: AdamW (PyTorch implementation, `adamw_torch`) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 3.0
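The two totals in this list follow directly from the per-device values. The quick check below assumes plain data parallelism across the 16 devices. DeepSpeed ZeRO-3 (per configs.yaml) shards parameters and optimizer state, but it does not change the effective batch.

```python
train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 16
gradient_accumulation_steps = 4

# Each optimizer step sees per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only devices multiply in.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 512 128
```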
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7434 | 0.9973 | 272 | 0.7440 |
| 0.6845 | 1.9963 | 544 | 0.7305 |
| 0.6373 | 2.9954 | 816 | 0.7332 |
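In this table, validation loss is lowest after the second epoch (0.7305). After the third epoch it rises slightly to 0.7332, while training loss keeps falling. The headline "Loss: 0.7332" is therefore the final-epoch value, not the best one. The small sketch below picks the best row from the table values:

```python
rows = [
    # (training_loss, epoch, step, validation_loss), copied from the table above
    (0.7434, 0.9973, 272, 0.7440),
    (0.6845, 1.9963, 544, 0.7305),
    (0.6373, 2.9954, 816, 0.7332),
]

best = min(rows, key=lambda r: r[3])   # row with the lowest validation loss
final = rows[-1]                       # last evaluated checkpoint
print(f"best val loss {best[3]} at step {best[2]}; final {final[3]} at step {final[2]}")
print("gap:", round(final[3] - best[3], 4))
```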
### Framework versions
- Transformers 4.46.1
- Pytorch 2.3.0
- Datasets 3.1.0
- Tokenizers 0.20.3

all_results.json (new file, +12 lines)

@@ -0,0 +1,12 @@
{
"epoch": 2.9954170485792853,
"eval_loss": 0.7332214117050171,
"eval_runtime": 93.8096,
"eval_samples_per_second": 78.35,
"eval_steps_per_second": 0.618,
"total_flos": 1366411632967680.0,
"train_loss": 0.7035682309491962,
"train_runtime": 14220.3782,
"train_samples_per_second": 29.46,
"train_steps_per_second": 0.057
}
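Because the throughput figures are consistent with one another, they make a useful sanity check when reading these files. Multiplying samples per second by runtime recovers about 7,350 evaluation sequences, which is 58 steps at the total eval batch of 128 and agrees with `eval_steps_per_second × eval_runtime`. It also recovers about 418,900 training sequences over three epochs. With `val_size: '0.05'` in configs.yaml, the training split should be roughly 19 times the evaluation split, about 139,650 per epoch, and the numbers agree. Reading these as packed sequences (`packing: 'True'`) rather than original conversations is an assumption.

```python
import math

eval_runtime = 93.8096             # seconds
eval_samples_per_second = 78.35
eval_steps_per_second = 0.618
train_runtime = 14220.3782         # seconds (about 3.95 h)
train_samples_per_second = 29.46
total_eval_batch_size = 128        # from the README
val_size = 0.05                    # from configs.yaml
epochs = 3

eval_samples = round(eval_samples_per_second * eval_runtime)
eval_steps = math.ceil(eval_samples / total_eval_batch_size)
train_per_epoch = train_samples_per_second * train_runtime / epochs
# If val_size of the data went to evaluation, the rest is (1 - val_size) / val_size times larger.
expected_train_per_epoch = eval_samples * (1 - val_size) / val_size

print(eval_samples, eval_steps, round(eval_steps_per_second * eval_runtime))
print(round(train_per_epoch), round(expected_train_per_epoch))
```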

config.json (new file, +36 lines)

@@ -0,0 +1,36 @@
{
"_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": false,
"vocab_size": 128256
}
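The shapes in config.json are enough to recompute the parameter count. The sketch below assumes the standard Llama layout, which is consistent with the tensor names in this commit's weight map:

- grouped-query attention, with a key/value width of `num_key_value_heads × head_dim`;
- a gated MLP with gate, up and down projections;
- two RMSNorm weights per layer plus a final norm;
- an untied `lm_head`;
- no bias terms.

At two bytes per bfloat16 parameter, the result reproduces the safetensors index's `total_size` of 16060522496 bytes exactly.

```python
cfg = {
    "vocab_size": 128256,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "tie_word_embeddings": False,
}


def llama_param_count(c: dict) -> int:
    h, i = c["hidden_size"], c["intermediate_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h   # q, k, v, o projections (no bias)
    mlp = 3 * h * i                                 # gate, up, down projections
    norms = 2 * h                                   # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h
    return c["num_hidden_layers"] * per_layer + embed + lm_head + h  # + final norm


n = llama_param_count(cfg)
print(n, n * 2)  # parameter count, bytes in bfloat16
```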

configs.yaml (new file, +41 lines)

@@ -0,0 +1,41 @@
adam_beta1: '0.9'
adam_beta2: '0.999'
bf16: 'True'
cutoff_len: '2048'
dataset: mlfoundations-dev/oh_v1.3_gpt_4o_mini
dataset_dir: ONLINE
ddp_timeout: '180000000'
deepspeed: /opt/ml/code/zero3.json
do_train: 'True'
enable_liger_kernel: 'False'
eval_strategy: epoch
finetuning_type: full
formatting: sharegpt
global_batch_size: '512'
gradient_accumulation_steps: '4'
gradient_checkpointing: 'True'
hub_model_id: mlfoundations-dev/oh_v1.3_gpt_4o_mini
learning_rate: 5e-06
logging_steps: '10'
lr_scheduler_type: constant
max_grad_norm: '1'
messages: conversations
model_name_or_path: meta-llama/Meta-Llama-3.1-8B
neat_packing: 'True'
num_train_epochs: '3.0'
output_dir: /opt/ml/model
overwrite_cache: 'True'
overwrite_output_dir: 'True'
packing: 'True'
per_device_train_batch_size: '8'
plot_loss: 'True'
preprocessing_num_workers: '16'
push_to_db: 'True'
push_to_hub: 'True'
report_to: wandb
run_name: oh_v1.3_gpt_4o_mini
save_strategy: epoch
stage: sft
template: llama3
val_size: '0.05'
weight_decay: '0.1'
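This run formats ShareGPT-style `conversations` with LLaMA-Factory's `template: llama3`, so inference prompts should follow the same Llama 3 chat layout. The minimal sketch below renders that layout by hand. It assumes the standard Llama 3 turn markers (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`), roles already named `system`/`user`/`assistant`, and no default system prompt. If tokenizer_config.json carries a chat template (its diff is suppressed on this page, so this is unconfirmed), `tokenizer.apply_chat_template` is the safer route.

```python
def render_llama3(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] into the Llama 3 chat layout (sketch)."""
    text = "<|begin_of_text|>"
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn marker.
        text += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn; the model is expected to close it with <|eot_id|>.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


prompt = render_llama3([{"role": "user", "content": "Hi"}])
print(repr(prompt))
```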

configuration.json (new file, +1 line)

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

eval_results.json (new file, +7 lines)

@@ -0,0 +1,7 @@
{
"epoch": 2.9954170485792853,
"eval_loss": 0.7332214117050171,
"eval_runtime": 93.8096,
"eval_samples_per_second": 78.35,
"eval_steps_per_second": 0.618
}

generation_config.json (new file, +9 lines)

@@ -0,0 +1,9 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.46.1"
}
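generation_config.json enables sampling (`do_sample: true`) with `temperature: 0.6` and `top_p: 0.9`. To illustrate what those two settings do, the sketch below applies temperature scaling and then nucleus filtering, and it is generic rather than the exact Transformers implementation. It divides the logits by the temperature and converts them to probabilities. It then keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`, and draws from that set. The function names are hypothetical.

```python
import math
import random


def nucleus(logits, temperature=0.6, top_p=0.9):
    """Return (kept_indices, probs) after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, probs


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    kept, probs = nucleus(logits, temperature, top_p)
    # choices() normalises the weights, so the kept probabilities need no renormalising.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.0, -5.0]
print(nucleus(logits)[0])
print(sample_top_p(logits, rng=random.Random(0)))
```

A lower temperature sharpens the distribution, so fewer tokens are needed to reach `top_p`.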


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad255bcedbb511599e1bedcd8e8cf9305c27d989004329da6b8bd3aed7b4798c
size 4976698672


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4dc401136cf31c6ddcf5d956214a10e191aa405943de5d19951137c896f0cb36
size 4999802720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f60161d27bb92605fc5a38fd8e339f080a37c4089e970a937871f8080e728e9
size 4915916176


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f67b28bcfcc5ca34f746f705c84dd049735f917a0b64e79b8a9e9ec9041f33f
size 1168138808
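Each of the four entries above is a Git LFS pointer rather than the weights themselves. The `oid` field is the SHA-256 of the real file and `size` is its length in bytes. The four sizes sum to 16060556376 bytes, which is 33880 bytes more than the index's `total_size` of 16060522496. Presumably the difference is the JSON header that each safetensors file carries alongside its tensor data. The sketch below parses a pointer and checks a downloaded file against it using only the standard library. The file path in the commented usage line is hypothetical.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'oid': <hex sha256>, 'size': <int>}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check the size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:1f67b28bcfcc5ca34f746f705c84dd049735f917a0b64e79b8a9e9ec9041f33f
size 1168138808""")
print(pointer["size"])
# matches_pointer("path/to/downloaded-shard.safetensors", pointer)  # hypothetical path
```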


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
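The weight map lets a loader open only the shards it needs. Most layers live in a single shard, but layer 20 is split across shards 2 and 3, and layer 31 across shards 3 and 4. The sketch below uses an excerpt of the map above: it groups tensor names by layer index and reports which shard files each layer touches.

```python
import re
from collections import defaultdict

# Excerpt of "weight_map" from the index above (not the full map).
weight_map = {
    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors",
}


def shards_per_layer(wmap: dict) -> dict:
    """Map layer index -> sorted list of shard files holding that layer's tensors."""
    layers = defaultdict(set)
    for name, shard in wmap.items():
        m = re.match(r"model\.layers\.(\d+)\.", name)
        if m:  # skip non-layer tensors such as lm_head or embed_tokens
            layers[int(m.group(1))].add(shard)
    return {k: sorted(v) for k, v in sorted(layers.items())}


result = shards_per_layer(weight_map)
for layer, shards in result.items():
    print(layer, shards)
```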

special_tokens_map.json (new file, +17 lines)

@@ -0,0 +1,17 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|eot_id|>"
}

tokenizer.json (new file, stored with Git LFS; binary content not shown)

tokenizer_config.json (new file, +2065 lines)

File diff suppressed because it is too large.

train_results.json (new file, +8 lines)

@@ -0,0 +1,8 @@
{
"epoch": 2.9954170485792853,
"total_flos": 1366411632967680.0,
"train_loss": 0.7035682309491962,
"train_runtime": 14220.3782,
"train_samples_per_second": 29.46,
"train_steps_per_second": 0.057
}

trainer_log.jsonl (new file, +85 lines)

@@ -0,0 +1,85 @@
{"current_steps": 10, "total_steps": 816, "loss": 1.033, "lr": 5e-06, "epoch": 0.03666361136571952, "percentage": 1.23, "elapsed_time": "0:02:55", "remaining_time": "3:55:13"}
{"current_steps": 20, "total_steps": 816, "loss": 0.9011, "lr": 5e-06, "epoch": 0.07332722273143905, "percentage": 2.45, "elapsed_time": "0:05:44", "remaining_time": "3:48:40"}
{"current_steps": 30, "total_steps": 816, "loss": 0.8764, "lr": 5e-06, "epoch": 0.10999083409715857, "percentage": 3.68, "elapsed_time": "0:08:33", "remaining_time": "3:44:10"}
{"current_steps": 40, "total_steps": 816, "loss": 0.8446, "lr": 5e-06, "epoch": 0.1466544454628781, "percentage": 4.9, "elapsed_time": "0:11:23", "remaining_time": "3:40:58"}
{"current_steps": 50, "total_steps": 816, "loss": 0.8204, "lr": 5e-06, "epoch": 0.18331805682859761, "percentage": 6.13, "elapsed_time": "0:14:12", "remaining_time": "3:37:42"}
{"current_steps": 60, "total_steps": 816, "loss": 0.8104, "lr": 5e-06, "epoch": 0.21998166819431714, "percentage": 7.35, "elapsed_time": "0:17:00", "remaining_time": "3:34:21"}
{"current_steps": 70, "total_steps": 816, "loss": 0.7968, "lr": 5e-06, "epoch": 0.2566452795600367, "percentage": 8.58, "elapsed_time": "0:19:49", "remaining_time": "3:31:11"}
{"current_steps": 80, "total_steps": 816, "loss": 0.7836, "lr": 5e-06, "epoch": 0.2933088909257562, "percentage": 9.8, "elapsed_time": "0:22:36", "remaining_time": "3:28:02"}
{"current_steps": 90, "total_steps": 816, "loss": 0.7781, "lr": 5e-06, "epoch": 0.32997250229147573, "percentage": 11.03, "elapsed_time": "0:25:24", "remaining_time": "3:24:59"}
{"current_steps": 100, "total_steps": 816, "loss": 0.7732, "lr": 5e-06, "epoch": 0.36663611365719523, "percentage": 12.25, "elapsed_time": "0:28:14", "remaining_time": "3:22:10"}
{"current_steps": 110, "total_steps": 816, "loss": 0.7711, "lr": 5e-06, "epoch": 0.4032997250229148, "percentage": 13.48, "elapsed_time": "0:31:01", "remaining_time": "3:19:05"}
{"current_steps": 120, "total_steps": 816, "loss": 0.7637, "lr": 5e-06, "epoch": 0.4399633363886343, "percentage": 14.71, "elapsed_time": "0:33:47", "remaining_time": "3:16:00"}
{"current_steps": 130, "total_steps": 816, "loss": 0.7617, "lr": 5e-06, "epoch": 0.4766269477543538, "percentage": 15.93, "elapsed_time": "0:36:34", "remaining_time": "3:12:59"}
{"current_steps": 140, "total_steps": 816, "loss": 0.7562, "lr": 5e-06, "epoch": 0.5132905591200734, "percentage": 17.16, "elapsed_time": "0:39:20", "remaining_time": "3:09:59"}
{"current_steps": 150, "total_steps": 816, "loss": 0.7601, "lr": 5e-06, "epoch": 0.5499541704857929, "percentage": 18.38, "elapsed_time": "0:42:09", "remaining_time": "3:07:09"}
{"current_steps": 160, "total_steps": 816, "loss": 0.7561, "lr": 5e-06, "epoch": 0.5866177818515124, "percentage": 19.61, "elapsed_time": "0:44:57", "remaining_time": "3:04:19"}
{"current_steps": 170, "total_steps": 816, "loss": 0.7549, "lr": 5e-06, "epoch": 0.6232813932172319, "percentage": 20.83, "elapsed_time": "0:47:46", "remaining_time": "3:01:31"}
{"current_steps": 180, "total_steps": 816, "loss": 0.7525, "lr": 5e-06, "epoch": 0.6599450045829515, "percentage": 22.06, "elapsed_time": "0:50:35", "remaining_time": "2:58:45"}
{"current_steps": 190, "total_steps": 816, "loss": 0.7507, "lr": 5e-06, "epoch": 0.696608615948671, "percentage": 23.28, "elapsed_time": "0:53:25", "remaining_time": "2:56:00"}
{"current_steps": 200, "total_steps": 816, "loss": 0.7492, "lr": 5e-06, "epoch": 0.7332722273143905, "percentage": 24.51, "elapsed_time": "0:56:14", "remaining_time": "2:53:13"}
{"current_steps": 210, "total_steps": 816, "loss": 0.7446, "lr": 5e-06, "epoch": 0.76993583868011, "percentage": 25.74, "elapsed_time": "0:59:01", "remaining_time": "2:50:20"}
{"current_steps": 220, "total_steps": 816, "loss": 0.7512, "lr": 5e-06, "epoch": 0.8065994500458296, "percentage": 26.96, "elapsed_time": "1:01:50", "remaining_time": "2:47:33"}
{"current_steps": 230, "total_steps": 816, "loss": 0.744, "lr": 5e-06, "epoch": 0.843263061411549, "percentage": 28.19, "elapsed_time": "1:04:39", "remaining_time": "2:44:43"}
{"current_steps": 240, "total_steps": 816, "loss": 0.7431, "lr": 5e-06, "epoch": 0.8799266727772685, "percentage": 29.41, "elapsed_time": "1:07:26", "remaining_time": "2:41:51"}
{"current_steps": 250, "total_steps": 816, "loss": 0.7399, "lr": 5e-06, "epoch": 0.916590284142988, "percentage": 30.64, "elapsed_time": "1:10:15", "remaining_time": "2:39:03"}
{"current_steps": 260, "total_steps": 816, "loss": 0.7457, "lr": 5e-06, "epoch": 0.9532538955087076, "percentage": 31.86, "elapsed_time": "1:13:02", "remaining_time": "2:36:12"}
{"current_steps": 270, "total_steps": 816, "loss": 0.7434, "lr": 5e-06, "epoch": 0.9899175068744271, "percentage": 33.09, "elapsed_time": "1:15:50", "remaining_time": "2:33:21"}
{"current_steps": 272, "total_steps": 816, "eval_loss": 0.743977963924408, "epoch": 0.997250229147571, "percentage": 33.33, "elapsed_time": "1:18:04", "remaining_time": "2:36:08"}
{"current_steps": 280, "total_steps": 816, "loss": 0.7593, "lr": 5e-06, "epoch": 1.0284142988084326, "percentage": 34.31, "elapsed_time": "1:21:12", "remaining_time": "2:35:26"}
{"current_steps": 290, "total_steps": 816, "loss": 0.6885, "lr": 5e-06, "epoch": 1.065077910174152, "percentage": 35.54, "elapsed_time": "1:23:58", "remaining_time": "2:32:18"}
{"current_steps": 300, "total_steps": 816, "loss": 0.6893, "lr": 5e-06, "epoch": 1.1017415215398716, "percentage": 36.76, "elapsed_time": "1:26:45", "remaining_time": "2:29:13"}
{"current_steps": 310, "total_steps": 816, "loss": 0.6868, "lr": 5e-06, "epoch": 1.138405132905591, "percentage": 37.99, "elapsed_time": "1:29:33", "remaining_time": "2:26:10"}
{"current_steps": 320, "total_steps": 816, "loss": 0.6885, "lr": 5e-06, "epoch": 1.1750687442713108, "percentage": 39.22, "elapsed_time": "1:32:21", "remaining_time": "2:23:09"}
{"current_steps": 330, "total_steps": 816, "loss": 0.6871, "lr": 5e-06, "epoch": 1.2117323556370303, "percentage": 40.44, "elapsed_time": "1:35:09", "remaining_time": "2:20:08"}
{"current_steps": 340, "total_steps": 816, "loss": 0.6935, "lr": 5e-06, "epoch": 1.2483959670027498, "percentage": 41.67, "elapsed_time": "1:37:58", "remaining_time": "2:17:09"}
{"current_steps": 350, "total_steps": 816, "loss": 0.6943, "lr": 5e-06, "epoch": 1.2850595783684693, "percentage": 42.89, "elapsed_time": "1:40:44", "remaining_time": "2:14:08"}
{"current_steps": 360, "total_steps": 816, "loss": 0.6938, "lr": 5e-06, "epoch": 1.3217231897341888, "percentage": 44.12, "elapsed_time": "1:43:31", "remaining_time": "2:11:07"}
{"current_steps": 370, "total_steps": 816, "loss": 0.6856, "lr": 5e-06, "epoch": 1.3583868010999083, "percentage": 45.34, "elapsed_time": "1:46:16", "remaining_time": "2:08:05"}
{"current_steps": 380, "total_steps": 816, "loss": 0.69, "lr": 5e-06, "epoch": 1.3950504124656278, "percentage": 46.57, "elapsed_time": "1:49:04", "remaining_time": "2:05:08"}
{"current_steps": 390, "total_steps": 816, "loss": 0.682, "lr": 5e-06, "epoch": 1.4317140238313475, "percentage": 47.79, "elapsed_time": "1:51:52", "remaining_time": "2:02:11"}
{"current_steps": 400, "total_steps": 816, "loss": 0.6917, "lr": 5e-06, "epoch": 1.468377635197067, "percentage": 49.02, "elapsed_time": "1:54:39", "remaining_time": "1:59:14"}
{"current_steps": 410, "total_steps": 816, "loss": 0.6871, "lr": 5e-06, "epoch": 1.5050412465627865, "percentage": 50.25, "elapsed_time": "1:57:26", "remaining_time": "1:56:17"}
{"current_steps": 420, "total_steps": 816, "loss": 0.6829, "lr": 5e-06, "epoch": 1.541704857928506, "percentage": 51.47, "elapsed_time": "2:00:12", "remaining_time": "1:53:20"}
{"current_steps": 430, "total_steps": 816, "loss": 0.6824, "lr": 5e-06, "epoch": 1.5783684692942255, "percentage": 52.7, "elapsed_time": "2:03:00", "remaining_time": "1:50:24"}
{"current_steps": 440, "total_steps": 816, "loss": 0.6822, "lr": 5e-06, "epoch": 1.615032080659945, "percentage": 53.92, "elapsed_time": "2:05:47", "remaining_time": "1:47:30"}
{"current_steps": 450, "total_steps": 816, "loss": 0.6879, "lr": 5e-06, "epoch": 1.6516956920256645, "percentage": 55.15, "elapsed_time": "2:08:35", "remaining_time": "1:44:35"}
{"current_steps": 460, "total_steps": 816, "loss": 0.6804, "lr": 5e-06, "epoch": 1.6883593033913842, "percentage": 56.37, "elapsed_time": "2:11:22", "remaining_time": "1:41:40"}
{"current_steps": 470, "total_steps": 816, "loss": 0.686, "lr": 5e-06, "epoch": 1.7250229147571035, "percentage": 57.6, "elapsed_time": "2:14:08", "remaining_time": "1:38:45"}
{"current_steps": 480, "total_steps": 816, "loss": 0.6856, "lr": 5e-06, "epoch": 1.7616865261228232, "percentage": 58.82, "elapsed_time": "2:16:55", "remaining_time": "1:35:51"}
{"current_steps": 490, "total_steps": 816, "loss": 0.6814, "lr": 5e-06, "epoch": 1.7983501374885427, "percentage": 60.05, "elapsed_time": "2:19:42", "remaining_time": "1:32:57"}
{"current_steps": 500, "total_steps": 816, "loss": 0.6855, "lr": 5e-06, "epoch": 1.8350137488542622, "percentage": 61.27, "elapsed_time": "2:22:31", "remaining_time": "1:30:04"}
{"current_steps": 510, "total_steps": 816, "loss": 0.6875, "lr": 5e-06, "epoch": 1.8716773602199817, "percentage": 62.5, "elapsed_time": "2:25:17", "remaining_time": "1:27:10"}
{"current_steps": 520, "total_steps": 816, "loss": 0.6856, "lr": 5e-06, "epoch": 1.9083409715857012, "percentage": 63.73, "elapsed_time": "2:28:05", "remaining_time": "1:24:18"}
{"current_steps": 530, "total_steps": 816, "loss": 0.6841, "lr": 5e-06, "epoch": 1.9450045829514209, "percentage": 64.95, "elapsed_time": "2:30:54", "remaining_time": "1:21:26"}
{"current_steps": 540, "total_steps": 816, "loss": 0.6845, "lr": 5e-06, "epoch": 1.9816681943171401, "percentage": 66.18, "elapsed_time": "2:33:43", "remaining_time": "1:18:34"}
{"current_steps": 544, "total_steps": 816, "eval_loss": 0.7305116057395935, "epoch": 1.996333638863428, "percentage": 66.67, "elapsed_time": "2:36:33", "remaining_time": "1:18:16"}
{"current_steps": 550, "total_steps": 816, "loss": 0.707, "lr": 5e-06, "epoch": 2.020164986251146, "percentage": 67.4, "elapsed_time": "2:39:11", "remaining_time": "1:16:59"}
{"current_steps": 560, "total_steps": 816, "loss": 0.6313, "lr": 5e-06, "epoch": 2.056828597616865, "percentage": 68.63, "elapsed_time": "2:41:58", "remaining_time": "1:14:02"}
{"current_steps": 570, "total_steps": 816, "loss": 0.6281, "lr": 5e-06, "epoch": 2.093492208982585, "percentage": 69.85, "elapsed_time": "2:44:46", "remaining_time": "1:11:06"}
{"current_steps": 580, "total_steps": 816, "loss": 0.631, "lr": 5e-06, "epoch": 2.130155820348304, "percentage": 71.08, "elapsed_time": "2:47:34", "remaining_time": "1:08:11"}
{"current_steps": 590, "total_steps": 816, "loss": 0.6351, "lr": 5e-06, "epoch": 2.166819431714024, "percentage": 72.3, "elapsed_time": "2:50:21", "remaining_time": "1:05:15"}
{"current_steps": 600, "total_steps": 816, "loss": 0.634, "lr": 5e-06, "epoch": 2.203483043079743, "percentage": 73.53, "elapsed_time": "2:53:07", "remaining_time": "1:02:19"}
{"current_steps": 610, "total_steps": 816, "loss": 0.6327, "lr": 5e-06, "epoch": 2.240146654445463, "percentage": 74.75, "elapsed_time": "2:55:53", "remaining_time": "0:59:24"}
{"current_steps": 620, "total_steps": 816, "loss": 0.6339, "lr": 5e-06, "epoch": 2.276810265811182, "percentage": 75.98, "elapsed_time": "2:58:40", "remaining_time": "0:56:28"}
{"current_steps": 630, "total_steps": 816, "loss": 0.636, "lr": 5e-06, "epoch": 2.313473877176902, "percentage": 77.21, "elapsed_time": "3:01:26", "remaining_time": "0:53:34"}
{"current_steps": 640, "total_steps": 816, "loss": 0.6342, "lr": 5e-06, "epoch": 2.3501374885426216, "percentage": 78.43, "elapsed_time": "3:04:13", "remaining_time": "0:50:39"}
{"current_steps": 650, "total_steps": 816, "loss": 0.637, "lr": 5e-06, "epoch": 2.386801099908341, "percentage": 79.66, "elapsed_time": "3:06:59", "remaining_time": "0:47:45"}
{"current_steps": 660, "total_steps": 816, "loss": 0.63, "lr": 5e-06, "epoch": 2.4234647112740606, "percentage": 80.88, "elapsed_time": "3:09:44", "remaining_time": "0:44:50"}
{"current_steps": 670, "total_steps": 816, "loss": 0.6288, "lr": 5e-06, "epoch": 2.46012832263978, "percentage": 82.11, "elapsed_time": "3:12:30", "remaining_time": "0:41:57"}
{"current_steps": 680, "total_steps": 816, "loss": 0.6381, "lr": 5e-06, "epoch": 2.4967919340054996, "percentage": 83.33, "elapsed_time": "3:15:19", "remaining_time": "0:39:03"}
{"current_steps": 690, "total_steps": 816, "loss": 0.6332, "lr": 5e-06, "epoch": 2.5334555453712193, "percentage": 84.56, "elapsed_time": "3:18:05", "remaining_time": "0:36:10"}
{"current_steps": 700, "total_steps": 816, "loss": 0.6351, "lr": 5e-06, "epoch": 2.5701191567369386, "percentage": 85.78, "elapsed_time": "3:20:53", "remaining_time": "0:33:17"}
{"current_steps": 710, "total_steps": 816, "loss": 0.6363, "lr": 5e-06, "epoch": 2.606782768102658, "percentage": 87.01, "elapsed_time": "3:23:40", "remaining_time": "0:30:24"}
{"current_steps": 720, "total_steps": 816, "loss": 0.6355, "lr": 5e-06, "epoch": 2.6434463794683776, "percentage": 88.24, "elapsed_time": "3:26:27", "remaining_time": "0:27:31"}
{"current_steps": 730, "total_steps": 816, "loss": 0.6351, "lr": 5e-06, "epoch": 2.6801099908340973, "percentage": 89.46, "elapsed_time": "3:29:12", "remaining_time": "0:24:38"}
{"current_steps": 740, "total_steps": 816, "loss": 0.638, "lr": 5e-06, "epoch": 2.7167736021998166, "percentage": 90.69, "elapsed_time": "3:31:58", "remaining_time": "0:21:46"}
{"current_steps": 750, "total_steps": 816, "loss": 0.6388, "lr": 5e-06, "epoch": 2.7534372135655363, "percentage": 91.91, "elapsed_time": "3:34:44", "remaining_time": "0:18:53"}
{"current_steps": 760, "total_steps": 816, "loss": 0.6364, "lr": 5e-06, "epoch": 2.7901008249312556, "percentage": 93.14, "elapsed_time": "3:37:32", "remaining_time": "0:16:01"}
{"current_steps": 770, "total_steps": 816, "loss": 0.6421, "lr": 5e-06, "epoch": 2.8267644362969753, "percentage": 94.36, "elapsed_time": "3:40:18", "remaining_time": "0:13:09"}
{"current_steps": 780, "total_steps": 816, "loss": 0.6379, "lr": 5e-06, "epoch": 2.863428047662695, "percentage": 95.59, "elapsed_time": "3:43:05", "remaining_time": "0:10:17"}
{"current_steps": 790, "total_steps": 816, "loss": 0.6414, "lr": 5e-06, "epoch": 2.9000916590284143, "percentage": 96.81, "elapsed_time": "3:45:51", "remaining_time": "0:07:25"}
{"current_steps": 800, "total_steps": 816, "loss": 0.6361, "lr": 5e-06, "epoch": 2.936755270394134, "percentage": 98.04, "elapsed_time": "3:48:38", "remaining_time": "0:04:34"}
{"current_steps": 810, "total_steps": 816, "loss": 0.6373, "lr": 5e-06, "epoch": 2.9734188817598532, "percentage": 99.26, "elapsed_time": "3:51:25", "remaining_time": "0:01:42"}
{"current_steps": 816, "total_steps": 816, "eval_loss": 0.7332214117050171, "epoch": 2.9954170485792853, "percentage": 100.0, "elapsed_time": "3:55:44", "remaining_time": "0:00:00"}
{"current_steps": 816, "total_steps": 816, "epoch": 2.9954170485792853, "percentage": 100.0, "elapsed_time": "3:56:59", "remaining_time": "0:00:00"}
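Each line of trainer_log.jsonl is an independent JSON object, and three kinds of record are mixed together:

- periodic training records, with `loss` and `lr`;
- per-epoch evaluation records, with `eval_loss`, at steps 272, 544 and 816;
- a final summary line with neither.

The standard-library sketch below splits them. `split_log` is a hypothetical helper, and the sample lines are copied from the log above, trimmed to the fields used.

```python
import json


def split_log(lines):
    """Separate training-loss records from eval-loss records in a JSONL trainer log."""
    train, evals = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if "loss" in record:
            train.append((record["current_steps"], record["loss"]))
        elif "eval_loss" in record:
            evals.append((record["current_steps"], record["eval_loss"]))
        # Anything else (e.g. the final summary line) is ignored.
    return train, evals


SAMPLE = """\
{"current_steps": 10, "total_steps": 816, "loss": 1.033, "lr": 5e-06}
{"current_steps": 270, "total_steps": 816, "loss": 0.7434, "lr": 5e-06}
{"current_steps": 272, "total_steps": 816, "eval_loss": 0.743977963924408}
{"current_steps": 280, "total_steps": 816, "loss": 0.7593, "lr": 5e-06}
{"current_steps": 816, "total_steps": 816, "eval_loss": 0.7332214117050171}
{"current_steps": 816, "total_steps": 816, "percentage": 100.0}
"""

train, evals = split_log(SAMPLE.splitlines())
print(len(train), "training records;", len(evals), "eval records")
```

On the full 85-line log, this yields 81 training records and 3 evaluation records.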

633
trainer_state.json Normal file

@@ -0,0 +1,633 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 2.9954170485792853,
"eval_steps": 500,
"global_step": 816,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.03666361136571952,
"grad_norm": 10.920966734668648,
"learning_rate": 5e-06,
"loss": 1.033,
"step": 10
},
{
"epoch": 0.07332722273143905,
"grad_norm": 2.445980198452297,
"learning_rate": 5e-06,
"loss": 0.9011,
"step": 20
},
{
"epoch": 0.10999083409715857,
"grad_norm": 1.5424232386159482,
"learning_rate": 5e-06,
"loss": 0.8764,
"step": 30
},
{
"epoch": 0.1466544454628781,
"grad_norm": 1.1797649838046136,
"learning_rate": 5e-06,
"loss": 0.8446,
"step": 40
},
{
"epoch": 0.18331805682859761,
"grad_norm": 1.0295589020655365,
"learning_rate": 5e-06,
"loss": 0.8204,
"step": 50
},
{
"epoch": 0.21998166819431714,
"grad_norm": 1.2160434357225554,
"learning_rate": 5e-06,
"loss": 0.8104,
"step": 60
},
{
"epoch": 0.2566452795600367,
"grad_norm": 1.2135493715768004,
"learning_rate": 5e-06,
"loss": 0.7968,
"step": 70
},
{
"epoch": 0.2933088909257562,
"grad_norm": 0.7594488712178804,
"learning_rate": 5e-06,
"loss": 0.7836,
"step": 80
},
{
"epoch": 0.32997250229147573,
"grad_norm": 0.8913076302913621,
"learning_rate": 5e-06,
"loss": 0.7781,
"step": 90
},
{
"epoch": 0.36663611365719523,
"grad_norm": 1.126183659145103,
"learning_rate": 5e-06,
"loss": 0.7732,
"step": 100
},
{
"epoch": 0.4032997250229148,
"grad_norm": 0.7476760341544976,
"learning_rate": 5e-06,
"loss": 0.7711,
"step": 110
},
{
"epoch": 0.4399633363886343,
"grad_norm": 0.828783632948725,
"learning_rate": 5e-06,
"loss": 0.7637,
"step": 120
},
{
"epoch": 0.4766269477543538,
"grad_norm": 0.7005369874659794,
"learning_rate": 5e-06,
"loss": 0.7617,
"step": 130
},
{
"epoch": 0.5132905591200734,
"grad_norm": 0.6781356553576761,
"learning_rate": 5e-06,
"loss": 0.7562,
"step": 140
},
{
"epoch": 0.5499541704857929,
"grad_norm": 0.6643060954517749,
"learning_rate": 5e-06,
"loss": 0.7601,
"step": 150
},
{
"epoch": 0.5866177818515124,
"grad_norm": 0.654862572470797,
"learning_rate": 5e-06,
"loss": 0.7561,
"step": 160
},
{
"epoch": 0.6232813932172319,
"grad_norm": 0.7126834121476828,
"learning_rate": 5e-06,
"loss": 0.7549,
"step": 170
},
{
"epoch": 0.6599450045829515,
"grad_norm": 0.5845932549413623,
"learning_rate": 5e-06,
"loss": 0.7525,
"step": 180
},
{
"epoch": 0.696608615948671,
"grad_norm": 0.583642927450063,
"learning_rate": 5e-06,
"loss": 0.7507,
"step": 190
},
{
"epoch": 0.7332722273143905,
"grad_norm": 0.5759630428428489,
"learning_rate": 5e-06,
"loss": 0.7492,
"step": 200
},
{
"epoch": 0.76993583868011,
"grad_norm": 0.597809207757354,
"learning_rate": 5e-06,
"loss": 0.7446,
"step": 210
},
{
"epoch": 0.8065994500458296,
"grad_norm": 0.6520665055230834,
"learning_rate": 5e-06,
"loss": 0.7512,
"step": 220
},
{
"epoch": 0.843263061411549,
"grad_norm": 0.6521761800994458,
"learning_rate": 5e-06,
"loss": 0.744,
"step": 230
},
{
"epoch": 0.8799266727772685,
"grad_norm": 0.6083361886529014,
"learning_rate": 5e-06,
"loss": 0.7431,
"step": 240
},
{
"epoch": 0.916590284142988,
"grad_norm": 0.8966782629847545,
"learning_rate": 5e-06,
"loss": 0.7399,
"step": 250
},
{
"epoch": 0.9532538955087076,
"grad_norm": 0.6584181334872885,
"learning_rate": 5e-06,
"loss": 0.7457,
"step": 260
},
{
"epoch": 0.9899175068744271,
"grad_norm": 0.5614900416740534,
"learning_rate": 5e-06,
"loss": 0.7434,
"step": 270
},
{
"epoch": 0.997250229147571,
"eval_loss": 0.743977963924408,
"eval_runtime": 96.6447,
"eval_samples_per_second": 76.052,
"eval_steps_per_second": 0.6,
"step": 272
},
{
"epoch": 1.0284142988084326,
"grad_norm": 0.6377526876986616,
"learning_rate": 5e-06,
"loss": 0.7593,
"step": 280
},
{
"epoch": 1.065077910174152,
"grad_norm": 0.8312923337011684,
"learning_rate": 5e-06,
"loss": 0.6885,
"step": 290
},
{
"epoch": 1.1017415215398716,
"grad_norm": 0.6499984381614756,
"learning_rate": 5e-06,
"loss": 0.6893,
"step": 300
},
{
"epoch": 1.138405132905591,
"grad_norm": 0.658519279927457,
"learning_rate": 5e-06,
"loss": 0.6868,
"step": 310
},
{
"epoch": 1.1750687442713108,
"grad_norm": 0.6307182099292118,
"learning_rate": 5e-06,
"loss": 0.6885,
"step": 320
},
{
"epoch": 1.2117323556370303,
"grad_norm": 0.6191143311988347,
"learning_rate": 5e-06,
"loss": 0.6871,
"step": 330
},
{
"epoch": 1.2483959670027498,
"grad_norm": 0.6735946598593434,
"learning_rate": 5e-06,
"loss": 0.6935,
"step": 340
},
{
"epoch": 1.2850595783684693,
"grad_norm": 0.7213451984916242,
"learning_rate": 5e-06,
"loss": 0.6943,
"step": 350
},
{
"epoch": 1.3217231897341888,
"grad_norm": 0.5841901070016948,
"learning_rate": 5e-06,
"loss": 0.6938,
"step": 360
},
{
"epoch": 1.3583868010999083,
"grad_norm": 0.6609752377099979,
"learning_rate": 5e-06,
"loss": 0.6856,
"step": 370
},
{
"epoch": 1.3950504124656278,
"grad_norm": 0.6004672142282963,
"learning_rate": 5e-06,
"loss": 0.69,
"step": 380
},
{
"epoch": 1.4317140238313475,
"grad_norm": 0.7494020947088555,
"learning_rate": 5e-06,
"loss": 0.682,
"step": 390
},
{
"epoch": 1.468377635197067,
"grad_norm": 0.6711006066177567,
"learning_rate": 5e-06,
"loss": 0.6917,
"step": 400
},
{
"epoch": 1.5050412465627865,
"grad_norm": 0.6517430215570676,
"learning_rate": 5e-06,
"loss": 0.6871,
"step": 410
},
{
"epoch": 1.541704857928506,
"grad_norm": 0.6180564693914907,
"learning_rate": 5e-06,
"loss": 0.6829,
"step": 420
},
{
"epoch": 1.5783684692942255,
"grad_norm": 0.5764324092377354,
"learning_rate": 5e-06,
"loss": 0.6824,
"step": 430
},
{
"epoch": 1.615032080659945,
"grad_norm": 0.7134204082562298,
"learning_rate": 5e-06,
"loss": 0.6822,
"step": 440
},
{
"epoch": 1.6516956920256645,
"grad_norm": 0.7630512170385407,
"learning_rate": 5e-06,
"loss": 0.6879,
"step": 450
},
{
"epoch": 1.6883593033913842,
"grad_norm": 0.6285437172539765,
"learning_rate": 5e-06,
"loss": 0.6804,
"step": 460
},
{
"epoch": 1.7250229147571035,
"grad_norm": 0.5968789313484854,
"learning_rate": 5e-06,
"loss": 0.686,
"step": 470
},
{
"epoch": 1.7616865261228232,
"grad_norm": 0.6425175740435289,
"learning_rate": 5e-06,
"loss": 0.6856,
"step": 480
},
{
"epoch": 1.7983501374885427,
"grad_norm": 0.7614365266625939,
"learning_rate": 5e-06,
"loss": 0.6814,
"step": 490
},
{
"epoch": 1.8350137488542622,
"grad_norm": 0.5496379357416068,
"learning_rate": 5e-06,
"loss": 0.6855,
"step": 500
},
{
"epoch": 1.8716773602199817,
"grad_norm": 0.8494093367270151,
"learning_rate": 5e-06,
"loss": 0.6875,
"step": 510
},
{
"epoch": 1.9083409715857012,
"grad_norm": 0.6756166103000668,
"learning_rate": 5e-06,
"loss": 0.6856,
"step": 520
},
{
"epoch": 1.9450045829514209,
"grad_norm": 0.7228484772895967,
"learning_rate": 5e-06,
"loss": 0.6841,
"step": 530
},
{
"epoch": 1.9816681943171401,
"grad_norm": 0.7786774729146112,
"learning_rate": 5e-06,
"loss": 0.6845,
"step": 540
},
{
"epoch": 1.996333638863428,
"eval_loss": 0.7305116057395935,
"eval_runtime": 96.1532,
"eval_samples_per_second": 76.441,
"eval_steps_per_second": 0.603,
"step": 544
},
{
"epoch": 2.020164986251146,
"grad_norm": 1.0425759476709966,
"learning_rate": 5e-06,
"loss": 0.707,
"step": 550
},
{
"epoch": 2.056828597616865,
"grad_norm": 0.8473344095764829,
"learning_rate": 5e-06,
"loss": 0.6313,
"step": 560
},
{
"epoch": 2.093492208982585,
"grad_norm": 0.7205628261028438,
"learning_rate": 5e-06,
"loss": 0.6281,
"step": 570
},
{
"epoch": 2.130155820348304,
"grad_norm": 0.6604987014823058,
"learning_rate": 5e-06,
"loss": 0.631,
"step": 580
},
{
"epoch": 2.166819431714024,
"grad_norm": 0.6774961015973217,
"learning_rate": 5e-06,
"loss": 0.6351,
"step": 590
},
{
"epoch": 2.203483043079743,
"grad_norm": 0.8519809292040578,
"learning_rate": 5e-06,
"loss": 0.634,
"step": 600
},
{
"epoch": 2.240146654445463,
"grad_norm": 0.693823740633704,
"learning_rate": 5e-06,
"loss": 0.6327,
"step": 610
},
{
"epoch": 2.276810265811182,
"grad_norm": 0.6448705487045298,
"learning_rate": 5e-06,
"loss": 0.6339,
"step": 620
},
{
"epoch": 2.313473877176902,
"grad_norm": 0.5865817788059118,
"learning_rate": 5e-06,
"loss": 0.636,
"step": 630
},
{
"epoch": 2.3501374885426216,
"grad_norm": 0.8116556137845999,
"learning_rate": 5e-06,
"loss": 0.6342,
"step": 640
},
{
"epoch": 2.386801099908341,
"grad_norm": 0.6231657257473445,
"learning_rate": 5e-06,
"loss": 0.637,
"step": 650
},
{
"epoch": 2.4234647112740606,
"grad_norm": 0.6250913266909794,
"learning_rate": 5e-06,
"loss": 0.63,
"step": 660
},
{
"epoch": 2.46012832263978,
"grad_norm": 0.582068921531117,
"learning_rate": 5e-06,
"loss": 0.6288,
"step": 670
},
{
"epoch": 2.4967919340054996,
"grad_norm": 0.6912367969819871,
"learning_rate": 5e-06,
"loss": 0.6381,
"step": 680
},
{
"epoch": 2.5334555453712193,
"grad_norm": 0.7147652107920064,
"learning_rate": 5e-06,
"loss": 0.6332,
"step": 690
},
{
"epoch": 2.5701191567369386,
"grad_norm": 0.5792260811836798,
"learning_rate": 5e-06,
"loss": 0.6351,
"step": 700
},
{
"epoch": 2.606782768102658,
"grad_norm": 0.7963438662743851,
"learning_rate": 5e-06,
"loss": 0.6363,
"step": 710
},
{
"epoch": 2.6434463794683776,
"grad_norm": 0.9276380358330181,
"learning_rate": 5e-06,
"loss": 0.6355,
"step": 720
},
{
"epoch": 2.6801099908340973,
"grad_norm": 0.9313823270809661,
"learning_rate": 5e-06,
"loss": 0.6351,
"step": 730
},
{
"epoch": 2.7167736021998166,
"grad_norm": 0.7304200587600748,
"learning_rate": 5e-06,
"loss": 0.638,
"step": 740
},
{
"epoch": 2.7534372135655363,
"grad_norm": 0.6212966397528322,
"learning_rate": 5e-06,
"loss": 0.6388,
"step": 750
},
{
"epoch": 2.7901008249312556,
"grad_norm": 0.6720686482466423,
"learning_rate": 5e-06,
"loss": 0.6364,
"step": 760
},
{
"epoch": 2.8267644362969753,
"grad_norm": 0.6438467896193539,
"learning_rate": 5e-06,
"loss": 0.6421,
"step": 770
},
{
"epoch": 2.863428047662695,
"grad_norm": 0.6043416931907646,
"learning_rate": 5e-06,
"loss": 0.6379,
"step": 780
},
{
"epoch": 2.9000916590284143,
"grad_norm": 0.6496494693588303,
"learning_rate": 5e-06,
"loss": 0.6414,
"step": 790
},
{
"epoch": 2.936755270394134,
"grad_norm": 0.8144443719589332,
"learning_rate": 5e-06,
"loss": 0.6361,
"step": 800
},
{
"epoch": 2.9734188817598532,
"grad_norm": 0.7037764123768507,
"learning_rate": 5e-06,
"loss": 0.6373,
"step": 810
},
{
"epoch": 2.9954170485792853,
"eval_loss": 0.7332214117050171,
"eval_runtime": 94.4197,
"eval_samples_per_second": 77.844,
"eval_steps_per_second": 0.614,
"step": 816
},
{
"epoch": 2.9954170485792853,
"step": 816,
"total_flos": 1366411632967680.0,
"train_loss": 0.7035682309491962,
"train_runtime": 14220.3782,
"train_samples_per_second": 29.46,
"train_steps_per_second": 0.057
}
],
"logging_steps": 10,
"max_steps": 816,
"num_input_tokens_seen": 0,
"num_train_epochs": 3,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": true
},
"attributes": {}
}
},
"total_flos": 1366411632967680.0,
"train_batch_size": 8,
"trial_name": null,
"trial_params": null
}
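`best_metric` and `best_model_checkpoint` are null, presumably because no `metric_for_best_model` was configured. As a result, the state file does not record which evaluation was best. The evaluations land at steps 272, 544 and 816 (one per epoch) even though `eval_steps` is 500, which suggests epoch-based evaluation.

The sketch below recovers the best evaluation from `log_history`, assuming the file has been loaded with `json.load`. `best_eval` is a hypothetical helper, and the excerpt keeps only a few entries from above.

```python
def best_eval(state: dict):
    """Return (step, eval_loss) for the lowest eval_loss in a Trainer state dict, or None."""
    evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
    if not evals:
        return None
    return min(evals, key=lambda pair: pair[1])


# Excerpt of trainer_state.json; real code would use json.load(open("trainer_state.json")).
state = {
    "best_metric": None,
    "log_history": [
        {"step": 270, "loss": 0.7434},
        {"step": 272, "eval_loss": 0.743977963924408},
        {"step": 544, "eval_loss": 0.7305116057395935},
        {"step": 816, "eval_loss": 0.7332214117050171},
        {"step": 816, "train_loss": 0.7035682309491962},
    ],
}

print(best_eval(state))
```

Here the lowest evaluation loss comes at the end of epoch 2 (step 544). The epoch-3 value is slightly higher even though the logged training loss fell to about 0.63, a pattern that may indicate the onset of overfitting. With no best checkpoint tracked, the weights in this commit presumably correspond to the final step, 816.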

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8baf9eae949940237851325d5f9dbadd1a80ca1a6f07f4de82e70f739b749ef0
size 7160
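training_args.bin is committed as a Git LFS pointer rather than as the file itself. The pointer holds three `key value` lines: the spec version, the SHA-256 object ID and the real file's size in bytes. The standard-library sketch below parses such a pointer and checks downloaded bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8baf9eae949940237851325d5f9dbadd1a80ca1a6f07f4de82e70f739b749ef0
size 7160
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and hash digest."""
    return (len(data) == pointer["size"]
            and hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"])


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

In the Trainer's usual output layout, the real training_args.bin is written with `torch.save` and is therefore pickle-based. Load it only from a trusted source.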

BIN
training_eval_loss.png Normal file

Binary file not shown (image, 34 KiB).

BIN
training_loss.png Normal file

Binary file not shown (image, 36 KiB).