Initialize project; model provided by the ModelHub XC community

Model: jackf857/qwen3-8b-base-orpo-ultrafeedback-4xh200-batch-128
Source: Original Platform
ModelHub XC
2026-05-31 14:11:53 +08:00
commit 801d57b5a2
22 changed files with 153267 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
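
The patterns above route large binary artifacts through Git LFS. As a rough illustration, the sketch below approximates how a few of them match file names from this commit. It is only an approximation: gitattributes matching follows gitignore-style rules, which Python's `fnmatch` mirrors only for simple basename globs, and `is_lfs_tracked` is a hypothetical helper, not part of Git or Git LFS.

```python
from fnmatch import fnmatchcase

# A subset of the patterns listed in .gitattributes above.
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.pt",
    "*.zip",
    "*tfevents*",
    "tokenizer.json",
]


def is_lfs_tracked(filename: str) -> bool:
    """Hypothetical helper: True if the basename matches any listed pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


for name in [
    "model-00001-of-00007.safetensors",
    "tokenizer.json",
    "config.json",
    "README.md",
]:
    print(f"{name}: {'LFS' if is_lfs_tracked(name) else 'regular git object'}")
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.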

README.md Normal file

@@ -0,0 +1,79 @@
---
library_name: transformers
base_model: jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128
tags:
- alignment-handbook
- orpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: qwen3-8b-base-orpo-ultrafeedback-4xh200-batch-128
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# qwen3-8b-base-orpo-ultrafeedback-4xh200-batch-128
This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128](https://huggingface.co/jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0784
- Rewards/chosen: -0.0085
- Rewards/rejected: -0.0105
- Rewards/accuracies: 0.6060
- Rewards/margins: 0.0020
- Logps/rejected: -1.0496
- Logps/chosen: -0.8523
- Logits/rejected: 2.1915
- Logits/chosen: 2.1569
- Nll Loss: 1.1100
- Log Odds Ratio: -0.6630
- Log Odds Chosen: 0.3031
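
To make the odds-ratio metrics above more concrete, here is a minimal sketch of the ORPO quantities for a single chosen/rejected pair, using odds(y) = p / (1 - p) with p taken as the exponential of the average log-probability. The helper names are hypothetical, the mapping of names to metrics reflects how these labels appear to be used, and `BETA = 0.01` is an assumption inferred from the ratio between Rewards/chosen and Logps/chosen; it is not stated in this card. Feeding the reported mean log-probabilities into the formula will not reproduce the reported values exactly, presumably because the logged metrics are averages over per-example values.

```python
import math

BETA = 0.01  # assumption: inferred from rewards/chosen divided by logps/chosen


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) for p = exp(logp), where logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def orpo_pair_terms(logp_chosen: float, logp_rejected: float) -> dict:
    """Hypothetical helper computing ORPO-style terms for one preference pair."""
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_chosen = log_odds(logp_chosen) - log_odds(logp_rejected)
    # Log-sigmoid of that quantity, the term the preference loss rewards.
    log_odds_ratio = -math.log1p(math.exp(-log_odds_chosen))
    reward_chosen = BETA * logp_chosen
    reward_rejected = BETA * logp_rejected
    return {
        "log_odds_chosen": log_odds_chosen,
        "log_odds_ratio": log_odds_ratio,
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "reward_margin": reward_chosen - reward_rejected,
    }


# Mean log-probabilities reported above, used here as a single illustrative pair.
terms = orpo_pair_terms(logp_chosen=-0.8523, logp_rejected=-1.0496)
for key, value in terms.items():
    print(f"{key}: {value:.4f}")
```

A positive `log_odds_chosen` means the chosen response has higher odds than the rejected one, which is the direction the preference term pushes.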
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
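
The total batch sizes listed above follow from the per-device batch size, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step (or one eval step)."""
    return per_device * num_devices * grad_accum_steps


# train_batch_size=4, num_devices=4, gradient_accumulation_steps=8
train_total = effective_batch_size(4, 4, 8)
# eval_batch_size=2, num_devices=4, no accumulation during evaluation
eval_total = effective_batch_size(2, 4)

print(train_total, eval_total)  # 128 8
```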
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 8.6911 | 0.4188 | 200 | 1.0935 | -0.0086 | -0.0106 | 0.6100 | 0.0020 | -1.0645 | -0.8642 | 2.0572 | 2.0427 | 1.1233 | -0.6611 | 0.3058 |
| 8.6763 | 0.8377 | 400 | 1.0784 | -0.0085 | -0.0105 | 0.6060 | 0.0020 | -1.0496 | -0.8523 | 2.1915 | 2.1569 | 1.1100 | -0.6630 | 0.3031 |
### Framework versions
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

all_results.json Normal file

@@ -0,0 +1,25 @@
{
"epoch": 0.9989528795811519,
"eval_log_odds_chosen": 0.30259791016578674,
"eval_log_odds_ratio": -0.6631017923355103,
"eval_logits/chosen": 2.102327585220337,
"eval_logits/rejected": 2.1291346549987793,
"eval_logps/chosen": -0.8522344827651978,
"eval_logps/rejected": -1.0493180751800537,
"eval_loss": 1.078126072883606,
"eval_nll_loss": 1.1097657680511475,
"eval_rewards/accuracies": 0.6039999723434448,
"eval_rewards/chosen": -0.008522345684468746,
"eval_rewards/margins": 0.0019708359614014626,
"eval_rewards/rejected": -0.010493181645870209,
"eval_runtime": 45.1175,
"eval_samples": 2000,
"eval_samples_per_second": 44.329,
"eval_steps_per_second": 5.541,
"total_flos": 0.0,
"train_loss": 8.957356926780077,
"train_runtime": 5488.1377,
"train_samples": 61135,
"train_samples_per_second": 11.139,
"train_steps_per_second": 0.087
}
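
The throughput fields above can be cross-checked against the sample counts and runtimes in the same file. A small sketch, with the values copied from `all_results.json` and a hypothetical helper name:

```python
def samples_per_second(num_samples: int, runtime_seconds: float) -> float:
    """Average throughput over a whole run."""
    return num_samples / runtime_seconds


train_rate = samples_per_second(61135, 5488.1377)  # train_samples / train_runtime
eval_rate = samples_per_second(2000, 45.1175)      # eval_samples / eval_runtime

print(f"train: {train_rate:.3f} samples/s, eval: {eval_rate:.3f} samples/s")
```

Both results agree with the reported `train_samples_per_second` and `eval_samples_per_second` to within rounding.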

config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 32768,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
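
As a sanity check on the configuration above, the sketch below estimates the parameter count implied by these fields. It assumes the decoder layout visible in the checkpoint's weight map (bias-free q/k/v/o projections, per-head `q_norm`/`k_norm` weights of size `head_dim`, a gated MLP with three projections, and two norm weights per layer) plus one final norm weight of size `hidden_size`, which is an assumption because that tensor is not shown in this listing. Under those assumptions the float32 footprint comes to 32,762,941,440 bytes, which matches the `total_size` recorded in the sharded checkpoint index.

```python
# Fields copied from config.json above.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Hypothetical helper: parameter count under the layout assumed above."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    qk_norms = 2 * cfg["head_dim"]                      # q_norm and k_norm
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    layer_norms = 2 * h                                 # input and post-attention norms
    per_layer = attention + qk_norms + mlp + layer_norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h  # assumption: one final norm weight

    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = count_parameters(cfg)
print(f"{n_params:,} parameters, {n_params * 4:,} bytes in float32")
```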

eval_results.json Normal file

@@ -0,0 +1,19 @@
{
"epoch": 0.9989528795811519,
"eval_log_odds_chosen": 0.30259791016578674,
"eval_log_odds_ratio": -0.6631017923355103,
"eval_logits/chosen": 2.102327585220337,
"eval_logits/rejected": 2.1291346549987793,
"eval_logps/chosen": -0.8522344827651978,
"eval_logps/rejected": -1.0493180751800537,
"eval_loss": 1.078126072883606,
"eval_nll_loss": 1.1097657680511475,
"eval_rewards/accuracies": 0.6039999723434448,
"eval_rewards/chosen": -0.008522345684468746,
"eval_rewards/margins": 0.0019708359614014626,
"eval_rewards/rejected": -0.010493181645870209,
"eval_runtime": 45.1175,
"eval_samples": 2000,
"eval_samples_per_second": 44.329,
"eval_steps_per_second": 5.541
}

generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"bos_token_id": 151643,
"eos_token_id": 151643,
"max_new_tokens": 2048,
"transformers_version": "4.51.0"
}

merges.txt Normal file (151388 lines)

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3b34ce7e8d5bddbb623298f21edf9a77e88de0914698a83e2a8d405b8ba6eaae
size 4972454376

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7d1a45633d6ee74bf8b4cd8a4db940fdba477973f16a9d30024ef8302f6652a
size 4832048608

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5710fb92dca332274c400cc98d5c83756f37ac0c486e1fb3eed6fdf786832d6c
size 4832048656

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:877563ca53becff9e66fa54c4fa217516ff3ff227f00e5dafc02618f829c7a0d
size 4999855528

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f684cec998a1560203ad3e24571eeea2bb7a5754125afddd438319a6c08360d1
size 4832048672

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0cf325503c7e50313ac00b54ba1679e3ebc2faf81ccff481c846d4b1c11c744b
size 4832048672

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb7f1699b477cacd5b779ae7e7fe0f32447a7751431cffe4e88135fad8623492
size 3462482728
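
Each of the three-line stubs above is a Git LFS pointer: a small text file recording the spec version, the SHA-256 object ID, and the byte size of the real file. A minimal sketch of parsing one, assuming the `key value` layout shown here (`parse_lfs_pointer` is a hypothetical helper, not part of Git LFS):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# The last pointer listed above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:bb7f1699b477cacd5b779ae7e7fe0f32447a7751431cffe4e88135fad8623492
size 3462482728
"""

info = parse_lfs_pointer(pointer_text)
print(info["hash_algo"], info["size"])
```

After fetching the real file, comparing its size and SHA-256 digest against these fields is a cheap way to confirm the download is complete.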

@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
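
The weight map above assigns each tensor name to one shard file, and a layer's tensors do not always land in the same shard (layer 34, for example, is split between shards 6 and 7). A minimal sketch for spotting such layers is shown below. It assumes the map has already been loaded into a Python dict; the excerpt and the helper names `shards_per_layer` and `split_layers` are illustrative and not part of the repository.

```python
import re
from collections import defaultdict

# Small excerpt of the weight_map shown above; the full map has the same shape.
weight_map = {
    "model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors",
}


def shards_per_layer(weight_map):
    """Group shard file names by decoder layer index (hypothetical helper)."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        match = re.match(r"model\.layers\.(\d+)\.", name)
        # Tensors outside the decoder stack (e.g. model.norm) are keyed by name.
        key = int(match.group(1)) if match else name
        layers[key].add(shard)
    return layers


def split_layers(weight_map):
    """Return the sorted layer indices whose tensors span more than one shard."""
    grouped = shards_per_layer(weight_map)
    return sorted(k for k, v in grouped.items() if isinstance(k, int) and len(v) > 1)


print(split_layers(weight_map))
```

A layer that spans two shards is not an error; it only means both files must be read before that layer can be assembled.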

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
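
For conversations that contain only `system` and `user` messages and pass no tools, the `chat_template` above reduces to plain ChatML framing. The sketch below reimplements just that branch in pure Python to make the output format concrete. The function name `render_chatml` is hypothetical, and the sketch deliberately rejects `assistant` and `tool` messages, whose handling in the real template (reasoning content, tool calls) is more involved.

```python
def render_chatml(messages, add_generation_prompt=True, enable_thinking=None):
    """Render system/user messages the way the no-tools template branch does.

    Illustrative only: assistant and tool roles are not supported here.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user"):
            raise ValueError(f"role {role!r} is outside the scope of this sketch")
        # Every system or user turn is wrapped in <|im_start|> ... <|im_end|>.
        parts.append("<|im_start|>" + role + "\n" + message["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        # The template emits an empty think block only when thinking is
        # explicitly disabled, not when the flag is left undefined.
        if enable_thinking is False:
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```

Note that the template closes each turn with `<|im_end|>`, while this config sets `eos_token` to `<|endoftext|>`. Whether generation stops at `<|im_end|>` therefore depends on settings outside this file.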

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 0.9989528795811519,
"total_flos": 0.0,
"train_loss": 8.957356926780077,
"train_runtime": 5488.1377,
"train_samples": 61135,
"train_samples_per_second": 11.139,
"train_steps_per_second": 0.087
}
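
The throughput fields in `train_results.json` can be cross-checked with simple division. The sketch below recomputes them from `train_samples`, `train_runtime`, and the `global_step` of 477 reported in `trainer_state.json`. It assumes the reported figures are rounded to three decimals, and the helper name `throughput` is illustrative.

```python
# Values copied from train_results.json; global_step comes from trainer_state.json.
train_results = {
    "train_runtime": 5488.1377,
    "train_samples": 61135,
    "train_samples_per_second": 11.139,
    "train_steps_per_second": 0.087,
}
global_step = 477


def throughput(results, steps):
    """Recompute samples/s and steps/s from the raw counts and the runtime."""
    runtime = results["train_runtime"]
    samples_per_second = round(results["train_samples"] / runtime, 3)
    steps_per_second = round(steps / runtime, 3)
    return samples_per_second, steps_per_second


samples_per_second, steps_per_second = throughput(train_results, global_step)
print(samples_per_second, steps_per_second)
```

Both recomputed values match the logged ones, which suggests the run covered roughly one pass over the 61,135 training samples in about 91 minutes.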

945
trainer_state.json Normal file

@@ -0,0 +1,945 @@
{
"best_global_step": null,
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.9989528795811519,
"eval_steps": 200,
"global_step": 477,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.0020942408376963353,
"grad_norm": 37.46305465698242,
"learning_rate": 0.0,
"log_odds_chosen": 0.35378384590148926,
"log_odds_ratio": -0.6519296765327454,
"logits/chosen": 2.203179359436035,
"logits/rejected": 2.035616397857666,
"logps/chosen": -1.1535288095474243,
"logps/rejected": -1.4391145706176758,
"loss": 10.2211,
"nll_loss": 1.4494060277938843,
"rewards/accuracies": 0.6875,
"rewards/chosen": -0.011535286903381348,
"rewards/margins": 0.002855856902897358,
"rewards/rejected": -0.014391143806278706,
"step": 1
},
{
"epoch": 0.020942408376963352,
"grad_norm": 37.87759780883789,
"learning_rate": 9.375e-08,
"log_odds_chosen": 0.30660638213157654,
"log_odds_ratio": -0.662986159324646,
"logits/chosen": 1.9456572532653809,
"logits/rejected": 1.8670408725738525,
"logps/chosen": -1.1083024740219116,
"logps/rejected": -1.3244930505752563,
"loss": 10.1264,
"nll_loss": 1.2528527975082397,
"rewards/accuracies": 0.6076388955116272,
"rewards/chosen": -0.011083023622632027,
"rewards/margins": 0.002161906799301505,
"rewards/rejected": -0.013244930654764175,
"step": 10
},
{
"epoch": 0.041884816753926704,
"grad_norm": 40.62479782104492,
"learning_rate": 1.9791666666666664e-07,
"log_odds_chosen": 0.26383697986602783,
"log_odds_ratio": -0.6774462461471558,
"logits/chosen": 1.8936617374420166,
"logits/rejected": 1.8155641555786133,
"logps/chosen": -1.129002332687378,
"logps/rejected": -1.3111597299575806,
"loss": 9.8951,
"nll_loss": 1.2187750339508057,
"rewards/accuracies": 0.574999988079071,
"rewards/chosen": -0.011290023103356361,
"rewards/margins": 0.001821571378968656,
"rewards/rejected": -0.013111594133079052,
"step": 20
},
{
"epoch": 0.06282722513089005,
"grad_norm": 36.72233963012695,
"learning_rate": 3.020833333333333e-07,
"log_odds_chosen": 0.15129676461219788,
"log_odds_ratio": -0.7099167704582214,
"logits/chosen": 1.9489176273345947,
"logits/rejected": 1.9070332050323486,
"logps/chosen": -1.0984728336334229,
"logps/rejected": -1.2049810886383057,
"loss": 10.154,
"nll_loss": 1.2440111637115479,
"rewards/accuracies": 0.578125,
"rewards/chosen": -0.010984729044139385,
"rewards/margins": 0.0010650831973180175,
"rewards/rejected": -0.012049810960888863,
"step": 30
},
{
"epoch": 0.08376963350785341,
"grad_norm": 31.673852920532227,
"learning_rate": 4.0625e-07,
"log_odds_chosen": 0.2637297511100769,
"log_odds_ratio": -0.6847748160362244,
"logits/chosen": 1.778116226196289,
"logits/rejected": 1.79119873046875,
"logps/chosen": -1.0361906290054321,
"logps/rejected": -1.2129265069961548,
"loss": 9.5835,
"nll_loss": 1.1749727725982666,
"rewards/accuracies": 0.590624988079071,
"rewards/chosen": -0.010361905209720135,
"rewards/margins": 0.0017673596739768982,
"rewards/rejected": -0.012129265815019608,
"step": 40
},
{
"epoch": 0.10471204188481675,
"grad_norm": 16.47711753845215,
"learning_rate": 4.999932966293553e-07,
"log_odds_chosen": 0.3281434178352356,
"log_odds_ratio": -0.6789900064468384,
"logits/chosen": 1.996852159500122,
"logits/rejected": 2.0322041511535645,
"logps/chosen": -0.9071202278137207,
"logps/rejected": -1.1231104135513306,
"loss": 9.6245,
"nll_loss": 1.200535535812378,
"rewards/accuracies": 0.581250011920929,
"rewards/chosen": -0.009071202017366886,
"rewards/margins": 0.0021599007304757833,
"rewards/rejected": -0.011231102980673313,
"step": 50
},
{
"epoch": 0.1256544502617801,
"grad_norm": 17.272981643676758,
"learning_rate": 4.991893270335525e-07,
"log_odds_chosen": 0.2221679985523224,
"log_odds_ratio": -0.7138159275054932,
"logits/chosen": 1.8936437368392944,
"logits/rejected": 1.8823055028915405,
"logps/chosen": -0.9890943765640259,
"logps/rejected": -1.1324011087417603,
"loss": 9.5413,
"nll_loss": 1.162929892539978,
"rewards/accuracies": 0.5406249761581421,
"rewards/chosen": -0.009890943765640259,
"rewards/margins": 0.0014330670237541199,
"rewards/rejected": -0.011324010789394379,
"step": 60
},
{
"epoch": 0.14659685863874344,
"grad_norm": 12.330283164978027,
"learning_rate": 4.970496218214204e-07,
"log_odds_chosen": 0.27448800206184387,
"log_odds_ratio": -0.697861909866333,
"logits/chosen": 2.0014004707336426,
"logits/rejected": 2.046846866607666,
"logps/chosen": -0.961329460144043,
"logps/rejected": -1.1528202295303345,
"loss": 9.2376,
"nll_loss": 1.1857097148895264,
"rewards/accuracies": 0.578125,
"rewards/chosen": -0.009613295085728168,
"rewards/margins": 0.0019149081781506538,
"rewards/rejected": -0.011528202332556248,
"step": 70
},
{
"epoch": 0.16753926701570682,
"grad_norm": 11.02938175201416,
"learning_rate": 4.935856505068998e-07,
"log_odds_chosen": 0.31524786353111267,
"log_odds_ratio": -0.6606825590133667,
"logits/chosen": 1.8413927555084229,
"logits/rejected": 1.8493198156356812,
"logps/chosen": -0.9049292802810669,
"logps/rejected": -1.0926640033721924,
"loss": 8.9813,
"nll_loss": 1.1445410251617432,
"rewards/accuracies": 0.6312500238418579,
"rewards/chosen": -0.009049292653799057,
"rewards/margins": 0.0018773479387164116,
"rewards/rejected": -0.010926639661192894,
"step": 80
},
{
"epoch": 0.18848167539267016,
"grad_norm": 9.860730171203613,
"learning_rate": 4.8881598109976e-07,
"log_odds_chosen": 0.3749118447303772,
"log_odds_ratio": -0.6490113139152527,
"logits/chosen": 1.7958896160125732,
"logits/rejected": 1.7599939107894897,
"logps/chosen": -0.8722783923149109,
"logps/rejected": -1.1078213453292847,
"loss": 9.2508,
"nll_loss": 1.0980162620544434,
"rewards/accuracies": 0.6312500238418579,
"rewards/chosen": -0.008722783997654915,
"rewards/margins": 0.0023554288782179356,
"rewards/rejected": -0.011078213341534138,
"step": 90
},
{
"epoch": 0.2094240837696335,
"grad_norm": 10.169231414794922,
"learning_rate": 4.827661805750437e-07,
"log_odds_chosen": 0.32181161642074585,
"log_odds_ratio": -0.6511259078979492,
"logits/chosen": 1.7957969903945923,
"logits/rejected": 1.779897689819336,
"logps/chosen": -0.8846995234489441,
"logps/rejected": -1.0705549716949463,
"loss": 9.0284,
"nll_loss": 1.08339524269104,
"rewards/accuracies": 0.6312500238418579,
"rewards/chosen": -0.008846994489431381,
"rewards/margins": 0.0018585551297292113,
"rewards/rejected": -0.010705549269914627,
"step": 100
},
{
"epoch": 0.23036649214659685,
"grad_norm": 11.416997909545898,
"learning_rate": 4.75468677825789e-07,
"log_odds_chosen": 0.4410727918148041,
"log_odds_ratio": -0.6154537796974182,
"logits/chosen": 1.8986194133758545,
"logits/rejected": 1.9162569046020508,
"logps/chosen": -0.8505121469497681,
"logps/rejected": -1.1303095817565918,
"loss": 9.0025,
"nll_loss": 1.1048234701156616,
"rewards/accuracies": 0.659375011920929,
"rewards/chosen": -0.008505119942128658,
"rewards/margins": 0.002797975903376937,
"rewards/rejected": -0.011303097009658813,
"step": 110
},
{
"epoch": 0.2513089005235602,
"grad_norm": 8.807473182678223,
"learning_rate": 4.669625898336438e-07,
"log_odds_chosen": 0.2529251277446747,
"log_odds_ratio": -0.6907952427864075,
"logits/chosen": 1.961059808731079,
"logits/rejected": 1.9387576580047607,
"logps/chosen": -0.9018535614013672,
"logps/rejected": -1.0625946521759033,
"loss": 8.9547,
"nll_loss": 1.0763670206069946,
"rewards/accuracies": 0.5531250238418579,
"rewards/chosen": -0.009018534794449806,
"rewards/margins": 0.0016074117738753557,
"rewards/rejected": -0.010625948198139668,
"step": 120
},
{
"epoch": 0.27225130890052357,
"grad_norm": 8.736103057861328,
"learning_rate": 4.5729351198915705e-07,
"log_odds_chosen": 0.34425991773605347,
"log_odds_ratio": -0.6504599452018738,
"logits/chosen": 1.858128309249878,
"logits/rejected": 1.95168936252594,
"logps/chosen": -0.8997282981872559,
"logps/rejected": -1.0928587913513184,
"loss": 9.0819,
"nll_loss": 1.0748308897018433,
"rewards/accuracies": 0.6312500238418579,
"rewards/chosen": -0.00899728387594223,
"rewards/margins": 0.0019313046941533685,
"rewards/rejected": -0.01092858798801899,
"step": 130
},
{
"epoch": 0.2931937172774869,
"grad_norm": 8.833135604858398,
"learning_rate": 4.4651327368569684e-07,
"log_odds_chosen": 0.334553062915802,
"log_odds_ratio": -0.6699846982955933,
"logits/chosen": 1.8222471475601196,
"logits/rejected": 1.8125450611114502,
"logps/chosen": -0.899248480796814,
"logps/rejected": -1.1075996160507202,
"loss": 9.0727,
"nll_loss": 1.161645531654358,
"rewards/accuracies": 0.59375,
"rewards/chosen": -0.008992486633360386,
"rewards/margins": 0.002083510160446167,
"rewards/rejected": -0.011075995862483978,
"step": 140
},
{
"epoch": 0.31413612565445026,
"grad_norm": 8.58157730102539,
"learning_rate": 4.346796604970912e-07,
"log_odds_chosen": 0.3596678674221039,
"log_odds_ratio": -0.6549097299575806,
"logits/chosen": 2.0077967643737793,
"logits/rejected": 1.9518957138061523,
"logps/chosen": -0.8897331357002258,
"logps/rejected": -1.1077500581741333,
"loss": 9.0157,
"nll_loss": 1.1182167530059814,
"rewards/accuracies": 0.612500011920929,
"rewards/chosen": -0.008897329680621624,
"rewards/margins": 0.0021801693364977837,
"rewards/rejected": -0.011077499017119408,
"step": 150
},
{
"epoch": 0.33507853403141363,
"grad_norm": 7.509969711303711,
"learning_rate": 4.218561044282098e-07,
"log_odds_chosen": 0.37523385882377625,
"log_odds_ratio": -0.639797031879425,
"logits/chosen": 1.9479191303253174,
"logits/rejected": 1.9333696365356445,
"logps/chosen": -0.8889120221138,
"logps/rejected": -1.131272554397583,
"loss": 9.0784,
"nll_loss": 1.1669073104858398,
"rewards/accuracies": 0.6343749761581421,
"rewards/chosen": -0.008889119140803814,
"rewards/margins": 0.002423606114462018,
"rewards/rejected": -0.011312725953757763,
"step": 160
},
{
"epoch": 0.35602094240837695,
"grad_norm": 10.229013442993164,
"learning_rate": 4.081113438988443e-07,
"log_odds_chosen": 0.25382956862449646,
"log_odds_ratio": -0.6958078145980835,
"logits/chosen": 1.9296739101409912,
"logits/rejected": 1.8618618249893188,
"logps/chosen": -0.870949923992157,
"logps/rejected": -1.0173413753509521,
"loss": 8.9875,
"nll_loss": 1.104835867881775,
"rewards/accuracies": 0.5843750238418579,
"rewards/chosen": -0.008709498681128025,
"rewards/margins": 0.0014639139408245683,
"rewards/rejected": -0.010173412971198559,
"step": 170
},
{
"epoch": 0.3769633507853403,
"grad_norm": 9.28216552734375,
"learning_rate": 3.935190552834828e-07,
"log_odds_chosen": 0.28805920481681824,
"log_odds_ratio": -0.6893592476844788,
"logits/chosen": 1.9231271743774414,
"logits/rejected": 1.826939344406128,
"logps/chosen": -0.8867882490158081,
"logps/rejected": -1.0307583808898926,
"loss": 8.7846,
"nll_loss": 1.1256717443466187,
"rewards/accuracies": 0.606249988079071,
"rewards/chosen": -0.008867883123457432,
"rewards/margins": 0.0014397003687918186,
"rewards/rejected": -0.010307582095265388,
"step": 180
},
{
"epoch": 0.39790575916230364,
"grad_norm": 8.50275993347168,
"learning_rate": 3.781574579820464e-07,
"log_odds_chosen": 0.38345667719841003,
"log_odds_ratio": -0.6357052326202393,
"logits/chosen": 1.7336671352386475,
"logits/rejected": 1.7102609872817993,
"logps/chosen": -0.8580729365348816,
"logps/rejected": -1.0796293020248413,
"loss": 8.8459,
"nll_loss": 1.041303038597107,
"rewards/accuracies": 0.643750011920929,
"rewards/chosen": -0.008580728434026241,
"rewards/margins": 0.0022155637852847576,
"rewards/rejected": -0.010796292684972286,
"step": 190
},
{
"epoch": 0.418848167539267,
"grad_norm": 8.013388633728027,
"learning_rate": 3.621088951385353e-07,
"log_odds_chosen": 0.31622734665870667,
"log_odds_ratio": -0.6587765216827393,
"logits/chosen": 1.788631796836853,
"logits/rejected": 1.7873141765594482,
"logps/chosen": -0.8683417439460754,
"logps/rejected": -1.0568631887435913,
"loss": 8.6911,
"nll_loss": 1.0708550214767456,
"rewards/accuracies": 0.6000000238418579,
"rewards/chosen": -0.008683416061103344,
"rewards/margins": 0.0018852159846574068,
"rewards/rejected": -0.010568631812930107,
"step": 200
},
{
"epoch": 0.418848167539267,
"eval_log_odds_chosen": 0.3057795763015747,
"eval_log_odds_ratio": -0.6610966324806213,
"eval_logits/chosen": 2.0427238941192627,
"eval_logits/rejected": 2.0572261810302734,
"eval_logps/chosen": -0.8642156720161438,
"eval_logps/rejected": -1.0644797086715698,
"eval_loss": 1.0935468673706055,
"eval_nll_loss": 1.1233118772506714,
"eval_rewards/accuracies": 0.6100000143051147,
"eval_rewards/chosen": -0.008642155677080154,
"eval_rewards/margins": 0.0020026403944939375,
"eval_rewards/rejected": -0.010644798167049885,
"eval_runtime": 46.7517,
"eval_samples_per_second": 42.779,
"eval_steps_per_second": 5.347,
"step": 200
},
{
"epoch": 0.4397905759162304,
"grad_norm": 7.279272556304932,
"learning_rate": 3.454593922550693e-07,
"log_odds_chosen": 0.301203191280365,
"log_odds_ratio": -0.6769061088562012,
"logits/chosen": 1.896989107131958,
"logits/rejected": 1.8812087774276733,
"logps/chosen": -0.8712860345840454,
"logps/rejected": -1.0698888301849365,
"loss": 9.0687,
"nll_loss": 1.1046525239944458,
"rewards/accuracies": 0.6187499761581421,
"rewards/chosen": -0.008712859824299812,
"rewards/margins": 0.0019860276952385902,
"rewards/rejected": -0.010698886588215828,
"step": 210
},
{
"epoch": 0.4607329842931937,
"grad_norm": 8.950860023498535,
"learning_rate": 3.2829819606729477e-07,
"log_odds_chosen": 0.2927771508693695,
"log_odds_ratio": -0.6683081984519958,
"logits/chosen": 1.983677625656128,
"logits/rejected": 2.009464740753174,
"logps/chosen": -0.9059684872627258,
"logps/rejected": -1.0988754034042358,
"loss": 8.9995,
"nll_loss": 1.1874160766601562,
"rewards/accuracies": 0.6000000238418579,
"rewards/chosen": -0.009059684351086617,
"rewards/margins": 0.0019290696363896132,
"rewards/rejected": -0.010988753288984299,
"step": 220
},
{
"epoch": 0.4816753926701571,
"grad_norm": 12.437826156616211,
"learning_rate": 3.1071729615293424e-07,
"log_odds_chosen": 0.3832097351551056,
"log_odds_ratio": -0.6394175291061401,
"logits/chosen": 1.6963777542114258,
"logits/rejected": 1.7382042407989502,
"logps/chosen": -0.878866970539093,
"logps/rejected": -1.1088063716888428,
"loss": 8.6532,
"nll_loss": 1.0316081047058105,
"rewards/accuracies": 0.6312500238418579,
"rewards/chosen": -0.008788668550550938,
"rewards/margins": 0.002299393992871046,
"rewards/rejected": -0.011088063009083271,
"step": 230
},
{
"epoch": 0.5026178010471204,
"grad_norm": 8.457469940185547,
"learning_rate": 2.9281093183781403e-07,
"log_odds_chosen": 0.31373724341392517,
"log_odds_ratio": -0.6869611144065857,
"logits/chosen": 1.7616941928863525,
"logits/rejected": 1.7711395025253296,
"logps/chosen": -0.8636420965194702,
"logps/rejected": -1.0460337400436401,
"loss": 8.8157,
"nll_loss": 1.0462042093276978,
"rewards/accuracies": 0.6343749761581421,
"rewards/chosen": -0.008636420592665672,
"rewards/margins": 0.0018239166820421815,
"rewards/rejected": -0.010460336692631245,
"step": 240
},
{
"epoch": 0.5235602094240838,
"grad_norm": 7.4062604904174805,
"learning_rate": 2.7467508704251135e-07,
"log_odds_chosen": 0.3899185359477997,
"log_odds_ratio": -0.6580259203910828,
"logits/chosen": 1.8792108297348022,
"logits/rejected": 1.7834640741348267,
"logps/chosen": -0.8666375279426575,
"logps/rejected": -1.1335757970809937,
"loss": 8.8203,
"nll_loss": 1.1227291822433472,
"rewards/accuracies": 0.574999988079071,
"rewards/chosen": -0.008666375651955605,
"rewards/margins": 0.002669382141903043,
"rewards/rejected": -0.011335758492350578,
"step": 250
},
{
"epoch": 0.5445026178010471,
"grad_norm": 8.197929382324219,
"learning_rate": 2.5640697577740815e-07,
"log_odds_chosen": 0.3152967393398285,
"log_odds_ratio": -0.6758849620819092,
"logits/chosen": 1.7824039459228516,
"logits/rejected": 1.766455054283142,
"logps/chosen": -0.8502113223075867,
"logps/rejected": -1.0350358486175537,
"loss": 8.805,
"nll_loss": 1.0837422609329224,
"rewards/accuracies": 0.59375,
"rewards/chosen": -0.008502112701535225,
"rewards/margins": 0.001848246669396758,
"rewards/rejected": -0.010350359603762627,
"step": 260
},
{
"epoch": 0.5654450261780105,
"grad_norm": 7.904941558837891,
"learning_rate": 2.381045210440644e-07,
"log_odds_chosen": 0.28941792249679565,
"log_odds_ratio": -0.6771480441093445,
"logits/chosen": 1.8082011938095093,
"logits/rejected": 1.841059923171997,
"logps/chosen": -0.8608342409133911,
"logps/rejected": -1.0518665313720703,
"loss": 8.5612,
"nll_loss": 1.0589611530303955,
"rewards/accuracies": 0.596875011920929,
"rewards/chosen": -0.008608341217041016,
"rewards/margins": 0.0019103230442851782,
"rewards/rejected": -0.010518666356801987,
"step": 270
},
{
"epoch": 0.5863874345549738,
"grad_norm": 9.047080039978027,
"learning_rate": 2.1986582993616925e-07,
"log_odds_chosen": 0.35996752977371216,
"log_odds_ratio": -0.651909589767456,
"logits/chosen": 1.741289496421814,
"logits/rejected": 1.7088711261749268,
"logps/chosen": -0.8389939069747925,
"logps/rejected": -1.060254693031311,
"loss": 8.6897,
"nll_loss": 1.0688748359680176,
"rewards/accuracies": 0.5874999761581421,
"rewards/chosen": -0.008389937691390514,
"rewards/margins": 0.002212608465924859,
"rewards/rejected": -0.010602546855807304,
"step": 280
},
{
"epoch": 0.6073298429319371,
"grad_norm": 7.390078067779541,
"learning_rate": 2.0178866775369774e-07,
"log_odds_chosen": 0.31485018134117126,
"log_odds_ratio": -0.6769185066223145,
"logits/chosen": 1.8932464122772217,
"logits/rejected": 1.8560025691986084,
"logps/chosen": -0.8659391403198242,
"logps/rejected": -1.0502351522445679,
"loss": 8.7301,
"nll_loss": 1.1085679531097412,
"rewards/accuracies": 0.5718749761581421,
"rewards/chosen": -0.008659390732645988,
"rewards/margins": 0.0018429612973704934,
"rewards/rejected": -0.010502351447939873,
"step": 290
},
{
"epoch": 0.6282722513089005,
"grad_norm": 7.00534200668335,
"learning_rate": 1.839699339491937e-07,
"log_odds_chosen": 0.2003917694091797,
"log_odds_ratio": -0.7062225937843323,
"logits/chosen": 1.7795374393463135,
"logits/rejected": 1.7968852519989014,
"logps/chosen": -0.9032294154167175,
"logps/rejected": -1.0224246978759766,
"loss": 8.5708,
"nll_loss": 1.07076096534729,
"rewards/accuracies": 0.559374988079071,
"rewards/chosen": -0.00903229508548975,
"rewards/margins": 0.0011919522657990456,
"rewards/rejected": -0.010224247351288795,
"step": 300
},
{
"epoch": 0.6492146596858639,
"grad_norm": 8.29725170135498,
"learning_rate": 1.6650514271527465e-07,
"log_odds_chosen": 0.33252888917922974,
"log_odds_ratio": -0.6427666544914246,
"logits/chosen": 1.7817466259002686,
"logits/rejected": 1.805783987045288,
"logps/chosen": -0.8416460752487183,
"logps/rejected": -1.0354901552200317,
"loss": 8.5523,
"nll_loss": 1.023045301437378,
"rewards/accuracies": 0.606249988079071,
"rewards/chosen": -0.008416460826992989,
"rewards/margins": 0.0019384392071515322,
"rewards/rejected": -0.010354900732636452,
"step": 310
},
{
"epoch": 0.6701570680628273,
"grad_norm": 7.233508110046387,
"learning_rate": 1.4948791099758052e-07,
"log_odds_chosen": 0.302054762840271,
"log_odds_ratio": -0.6652564406394958,
"logits/chosen": 1.8713328838348389,
"logits/rejected": 1.906873106956482,
"logps/chosen": -0.857520580291748,
"logps/rejected": -1.0245308876037598,
"loss": 8.7328,
"nll_loss": 1.081656813621521,
"rewards/accuracies": 0.609375,
"rewards/chosen": -0.008575205691158772,
"rewards/margins": 0.0016701031709089875,
"rewards/rejected": -0.010245309211313725,
"step": 320
},
{
"epoch": 0.6910994764397905,
"grad_norm": 7.668047904968262,
"learning_rate": 1.3300945667758012e-07,
"log_odds_chosen": 0.3296203017234802,
"log_odds_ratio": -0.6710628867149353,
"logits/chosen": 1.8118549585342407,
"logits/rejected": 1.7958993911743164,
"logps/chosen": -0.8951608538627625,
"logps/rejected": -1.0847995281219482,
"loss": 8.9004,
"nll_loss": 1.1010843515396118,
"rewards/accuracies": 0.590624988079071,
"rewards/chosen": -0.008951608091592789,
"rewards/margins": 0.0018963876646012068,
"rewards/rejected": -0.010847995989024639,
"step": 330
},
{
"epoch": 0.7120418848167539,
"grad_norm": 8.6635160446167,
"learning_rate": 1.1715810961514072e-07,
"log_odds_chosen": 0.3200518488883972,
"log_odds_ratio": -0.6835609078407288,
"logits/chosen": 1.8284895420074463,
"logits/rejected": 1.8095057010650635,
"logps/chosen": -0.8995221853256226,
"logps/rejected": -1.0893176794052124,
"loss": 8.7465,
"nll_loss": 1.092341661453247,
"rewards/accuracies": 0.590624988079071,
"rewards/chosen": -0.008995221927762032,
"rewards/margins": 0.0018979553133249283,
"rewards/rejected": -0.01089317724108696,
"step": 340
},
{
"epoch": 0.7329842931937173,
"grad_norm": 8.274894714355469,
"learning_rate": 1.0201883817182949e-07,
"log_odds_chosen": 0.3764176368713379,
"log_odds_ratio": -0.6291422843933105,
"logits/chosen": 1.890041708946228,
"logits/rejected": 1.9048725366592407,
"logps/chosen": -0.8920204043388367,
"logps/rejected": -1.1221367120742798,
"loss": 8.845,
"nll_loss": 1.1252596378326416,
"rewards/accuracies": 0.640625,
"rewards/chosen": -0.00892020296305418,
"rewards/margins": 0.00230116187594831,
"rewards/rejected": -0.011221365071833134,
"step": 350
},
{
"epoch": 0.7539267015706806,
"grad_norm": 8.172623634338379,
"learning_rate": 8.76727937529367e-08,
"log_odds_chosen": 0.3156259059906006,
"log_odds_ratio": -0.6558908224105835,
"logits/chosen": 1.8350282907485962,
"logits/rejected": 1.8624794483184814,
"logps/chosen": -0.8711791038513184,
"logps/rejected": -1.0731276273727417,
"loss": 8.8469,
"nll_loss": 1.1009576320648193,
"rewards/accuracies": 0.612500011920929,
"rewards/chosen": -0.008711791597306728,
"rewards/margins": 0.002019485691562295,
"rewards/rejected": -0.010731276124715805,
"step": 360
},
{
"epoch": 0.774869109947644,
"grad_norm": 9.326078414916992,
"learning_rate": 7.419687580962222e-08,
"log_odds_chosen": 0.3530608117580414,
"log_odds_ratio": -0.6621404886245728,
"logits/chosen": 1.9310886859893799,
"logits/rejected": 1.8785558938980103,
"logps/chosen": -0.9094289541244507,
"logps/rejected": -1.133821725845337,
"loss": 8.7173,
"nll_loss": 1.1301552057266235,
"rewards/accuracies": 0.5687500238418579,
"rewards/chosen": -0.00909428857266903,
"rewards/margins": 0.0022439290769398212,
"rewards/rejected": -0.011338217183947563,
"step": 370
},
{
"epoch": 0.7958115183246073,
"grad_norm": 9.022133827209473,
"learning_rate": 6.166331963291519e-08,
"log_odds_chosen": 0.23490826785564423,
"log_odds_ratio": -0.6955921053886414,
"logits/chosen": 1.9562733173370361,
"logits/rejected": 1.8897705078125,
"logps/chosen": -0.8423633575439453,
"logps/rejected": -0.994970977306366,
"loss": 8.9164,
"nll_loss": 1.1008055210113525,
"rewards/accuracies": 0.5874999761581421,
"rewards/chosen": -0.008423633873462677,
"rewards/margins": 0.0015260763466358185,
"rewards/rejected": -0.009949709288775921,
"step": 380
},
{
"epoch": 0.8167539267015707,
"grad_norm": 7.358635425567627,
"learning_rate": 5.013930914912476e-08,
"log_odds_chosen": 0.27963709831237793,
"log_odds_ratio": -0.6734436750411987,
"logits/chosen": 1.956067681312561,
"logits/rejected": 1.9839175939559937,
"logps/chosen": -0.8422489166259766,
"logps/rejected": -1.0293291807174683,
"loss": 8.6461,
"nll_loss": 1.0561668872833252,
"rewards/accuracies": 0.578125,
"rewards/chosen": -0.008422489278018475,
"rewards/margins": 0.0018708031857386231,
"rewards/rejected": -0.010293291881680489,
"step": 390
},
{
"epoch": 0.837696335078534,
"grad_norm": 7.309168338775635,
"learning_rate": 3.968661679220467e-08,
"log_odds_chosen": 0.18631207942962646,
"log_odds_ratio": -0.7142513990402222,
"logits/chosen": 1.8434158563613892,
"logits/rejected": 1.8182004690170288,
"logps/chosen": -0.8909440040588379,
"logps/rejected": -1.0001533031463623,
"loss": 8.6763,
"nll_loss": 1.1097790002822876,
"rewards/accuracies": 0.5406249761581421,
"rewards/chosen": -0.008909439668059349,
"rewards/margins": 0.00109209178481251,
"rewards/rejected": -0.01000153087079525,
"step": 400
},
{
"epoch": 0.837696335078534,
"eval_log_odds_chosen": 0.3030896484851837,
"eval_log_odds_ratio": -0.6629699468612671,
"eval_logits/chosen": 2.1568901538848877,
"eval_logits/rejected": 2.191537857055664,
"eval_logps/chosen": -0.8522689342498779,
"eval_logps/rejected": -1.0496091842651367,
"eval_loss": 1.078381061553955,
"eval_nll_loss": 1.1099687814712524,
"eval_rewards/accuracies": 0.6060000061988831,
"eval_rewards/chosen": -0.008522690273821354,
"eval_rewards/margins": 0.0019734008237719536,
"eval_rewards/rejected": -0.010496090166270733,
"eval_runtime": 46.7843,
"eval_samples_per_second": 42.749,
"eval_steps_per_second": 5.344,
"step": 400
},
{
"epoch": 0.8586387434554974,
"grad_norm": 8.156121253967285,
"learning_rate": 3.036127238347164e-08,
"log_odds_chosen": 0.31570926308631897,
"log_odds_ratio": -0.6613708734512329,
"logits/chosen": 1.936348557472229,
"logits/rejected": 1.903748869895935,
"logps/chosen": -0.8457509875297546,
"logps/rejected": -1.0446395874023438,
"loss": 8.7334,
"nll_loss": 1.0835765600204468,
"rewards/accuracies": 0.609375,
"rewards/chosen": -0.00845750980079174,
"rewards/margins": 0.001988885225728154,
"rewards/rejected": -0.010446394793689251,
"step": 410
},
{
"epoch": 0.8795811518324608,
"grad_norm": 8.324801445007324,
"learning_rate": 2.2213262793589482e-08,
"log_odds_chosen": 0.29435402154922485,
"log_odds_ratio": -0.6616442799568176,
"logits/chosen": 1.8585201501846313,
"logits/rejected": 1.8436048030853271,
"logps/chosen": -0.8651920557022095,
"logps/rejected": -1.0383055210113525,
"loss": 8.6628,
"nll_loss": 1.0541940927505493,
"rewards/accuracies": 0.596875011920929,
"rewards/chosen": -0.008651919662952423,
"rewards/margins": 0.0017311364645138383,
"rewards/rejected": -0.010383055545389652,
"step": 420
},
{
"epoch": 0.900523560209424,
"grad_norm": 8.076932907104492,
"learning_rate": 1.5286263996730026e-08,
"log_odds_chosen": 0.48232072591781616,
"log_odds_ratio": -0.6256499290466309,
"logits/chosen": 1.8976824283599854,
"logits/rejected": 1.8619095087051392,
"logps/chosen": -0.8179370760917664,
"logps/rejected": -1.1194853782653809,
"loss": 8.6524,
"nll_loss": 1.1020301580429077,
"rewards/accuracies": 0.609375,
"rewards/chosen": -0.008179371245205402,
"rewards/margins": 0.003015482099726796,
"rewards/rejected": -0.011194853112101555,
"step": 430
},
{
"epoch": 0.9214659685863874,
"grad_norm": 8.184592247009277,
"learning_rate": 9.617406953185136e-09,
"log_odds_chosen": 0.3493059575557709,
"log_odds_ratio": -0.646949052810669,
"logits/chosen": 1.9365053176879883,
"logits/rejected": 1.9345598220825195,
"logps/chosen": -0.8272320032119751,
"logps/rejected": -1.0139485597610474,
"loss": 8.7643,
"nll_loss": 1.0985225439071655,
"rewards/accuracies": 0.628125011920929,
"rewards/chosen": -0.008272320032119751,
"rewards/margins": 0.0018671646248549223,
"rewards/rejected": -0.010139484889805317,
"step": 440
},
{
"epoch": 0.9424083769633508,
"grad_norm": 8.757542610168457,
"learning_rate": 5.2370785753763356e-09,
"log_odds_chosen": 0.2785571217536926,
"log_odds_ratio": -0.6923194527626038,
"logits/chosen": 1.9607532024383545,
"logits/rejected": 1.9347816705703735,
"logps/chosen": -0.8750311732292175,
"logps/rejected": -1.0376728773117065,
"loss": 8.7208,
"nll_loss": 1.1206778287887573,
"rewards/accuracies": 0.6156250238418579,
"rewards/chosen": -0.0087503120303154,
"rewards/margins": 0.0016264161095023155,
"rewards/rejected": -0.010376728139817715,
"step": 450
},
{
"epoch": 0.9633507853403142,
"grad_norm": 7.935389995574951,
"learning_rate": 2.168758844148272e-09,
"log_odds_chosen": 0.33909493684768677,
"log_odds_ratio": -0.654638409614563,
"logits/chosen": 2.066657304763794,
"logits/rejected": 2.039240598678589,
"logps/chosen": -0.8697013854980469,
"logps/rejected": -1.0564416646957397,
"loss": 8.7399,
"nll_loss": 1.149594783782959,
"rewards/accuracies": 0.59375,
"rewards/chosen": -0.00869701337069273,
"rewards/margins": 0.0018674019956961274,
"rewards/rejected": -0.010564416646957397,
"step": 460
},
{
"epoch": 0.9842931937172775,
"grad_norm": 7.827225685119629,
"learning_rate": 4.288949484559934e-10,
"log_odds_chosen": 0.38105446100234985,
"log_odds_ratio": -0.6435109376907349,
"logits/chosen": 1.9544579982757568,
"logits/rejected": 1.9189164638519287,
"logps/chosen": -0.8262852430343628,
"logps/rejected": -1.0520846843719482,
"loss": 8.7101,
"nll_loss": 1.105509638786316,
"rewards/accuracies": 0.6031249761581421,
"rewards/chosen": -0.00826285220682621,
"rewards/margins": 0.002257994608953595,
"rewards/rejected": -0.010520846582949162,
"step": 470
},
{
"epoch": 0.9989528795811519,
"step": 477,
"total_flos": 0.0,
"train_loss": 8.957356926780077,
"train_runtime": 5488.1377,
"train_samples_per_second": 11.139,
"train_steps_per_second": 0.087
}
],
"logging_steps": 10,
"max_steps": 477,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": false,
"should_training_stop": false
},
"attributes": {}
}
},
"total_flos": 0.0,
"train_batch_size": 4,
"trial_name": null,
"trial_params": null
}

1
vocab.json Normal file

File diff suppressed because one or more lines are too long