Initialize the project; model provided by the ModelHub XC community

Model: voidful/qd-phi-1_5
Source: Original Platform
ModelHub XC
2026-04-10 11:46:12 +08:00
commit 9913c291f6
19 changed files with 82241 additions and 0 deletions

50
.gitattributes vendored Normal file

@@ -0,0 +1,50 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
tokenizer_config.json filter=lfs diff=lfs merge=lfs -text

88
README.md Normal file

@@ -0,0 +1,88 @@
---
language:
- en
license: mit
datasets:
- corbyrosset/researchy_questions
---
## Model Summary
The language model Phi-1.5 is a Transformer with **1.3 billion** parameters. It was trained using the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.
We further trained Microsoft Research's phi-1.5, a 1.3B-parameter model, on question-decomposition datasets.
## How to Use
Phi-1.5 has been integrated into `transformers` since version 4.37.0. If you are using an older version, make sure to do the following:
* When loading the model, pass `trust_remote_code=True` as an argument to the `from_pretrained()` function.
The currently installed `transformers` version can be checked with: `pip list | grep transformers`.
## Example
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")  # use "cpu" if no GPU is available

model = AutoModelForCausalLM.from_pretrained("voidful/qd-phi-1_5", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("voidful/qd-phi-1_5", trust_remote_code=True)

inputs = tokenizer("Is it better to love or to be loved?", return_tensors="pt", return_attention_mask=True)
outputs = model.generate(**inputs, max_length=1024)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
## Result
*Question:* Is it better to love or to be loved?
*Decomposition:*
```json
{
  "What does it mean to love?": [
    "How do different philosophical, psychological, and cultural perspectives define love?",
    "What are the characteristics and types of love?",
    "How does love affect human behavior and well-being?"
  ],
  "What does it mean to be loved?": [
    "How do different philosophical, psychological, and cultural perspectives define being loved?",
    "What are the characteristics and types of being loved?",
    "How does being loved affect human behavior and well-being?"
  ],
  "What are the benefits and drawbacks of loving oneself?": [
    "How does loving oneself relate to self-esteem, self-acceptance, and self-care?",
    "How does loving oneself affect one's relationships with others?",
    "What are the challenges and risks of loving oneself?"
  ],
  "What are the benefits and drawbacks of being loved by others?": [
    "How does being loved by others relate to social connection, belonging, and support?",
    "How does being loved by others affect one's identity, autonomy, and agency?",
    "What are the challenges and risks of being loved by others?"
  ],
  "How do the concepts of loving and being loved interact and influence each other?": [
    "How do different situations and contexts affect the dynamics of love and being loved?",
    "How do different individuals and groups experience and express love and being loved differently?",
    "How do love and being loved shape and change over time?"
  ],
  "How can one balance and integrate love and being loved in one's life?": [
    "What are some strategies and practices to cultivate and sustain love and being loved?",
    "What are some examples and models of healthy and unhealthy relationships with love and being loved?",
    "What are some goals and values that guide one's choices and actions regarding love and being loved?"
  ]
}
```
*Queries:*
```
definition and types of love
definition and types of being loved
benefits and drawbacks of loving oneself
benefits and drawbacks of being loved by others
interaction and influence of love and being loved
strategies and practices to balance and integrate love and being loved
examples and models of healthy and unhealthy relationships with love and being loved
goals and values regarding love and being loved
```
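## Parsing the Output
The decomposition and queries shown above can also be consumed programmatically. Below is a minimal sketch (not part of the model's API): it assumes the generated text contains a single JSON object like the example above, and the helper names and the simple brace-matching heuristic are illustrative assumptions.
```python
import json

def extract_decomposition(generated_text: str) -> dict:
    """Parse the last {...} block in the generated text as JSON.
    Assumes the model emits a decomposition object like the example above."""
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the generated text")
    return json.loads(generated_text[start:end + 1])

def flatten_to_queries(decomposition: dict) -> list:
    """Collect the top-level question and every sub-question as flat retrieval queries."""
    queries = []
    for question, sub_questions in decomposition.items():
        queries.append(question)
        queries.extend(sub_questions)
    return queries

# Example usage with the `text` variable produced by the snippet in "How to Use":
# decomposition = extract_decomposition(text)
# print(flatten_to_queries(decomposition))
```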

30725
added_tokens.json Normal file

File diff suppressed because it is too large

37
config.json Normal file

@@ -0,0 +1,37 @@
{
  "_name_or_path": "voidful/phi-1_5_base",
  "architectures": [
    "PhiForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/phi-1_5--configuration_phi.PhiConfig",
    "AutoModelForCausalLM": "microsoft/phi-1_5--modeling_phi.PhiForCausalLM"
  },
  "bos_token_id": null,
  "embd_pdrop": 0.0,
  "eos_token_id": null,
  "hidden_act": "gelu_new",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "phi",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "num_key_value_heads": 32,
  "partial_rotary_factor": 0.5,
  "qk_layernorm": false,
  "resid_pdrop": 0.0,
  "rope_scaling": {
    "factor": 16.0,
    "type": "dynamic"
  },
  "rope_theta": 50000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.40.0.dev0",
  "use_cache": true,
  "vocab_size": 80980
}
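As a quick check, the configuration above can be loaded and inspected with `transformers`; a minimal sketch, assuming network access to the `voidful/qd-phi-1_5` repository (or a local clone path in its place):
```python
from transformers import AutoConfig

# Loads the config shown above; trust_remote_code mirrors the README instructions
# for older transformers releases that resolve the auto_map entries remotely.
config = AutoConfig.from_pretrained("voidful/qd-phi-1_5", trust_remote_code=True)

print(config.model_type)         # "phi"
print(config.hidden_size)        # 2048
print(config.num_hidden_layers)  # 24
print(config.vocab_size)         # 80980, i.e. the base vocab plus the added tokens
print(config.rope_scaling)       # {"factor": 16.0, "type": "dynamic"}
```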

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

9
generation_config.json Normal file

@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "eos_token_id": [
    70976,
    50256,
    70977
  ],
  "transformers_version": "4.40.0.dev0"
}

50001
merges.txt Normal file

File diff suppressed because it is too large

3
model-00001-of-00002.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86816b5696065c21d950ce7a7a8f87d04c6d33d0e1dace4ffc6d01cdc80970b7
size 4960313864

3
model-00002-of-00002.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abd5131a42c24f39f938366b9c41a2585ff7059586fe19c2af46de482388ab32
size 1200841024

348
model.safetensors.index.json Normal file

@@ -0,0 +1,348 @@
{
"metadata": {
"total_size": 6161117520
},
"weight_map": {
"lm_head.bias": "model-00002-of-00002.safetensors",
"lm_head.weight": "model-00002-of-00002.safetensors",
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.final_layernorm.bias": "model-00002-of-00002.safetensors",
"model.final_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.0.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.input_layernorm.bias": "model-00002-of-00002.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.fc1.bias": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.fc1.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.fc2.bias": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.fc2.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.input_layernorm.bias": "model-00002-of-00002.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.fc1.bias": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.fc1.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.fc2.bias": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.fc2.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.dense.bias": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.dense.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.input_layernorm.bias": "model-00002-of-00002.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.fc1.bias": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.fc1.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.fc2.bias": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.fc2.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.dense.bias": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.dense.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.3.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.bias": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.dense.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.dense.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors"
}
}
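The index above is the standard sharded-checkpoint map used by `transformers`: each tensor name points at the shard file that stores it. A minimal sketch for querying a local copy of this index (the file path is an assumption about where the repository was downloaded):
```python
import json
from collections import Counter

# Assumes the repository files have been downloaded to the working directory.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]

print(weight_map["lm_head.weight"])     # model-00002-of-00002.safetensors
print(Counter(weight_map.values()))     # number of tensors stored in each shard
print(index["metadata"]["total_size"])  # 6161117520 bytes (~6.2 GB)
```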

3
optimizer.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd7e856d791a6e40145ae9c86ef583685b91b3cd3cc188b7dab9020d51d7b37f
size 6941242

3
rng_state.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d138cfe3a4adf21f048848ee35837c9a757a0a3616ff7adbb45b69aac247435
size 14244

3
scheduler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88bab19069aaa4ec1f577667c17275a0431d8f65965edb6dab9a36aa495194ae
size 1064

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f37f03903ec162050c98a079961269acaa65eac3a403209aa22ae95fc8e32785
size 7788322

3
tokenizer_config.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62e45bb94aac819baa17da01850c8e54216980578a42ea5f949236affb2b6687
size 5373819

933
trainer_state.json Normal file

@@ -0,0 +1,933 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 1.9999777777777776,
"eval_steps": 500,
"global_step": 12857,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.02,
"grad_norm": 2.191666603088379,
"learning_rate": 2.0000000000000002e-07,
"loss": 1.3084,
"step": 100
},
{
"epoch": 0.03,
"grad_norm": 1.7032761573791504,
"learning_rate": 4.0000000000000003e-07,
"loss": 1.0978,
"step": 200
},
{
"epoch": 0.05,
"grad_norm": 1.4913277626037598,
"learning_rate": 6.000000000000001e-07,
"loss": 0.8859,
"step": 300
},
{
"epoch": 0.06,
"grad_norm": 1.470658779144287,
"learning_rate": 8.000000000000001e-07,
"loss": 0.8098,
"step": 400
},
{
"epoch": 0.08,
"grad_norm": 1.5057852268218994,
"learning_rate": 1.0000000000000002e-06,
"loss": 0.7472,
"step": 500
},
{
"epoch": 0.09,
"grad_norm": 1.3578836917877197,
"learning_rate": 1.2000000000000002e-06,
"loss": 0.704,
"step": 600
},
{
"epoch": 0.11,
"grad_norm": 1.3679372072219849,
"learning_rate": 1.4000000000000001e-06,
"loss": 0.6937,
"step": 700
},
{
"epoch": 0.12,
"grad_norm": 1.3869675397872925,
"learning_rate": 1.6000000000000001e-06,
"loss": 0.6667,
"step": 800
},
{
"epoch": 0.14,
"grad_norm": 1.209143042564392,
"learning_rate": 1.8000000000000001e-06,
"loss": 0.6586,
"step": 900
},
{
"epoch": 0.16,
"grad_norm": 1.2448453903198242,
"learning_rate": 2.0000000000000003e-06,
"loss": 0.6464,
"step": 1000
},
{
"epoch": 0.17,
"grad_norm": 1.3068097829818726,
"learning_rate": 2.2e-06,
"loss": 0.6435,
"step": 1100
},
{
"epoch": 0.19,
"grad_norm": 1.3215032815933228,
"learning_rate": 2.4000000000000003e-06,
"loss": 0.6335,
"step": 1200
},
{
"epoch": 0.2,
"grad_norm": 1.1790049076080322,
"learning_rate": 2.6e-06,
"loss": 0.6297,
"step": 1300
},
{
"epoch": 0.22,
"grad_norm": 1.2376770973205566,
"learning_rate": 2.8000000000000003e-06,
"loss": 0.6139,
"step": 1400
},
{
"epoch": 0.23,
"grad_norm": 1.114210844039917,
"learning_rate": 3e-06,
"loss": 0.6099,
"step": 1500
},
{
"epoch": 0.25,
"grad_norm": 1.1380881071090698,
"learning_rate": 3.2000000000000003e-06,
"loss": 0.6129,
"step": 1600
},
{
"epoch": 0.26,
"grad_norm": 1.2346134185791016,
"learning_rate": 3.4000000000000005e-06,
"loss": 0.5995,
"step": 1700
},
{
"epoch": 0.28,
"grad_norm": 1.293491005897522,
"learning_rate": 3.6000000000000003e-06,
"loss": 0.5975,
"step": 1800
},
{
"epoch": 0.3,
"grad_norm": 1.2162506580352783,
"learning_rate": 3.8000000000000005e-06,
"loss": 0.5941,
"step": 1900
},
{
"epoch": 0.31,
"grad_norm": 1.255366563796997,
"learning_rate": 4.000000000000001e-06,
"loss": 0.5814,
"step": 2000
},
{
"epoch": 0.33,
"grad_norm": 1.230293869972229,
"learning_rate": 4.2000000000000004e-06,
"loss": 0.5912,
"step": 2100
},
{
"epoch": 0.34,
"grad_norm": 1.3100721836090088,
"learning_rate": 4.4e-06,
"loss": 0.583,
"step": 2200
},
{
"epoch": 0.36,
"grad_norm": 1.2346209287643433,
"learning_rate": 4.600000000000001e-06,
"loss": 0.5716,
"step": 2300
},
{
"epoch": 0.37,
"grad_norm": 1.4086166620254517,
"learning_rate": 4.800000000000001e-06,
"loss": 0.5733,
"step": 2400
},
{
"epoch": 0.39,
"grad_norm": 1.1538732051849365,
"learning_rate": 5e-06,
"loss": 0.5708,
"step": 2500
},
{
"epoch": 0.4,
"grad_norm": 1.1694637537002563,
"learning_rate": 5.2e-06,
"loss": 0.5775,
"step": 2600
},
{
"epoch": 0.42,
"grad_norm": 1.1683225631713867,
"learning_rate": 5.400000000000001e-06,
"loss": 0.5635,
"step": 2700
},
{
"epoch": 0.44,
"grad_norm": 1.141072392463684,
"learning_rate": 5.600000000000001e-06,
"loss": 0.5669,
"step": 2800
},
{
"epoch": 0.45,
"grad_norm": 1.0785812139511108,
"learning_rate": 5.8e-06,
"loss": 0.562,
"step": 2900
},
{
"epoch": 0.47,
"grad_norm": 1.0285478830337524,
"learning_rate": 6e-06,
"loss": 0.5655,
"step": 3000
},
{
"epoch": 0.48,
"grad_norm": 1.145011067390442,
"learning_rate": 6.200000000000001e-06,
"loss": 0.5652,
"step": 3100
},
{
"epoch": 0.5,
"grad_norm": 1.1503745317459106,
"learning_rate": 6.4000000000000006e-06,
"loss": 0.5498,
"step": 3200
},
{
"epoch": 0.51,
"grad_norm": 1.0536553859710693,
"learning_rate": 6.600000000000001e-06,
"loss": 0.5509,
"step": 3300
},
{
"epoch": 0.53,
"grad_norm": 1.133914828300476,
"learning_rate": 6.800000000000001e-06,
"loss": 0.5576,
"step": 3400
},
{
"epoch": 0.54,
"grad_norm": 1.0403516292572021,
"learning_rate": 7e-06,
"loss": 0.5468,
"step": 3500
},
{
"epoch": 0.56,
"grad_norm": 1.124706745147705,
"learning_rate": 7.2000000000000005e-06,
"loss": 0.5481,
"step": 3600
},
{
"epoch": 0.58,
"grad_norm": 1.1518924236297607,
"learning_rate": 7.4e-06,
"loss": 0.5445,
"step": 3700
},
{
"epoch": 0.59,
"grad_norm": 1.067154884338379,
"learning_rate": 7.600000000000001e-06,
"loss": 0.5354,
"step": 3800
},
{
"epoch": 0.61,
"grad_norm": 1.0968620777130127,
"learning_rate": 7.800000000000002e-06,
"loss": 0.5409,
"step": 3900
},
{
"epoch": 0.62,
"grad_norm": 1.069420576095581,
"learning_rate": 8.000000000000001e-06,
"loss": 0.5399,
"step": 4000
},
{
"epoch": 0.64,
"grad_norm": 1.1727967262268066,
"learning_rate": 8.2e-06,
"loss": 0.5398,
"step": 4100
},
{
"epoch": 0.65,
"grad_norm": 1.1503219604492188,
"learning_rate": 8.400000000000001e-06,
"loss": 0.5357,
"step": 4200
},
{
"epoch": 0.67,
"grad_norm": 1.1418933868408203,
"learning_rate": 8.6e-06,
"loss": 0.5308,
"step": 4300
},
{
"epoch": 0.68,
"grad_norm": 1.065690517425537,
"learning_rate": 8.8e-06,
"loss": 0.5405,
"step": 4400
},
{
"epoch": 0.7,
"grad_norm": 1.1668462753295898,
"learning_rate": 9e-06,
"loss": 0.5342,
"step": 4500
},
{
"epoch": 0.72,
"grad_norm": 1.1019855737686157,
"learning_rate": 9.200000000000002e-06,
"loss": 0.5261,
"step": 4600
},
{
"epoch": 0.73,
"grad_norm": 1.171284794807434,
"learning_rate": 9.4e-06,
"loss": 0.5269,
"step": 4700
},
{
"epoch": 0.75,
"grad_norm": 1.2499768733978271,
"learning_rate": 9.600000000000001e-06,
"loss": 0.53,
"step": 4800
},
{
"epoch": 0.76,
"grad_norm": 1.0950831174850464,
"learning_rate": 9.800000000000001e-06,
"loss": 0.5369,
"step": 4900
},
{
"epoch": 0.78,
"grad_norm": 1.1700873374938965,
"learning_rate": 1e-05,
"loss": 0.5296,
"step": 5000
},
{
"epoch": 0.79,
"grad_norm": 1.0696507692337036,
"learning_rate": 9.963154016212235e-06,
"loss": 0.5188,
"step": 5100
},
{
"epoch": 0.81,
"grad_norm": 1.1057288646697998,
"learning_rate": 9.926308032424467e-06,
"loss": 0.5272,
"step": 5200
},
{
"epoch": 0.82,
"grad_norm": 0.9978774785995483,
"learning_rate": 9.8894620486367e-06,
"loss": 0.5288,
"step": 5300
},
{
"epoch": 0.84,
"grad_norm": 1.072900652885437,
"learning_rate": 9.852616064848932e-06,
"loss": 0.5247,
"step": 5400
},
{
"epoch": 0.86,
"grad_norm": 0.9971893429756165,
"learning_rate": 9.815770081061166e-06,
"loss": 0.5137,
"step": 5500
},
{
"epoch": 0.87,
"grad_norm": 0.9494544863700867,
"learning_rate": 9.778924097273399e-06,
"loss": 0.5157,
"step": 5600
},
{
"epoch": 0.89,
"grad_norm": 1.04669189453125,
"learning_rate": 9.742078113485631e-06,
"loss": 0.5267,
"step": 5700
},
{
"epoch": 0.9,
"grad_norm": 1.0739444494247437,
"learning_rate": 9.705232129697863e-06,
"loss": 0.5207,
"step": 5800
},
{
"epoch": 0.92,
"grad_norm": 0.9554969668388367,
"learning_rate": 9.668386145910098e-06,
"loss": 0.5136,
"step": 5900
},
{
"epoch": 0.93,
"grad_norm": 0.9823188185691833,
"learning_rate": 9.63154016212233e-06,
"loss": 0.5137,
"step": 6000
},
{
"epoch": 0.95,
"grad_norm": 0.9793609380722046,
"learning_rate": 9.594694178334562e-06,
"loss": 0.5177,
"step": 6100
},
{
"epoch": 0.96,
"grad_norm": 0.9663506746292114,
"learning_rate": 9.557848194546795e-06,
"loss": 0.5141,
"step": 6200
},
{
"epoch": 0.98,
"grad_norm": 1.0234330892562866,
"learning_rate": 9.521002210759029e-06,
"loss": 0.5077,
"step": 6300
},
{
"epoch": 1.0,
"grad_norm": 1.2162741422653198,
"learning_rate": 9.484156226971261e-06,
"loss": 0.5124,
"step": 6400
},
{
"epoch": 1.0,
"eval_loss": 0.5092498660087585,
"eval_runtime": 206.7717,
"eval_samples_per_second": 31.184,
"eval_steps_per_second": 3.898,
"step": 6428
},
{
"epoch": 1.01,
"grad_norm": 1.1033669710159302,
"learning_rate": 9.447310243183494e-06,
"loss": 0.4707,
"step": 6500
},
{
"epoch": 1.03,
"grad_norm": 1.0372246503829956,
"learning_rate": 9.410464259395726e-06,
"loss": 0.4532,
"step": 6600
},
{
"epoch": 1.04,
"grad_norm": 1.1232597827911377,
"learning_rate": 9.37361827560796e-06,
"loss": 0.4489,
"step": 6700
},
{
"epoch": 1.06,
"grad_norm": 1.061074137687683,
"learning_rate": 9.336772291820193e-06,
"loss": 0.4552,
"step": 6800
},
{
"epoch": 1.07,
"grad_norm": 1.0498615503311157,
"learning_rate": 9.299926308032425e-06,
"loss": 0.4482,
"step": 6900
},
{
"epoch": 1.09,
"grad_norm": 1.0324976444244385,
"learning_rate": 9.263080324244657e-06,
"loss": 0.4511,
"step": 7000
},
{
"epoch": 1.1,
"grad_norm": 0.9778733849525452,
"learning_rate": 9.226234340456892e-06,
"loss": 0.4452,
"step": 7100
},
{
"epoch": 1.12,
"grad_norm": 1.0444563627243042,
"learning_rate": 9.189388356669124e-06,
"loss": 0.4512,
"step": 7200
},
{
"epoch": 1.14,
"grad_norm": 1.0546364784240723,
"learning_rate": 9.152542372881356e-06,
"loss": 0.4486,
"step": 7300
},
{
"epoch": 1.15,
"grad_norm": 1.0815290212631226,
"learning_rate": 9.115696389093589e-06,
"loss": 0.4514,
"step": 7400
},
{
"epoch": 1.17,
"grad_norm": 1.0215433835983276,
"learning_rate": 9.078850405305823e-06,
"loss": 0.4467,
"step": 7500
},
{
"epoch": 1.18,
"grad_norm": 0.9386591911315918,
"learning_rate": 9.042004421518055e-06,
"loss": 0.4515,
"step": 7600
},
{
"epoch": 1.2,
"grad_norm": 1.08309006690979,
"learning_rate": 9.005158437730288e-06,
"loss": 0.4466,
"step": 7700
},
{
"epoch": 1.21,
"grad_norm": 1.0043574571609497,
"learning_rate": 8.968312453942522e-06,
"loss": 0.451,
"step": 7800
},
{
"epoch": 1.23,
"grad_norm": 0.9670729041099548,
"learning_rate": 8.931466470154754e-06,
"loss": 0.4508,
"step": 7900
},
{
"epoch": 1.24,
"grad_norm": 1.001947283744812,
"learning_rate": 8.894620486366987e-06,
"loss": 0.4454,
"step": 8000
},
{
"epoch": 1.26,
"grad_norm": 0.9632651209831238,
"learning_rate": 8.857774502579219e-06,
"loss": 0.4419,
"step": 8100
},
{
"epoch": 1.28,
"grad_norm": 1.0596399307250977,
"learning_rate": 8.820928518791453e-06,
"loss": 0.444,
"step": 8200
},
{
"epoch": 1.29,
"grad_norm": 1.0315296649932861,
"learning_rate": 8.784082535003686e-06,
"loss": 0.4443,
"step": 8300
},
{
"epoch": 1.31,
"grad_norm": 1.2347131967544556,
"learning_rate": 8.747236551215918e-06,
"loss": 0.4478,
"step": 8400
},
{
"epoch": 1.32,
"grad_norm": 1.1402790546417236,
"learning_rate": 8.71039056742815e-06,
"loss": 0.4458,
"step": 8500
},
{
"epoch": 1.34,
"grad_norm": 1.0721864700317383,
"learning_rate": 8.673544583640385e-06,
"loss": 0.4391,
"step": 8600
},
{
"epoch": 1.35,
"grad_norm": 1.072485327720642,
"learning_rate": 8.636698599852617e-06,
"loss": 0.4443,
"step": 8700
},
{
"epoch": 1.37,
"grad_norm": 1.0993961095809937,
"learning_rate": 8.59985261606485e-06,
"loss": 0.4485,
"step": 8800
},
{
"epoch": 1.38,
"grad_norm": 1.1231122016906738,
"learning_rate": 8.563006632277082e-06,
"loss": 0.4423,
"step": 8900
},
{
"epoch": 1.4,
"grad_norm": 1.131405234336853,
"learning_rate": 8.526160648489316e-06,
"loss": 0.4512,
"step": 9000
},
{
"epoch": 1.42,
"grad_norm": 1.0044902563095093,
"learning_rate": 8.489314664701548e-06,
"loss": 0.444,
"step": 9100
},
{
"epoch": 1.43,
"grad_norm": 0.9530848860740662,
"learning_rate": 8.45246868091378e-06,
"loss": 0.4423,
"step": 9200
},
{
"epoch": 1.45,
"grad_norm": 0.8441118597984314,
"learning_rate": 8.415622697126013e-06,
"loss": 0.4443,
"step": 9300
},
{
"epoch": 1.46,
"grad_norm": 0.9523534774780273,
"learning_rate": 8.378776713338247e-06,
"loss": 0.4403,
"step": 9400
},
{
"epoch": 1.48,
"grad_norm": 1.0068507194519043,
"learning_rate": 8.34193072955048e-06,
"loss": 0.4364,
"step": 9500
},
{
"epoch": 1.49,
"grad_norm": 1.0632400512695312,
"learning_rate": 8.305084745762712e-06,
"loss": 0.4414,
"step": 9600
},
{
"epoch": 1.51,
"grad_norm": 1.1296297311782837,
"learning_rate": 8.268238761974944e-06,
"loss": 0.4429,
"step": 9700
},
{
"epoch": 1.52,
"grad_norm": 1.0164484977722168,
"learning_rate": 8.231392778187179e-06,
"loss": 0.4403,
"step": 9800
},
{
"epoch": 1.54,
"grad_norm": 1.1088131666183472,
"learning_rate": 8.194546794399411e-06,
"loss": 0.4407,
"step": 9900
},
{
"epoch": 1.56,
"grad_norm": 1.041729211807251,
"learning_rate": 8.157700810611643e-06,
"loss": 0.4434,
"step": 10000
},
{
"epoch": 1.57,
"grad_norm": 1.0203355550765991,
"learning_rate": 8.120854826823877e-06,
"loss": 0.4428,
"step": 10100
},
{
"epoch": 1.59,
"grad_norm": 1.0350580215454102,
"learning_rate": 8.08400884303611e-06,
"loss": 0.4365,
"step": 10200
},
{
"epoch": 1.6,
"grad_norm": 1.1208500862121582,
"learning_rate": 8.047162859248342e-06,
"loss": 0.4426,
"step": 10300
},
{
"epoch": 1.62,
"grad_norm": 1.1885343790054321,
"learning_rate": 8.010316875460575e-06,
"loss": 0.4414,
"step": 10400
},
{
"epoch": 1.63,
"grad_norm": 0.9988847374916077,
"learning_rate": 7.973470891672809e-06,
"loss": 0.4368,
"step": 10500
},
{
"epoch": 1.65,
"grad_norm": 1.0181776285171509,
"learning_rate": 7.936624907885041e-06,
"loss": 0.4399,
"step": 10600
},
{
"epoch": 1.66,
"grad_norm": 0.9755112528800964,
"learning_rate": 7.899778924097274e-06,
"loss": 0.4424,
"step": 10700
},
{
"epoch": 1.68,
"grad_norm": 1.0309982299804688,
"learning_rate": 7.862932940309506e-06,
"loss": 0.4418,
"step": 10800
},
{
"epoch": 1.7,
"grad_norm": 1.032485842704773,
"learning_rate": 7.82608695652174e-06,
"loss": 0.4404,
"step": 10900
},
{
"epoch": 1.71,
"grad_norm": 1.0489119291305542,
"learning_rate": 7.789240972733973e-06,
"loss": 0.4333,
"step": 11000
},
{
"epoch": 1.73,
"grad_norm": 1.1144189834594727,
"learning_rate": 7.752394988946205e-06,
"loss": 0.4348,
"step": 11100
},
{
"epoch": 1.74,
"grad_norm": 1.0886303186416626,
"learning_rate": 7.715549005158437e-06,
"loss": 0.4355,
"step": 11200
},
{
"epoch": 1.76,
"grad_norm": 1.0355674028396606,
"learning_rate": 7.678703021370671e-06,
"loss": 0.4348,
"step": 11300
},
{
"epoch": 1.77,
"grad_norm": 1.0847941637039185,
"learning_rate": 7.641857037582904e-06,
"loss": 0.4398,
"step": 11400
},
{
"epoch": 1.79,
"grad_norm": 1.0430936813354492,
"learning_rate": 7.605011053795137e-06,
"loss": 0.4316,
"step": 11500
},
{
"epoch": 1.8,
"grad_norm": 1.0820753574371338,
"learning_rate": 7.5681650700073696e-06,
"loss": 0.431,
"step": 11600
},
{
"epoch": 1.82,
"grad_norm": 1.080796241760254,
"learning_rate": 7.531319086219603e-06,
"loss": 0.433,
"step": 11700
},
{
"epoch": 1.84,
"grad_norm": 1.01823091506958,
"learning_rate": 7.494473102431835e-06,
"loss": 0.435,
"step": 11800
},
{
"epoch": 1.85,
"grad_norm": 1.0962271690368652,
"learning_rate": 7.4576271186440685e-06,
"loss": 0.4347,
"step": 11900
},
{
"epoch": 1.87,
"grad_norm": 1.0160537958145142,
"learning_rate": 7.420781134856301e-06,
"loss": 0.4326,
"step": 12000
},
{
"epoch": 1.88,
"grad_norm": 0.9727386832237244,
"learning_rate": 7.383935151068534e-06,
"loss": 0.4329,
"step": 12100
},
{
"epoch": 1.9,
"grad_norm": 1.0276840925216675,
"learning_rate": 7.3470891672807666e-06,
"loss": 0.4295,
"step": 12200
},
{
"epoch": 1.91,
"grad_norm": 0.972959041595459,
"learning_rate": 7.310243183493e-06,
"loss": 0.4252,
"step": 12300
},
{
"epoch": 1.93,
"grad_norm": 1.0518746376037598,
"learning_rate": 7.273397199705232e-06,
"loss": 0.4299,
"step": 12400
},
{
"epoch": 1.94,
"grad_norm": 1.0917232036590576,
"learning_rate": 7.2365512159174655e-06,
"loss": 0.4326,
"step": 12500
},
{
"epoch": 1.96,
"grad_norm": 1.0608563423156738,
"learning_rate": 7.199705232129699e-06,
"loss": 0.4318,
"step": 12600
},
{
"epoch": 1.98,
"grad_norm": 1.0238438844680786,
"learning_rate": 7.162859248341931e-06,
"loss": 0.4298,
"step": 12700
},
{
"epoch": 1.99,
"grad_norm": 0.9618347883224487,
"learning_rate": 7.126013264554164e-06,
"loss": 0.4351,
"step": 12800
},
{
"epoch": 2.0,
"eval_loss": 0.47487419843673706,
"eval_runtime": 205.6987,
"eval_samples_per_second": 31.347,
"eval_steps_per_second": 3.918,
"step": 12857
}
],
"logging_steps": 100,
"max_steps": 32140,
"num_input_tokens_seen": 0,
"num_train_epochs": 5,
"save_steps": 500,
"total_flos": 6.404792140239882e+17,
"train_batch_size": 2,
"trial_name": null,
"trial_params": null
}
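`trainer_state.json` above is the standard Hugging Face `Trainer` log: `log_history` holds the training loss every 100 steps plus an eval entry per epoch (eval loss falls from ~0.509 after epoch 1 to ~0.475 after epoch 2). A minimal sketch for extracting the curves from a local copy of the file (the path is an assumption):
```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

train_points = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_points = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(f"{len(train_points)} training log points; final train loss = {train_points[-1][1]}")
for step, loss in eval_points:
    print(f"eval @ step {step}: loss = {loss:.4f}")
```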

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34584b3b2e57ec0863337589518428b1b74d4c21f340ee45e4119e80f6d3a4c8
size 4856

1
vocab.json Normal file

File diff suppressed because one or more lines are too long