Initialize project; model provided by the ModelHub XC community
Model: whooray/Qwen2.5-1.5B-Open-R1-Distill-ko Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
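Each `.gitattributes` rule above pairs a path pattern with attributes: `filter=lfs`, `diff=lfs`, and `merge=lfs` assign values, and `-text` unsets the `text` attribute so Git treats the file as binary. The sketch below parses such a line and does a rough basename check. The helper names `parse_gitattributes_line` and `is_lfs_tracked` are hypothetical, and `fnmatchcase` only approximates Git's own pattern matching, so patterns containing a slash (such as `saved_model/**/*`) are deliberately skipped.

```python
from fnmatch import fnmatchcase

def parse_gitattributes_line(line):
    """Split one .gitattributes line into (pattern, attributes)."""
    pattern, *tokens = line.split()
    attrs = {}
    for token in tokens:
        if token.startswith("-"):
            attrs[token[1:]] = False          # "-text" unsets the attribute
        elif "=" in token:
            name, value = token.split("=", 1)
            attrs[name] = value               # "filter=lfs" assigns a value
        else:
            attrs[token] = True               # a bare name sets the attribute
    return pattern, attrs

def is_lfs_tracked(filename, rules):
    """Rough check: does a slash-free LFS pattern match the file's basename?"""
    base = filename.rsplit("/", 1)[-1]
    return any(
        "/" not in pattern
        and fnmatchcase(base, pattern)        # approximation of Git's matching
        and attrs.get("filter") == "lfs"
        for pattern, attrs in rules
    )

rules = [
    parse_gitattributes_line(line)
    for line in (
        "*.safetensors filter=lfs diff=lfs merge=lfs -text",
        "tokenizer.json filter=lfs diff=lfs merge=lfs -text",
    )
]
print(is_lfs_tracked("model.safetensors", rules))
print(is_lfs_tracked("config.json", rules))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports what Git itself resolves.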
75
README.md
Normal file
@@ -0,0 +1,75 @@
---
base_model: Qwen/Qwen2.5-1.5B-Instruct
datasets: lemon-mint/korean-reasoning-v02
library_name: transformers
model_name: Qwen2.5-1.5B-Open-R1-Distill-ko
tags:
- generated_from_trainer
- open-r1
- trl
- sft
licence: license
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Model Card for Qwen2.5-1.5B-Open-R1-Distill-ko

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the [lemon-mint/korean-reasoning-v02](https://huggingface.co/datasets/lemon-mint/korean-reasoning-v02) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

# Korean prompt: "What is the capital of France?"
question = "프랑스의 수도는?"
generator = pipeline("text-generation", model="whooray/Qwen2.5-1.5B-Open-R1-Distill-ko", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
# Sample output (raw Korean model response, cut off mid-sentence):
# <think>\n먼저 프랑스 수도를 알아봐야겠어요. 프랑스는 잘 알려진 유럽 국가 중 하나인데 수도를 알려주면 더 편하겠죠. 주요 귀족과 수도는 오스만 식민지와 무역의 중심지였던 아름디라는 지역에 있었다는 말이 있던데, 최근에는 오타냐 근처에 있는 파리가 수도로 통일되었을 거예요. 이렇게 논란이 있었던 걸로 기억하는데, 독일이나 이집트, 이스라엘 같은 국가들도 수도로 다른 지역을 사용했던 경우가 있었음을 알고 있어요. 프랑스는 CIA 월스트리트 journal에서 프랑스 수도는 파리가 아닐 거라고 말한 적 있을까요? 최근 대부분의 관측나라들이 파리로 인정하고 있으므로 정확한지 확인이 필요할 것 같아요. 하지만 프랑스에서 수도가 변했는지, 아니면 옛 수도가 현재 정부 인근에 있는지 궁금하네요. 아마도 1962년 제5공화국에서령으로 습격된 곳을 포함한 모든 정부 기관이 파리를 중심으로 하는 교두보 역할을 하게 된 걸로 알고 있어요. 특별히 역사적으로 정당한 주장을 사용해 개통 가능한 답변을 완성해야겠는데요. \n</think>\n\n프랑스의 수도는 **파리(Paris)**예요. 역사적으로 수도 역할을 하지 못했던 지점에 위치한 프랑스 비국민대책 정부를 중심으로 한 정권이 1944년에 백제온을 탈환하여 새로운 수도로 지정하며 최종적으로 확립되었어요.\n\n### 프랑스 수도 교체의 주요 이유 \n당시 연합군의 인도주의 의지를 반영한 조치였어요. 1932년 17개 연합군 단체가 파리 기지를 공유하면서 공식적인 수도 기능을 잃었다는 점에서 '특허 수도'로 검열되며 미국, 일본 등 유럽 국가들 중 파리를 중심으로 한 정부 주도의 통치가 유려되어 표준화되었죠. \n> **비유**: 프랑스 수도는
```
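
The sample output wraps the model's reasoning in `<think>...</think>` before the final answer. A minimal sketch for separating the two is shown below; `split_reasoning` is a hypothetical helper, not part of `transformers` or TRL, and it assumes at most one reasoning block per response.

```python
def split_reasoning(generated_text):
    """Separate the <think>...</think> reasoning from the final answer."""
    open_tag, close_tag = "<think>", "</think>"
    start = generated_text.find(open_tag)
    end = generated_text.find(close_tag)
    if start == -1 or end == -1 or end < start:
        # No complete reasoning block, e.g. generation stopped inside <think>.
        return "", generated_text.strip()
    reasoning = generated_text[start + len(open_tag):end].strip()
    answer = generated_text[end + len(close_tag):].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\n수도를 확인해요.\n</think>\n\n프랑스의 수도는 **파리(Paris)**예요."
)
print(answer)
```

If generation is cut off before `</think>`, the helper returns an empty reasoning string and the whole text as the answer, so callers may want to raise `max_new_tokens` in that case.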

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/llm-est/huggingface/runs/mrisof65)

This model was trained with supervised fine-tuning (SFT).

### Framework versions

- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
24
added_tokens.json
Normal file
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
8
all_results.json
Normal file
@@ -0,0 +1,8 @@
{
  "total_flos": 30026049257472.0,
  "train_loss": 1.1239717241489526,
  "train_runtime": 525.5519,
  "train_samples": 31253,
  "train_samples_per_second": 16.251,
  "train_steps_per_second": 0.126
}
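The throughput figures in `all_results.json` can be cross-checked with simple arithmetic. The sketch below multiplies each rate by the runtime and compares the step estimate with the `global_step` of 66 reported in `trainer_state.json`. The variable names are illustrative, and the results are approximate because the reported rates are rounded.

```python
# Values copied from all_results.json.
train_runtime = 525.5519             # seconds
train_samples_per_second = 16.251
train_steps_per_second = 0.126

# Rate multiplied by wall-clock time gives approximate totals.
approx_steps = train_runtime * train_steps_per_second
approx_sequences = train_runtime * train_samples_per_second

# Ratio of the two rates: sequences consumed per optimizer step.
sequences_per_step = train_samples_per_second / train_steps_per_second

print(f"optimizer steps    ~ {approx_steps:.1f}")
print(f"sequences consumed ~ {approx_sequences:.0f}")
print(f"sequences per step ~ {sequences_per_step:.0f}")
```

The sequence estimate comes out well below `train_samples` (31253); the source does not say why the two counts differ.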
29
config.json
Normal file
@@ -0,0 +1,29 @@
{
  "_name_or_path": "Qwen/Qwen2.5-1.5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
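The dimensions in `config.json` are enough to estimate the parameter count. The sketch below assumes the usual Qwen2 layer layout (query/key/value projections with bias, an output projection without bias, a three-matrix gated MLP, and two RMSNorm weights per layer); that layout is an assumption about the architecture, not something stated in this commit, and `count_parameters` is a hypothetical helper.

```python
# Values copied from config.json.
cfg = {
    "hidden_size": 1536,
    "intermediate_size": 8960,
    "num_attention_heads": 12,
    "num_hidden_layers": 28,
    "num_key_value_heads": 2,
    "vocab_size": 151936,
    "tie_word_embeddings": True,
}

def count_parameters(cfg):
    """Estimate the parameter count under the assumed Qwen2 layer layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]   # grouped-query attention
    # q/k/v projections carry a bias; the output projection does not (assumed).
    attention = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h
    # Gated MLP: gate, up, and down projections, no bias (assumed).
    mlp = 3 * h * cfg["intermediate_size"]
    norms = 2 * h                                    # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embedding = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embedding
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embedding + final_norm + lm_head

params = count_parameters(cfg)
print(params)        # 1543714304
print(params * 2)    # bytes at 2 bytes per bfloat16 weight: 3087428608
```

The bfloat16 estimate lands about 38 KB under the 3087467144-byte size recorded in the `model.safetensors` pointer, which is consistent with a small file header on top of the raw weights.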
14
generation_config.json
Normal file
@@ -0,0 +1,14 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.49.0.dev0"
}
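With `do_sample` enabled, `temperature`, `top_k`, and `top_p` jointly shape the distribution the next token is drawn from. The sketch below is a simplified, pure-Python illustration of those three steps on a toy logit vector. It is not the `transformers` implementation, `filter_distribution` is a hypothetical name, and `repetition_penalty` is left out.

```python
import math

def filter_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_id: probability} after temperature, top-k, then top-p."""
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, weight in zip(ranked, weights):
        p = weight / total
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {token_id: p / norm for token_id, p in kept}

dist = filter_distribution([2.0, 1.0, 0.5, -1.0])
print(dist)
```

On this toy input the top token alone holds less than 0.8 of the mass, so the second token is kept as well and the remaining two are discarded.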
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d5dca6c25ea0d32ebf8a1d078cee1c742307cff522a48e09cedef57293b158f
size 3087467144
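The three lines committed for `model.safetensors` are a Git LFS pointer: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch for reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper and assumes exactly the `key value` layout shown above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),   # size of the real file, in bytes
    }

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3d5dca6c25ea0d32ebf8a1d078cee1c742307cff522a48e09cedef57293b158f\n"
    "size 3087467144\n"
)
print(f"{pointer['size'] / 1e9:.2f} GB")
```

After downloading the real file, comparing its SHA-256 with `pointer["digest"]` is a simple integrity check.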
25
special_tokens_map.json
Normal file
@@ -0,0 +1,25 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|im_end|>"
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
208
tokenizer_config.json
Normal file
@@ -0,0 +1,208 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|im_end|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
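For conversations without tools, the `chat_template` in `tokenizer_config.json` reduces to a simple layout: a system turn (the default Qwen prompt unless the first message is a system message), then each message wrapped in `<|im_start|>role` and `<|im_end|>`, then an optional open assistant turn. The sketch below re-implements only that tool-free path in plain Python for illustration; `render_chat` is a hypothetical helper, and in practice `tokenizer.apply_chat_template` should be used instead.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chat(messages, add_generation_prompt=True):
    """Render tool-free messages following the template's non-tool branch."""
    if messages and messages[0]["role"] == "system":
        system_text, rest = messages[0]["content"], messages[1:]
    else:
        system_text, rest = DEFAULT_SYSTEM, messages
    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]
    for message in rest:
        # user, assistant (without tool calls), and later system turns share one layout
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")   # leave the assistant turn open
    return "".join(parts)

print(render_chat([{"role": "user", "content": "프랑스의 수도는?"}]))
```

Tool definitions, tool calls, and `tool` role messages follow separate branches of the template and are not covered by this sketch.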
8
train_results.json
Normal file
@@ -0,0 +1,8 @@
{
  "total_flos": 30026049257472.0,
  "train_loss": 1.1239717241489526,
  "train_runtime": 525.5519,
  "train_samples": 31253,
  "train_samples_per_second": 16.251,
  "train_steps_per_second": 0.126
}
147
trainer_state.json
Normal file
@@ -0,0 +1,147 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.9887640449438202,
  "eval_steps": 100,
  "global_step": 66,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.0749063670411985,
      "grad_norm": 1.5548935152564476,
      "learning_rate": 1.4285714285714287e-05,
      "loss": 1.4458,
      "mean_token_accuracy": 0.6562816568352667,
      "step": 5
    },
    {
      "epoch": 0.149812734082397,
      "grad_norm": 1.8830484605836868,
      "learning_rate": 1.9872683547213446e-05,
      "loss": 1.3445,
      "mean_token_accuracy": 0.6695685059143874,
      "step": 10
    },
    {
      "epoch": 0.2247191011235955,
      "grad_norm": 0.8654100123580895,
      "learning_rate": 1.9106347728549134e-05,
      "loss": 1.226,
      "mean_token_accuracy": 0.6923269238251538,
      "step": 15
    },
    {
      "epoch": 0.299625468164794,
      "grad_norm": 0.5943096378692382,
      "learning_rate": 1.7698339834299064e-05,
      "loss": 1.1571,
      "mean_token_accuracy": 0.7064256188642666,
      "step": 20
    },
    {
      "epoch": 0.37453183520599254,
      "grad_norm": 0.4676209085255989,
      "learning_rate": 1.5747874102144073e-05,
      "loss": 1.1288,
      "mean_token_accuracy": 0.711906644971702,
      "step": 25
    },
    {
      "epoch": 0.449438202247191,
      "grad_norm": 0.4382156327540379,
      "learning_rate": 1.3392388661180303e-05,
      "loss": 1.0824,
      "mean_token_accuracy": 0.720971662772769,
      "step": 30
    },
    {
      "epoch": 0.5243445692883895,
      "grad_norm": 0.3820951165479697,
      "learning_rate": 1.0797861055530832e-05,
      "loss": 1.0568,
      "mean_token_accuracy": 0.7265031930562844,
      "step": 35
    },
    {
      "epoch": 0.599250936329588,
      "grad_norm": 0.3610180887247636,
      "learning_rate": 8.147112759128859e-06,
      "loss": 1.0555,
      "mean_token_accuracy": 0.726724873033217,
      "step": 40
    },
    {
      "epoch": 0.6741573033707865,
      "grad_norm": 0.3461477286333746,
      "learning_rate": 5.626926795411447e-06,
      "loss": 1.031,
      "mean_token_accuracy": 0.7318156111871186,
      "step": 45
    },
    {
      "epoch": 0.7490636704119851,
      "grad_norm": 0.33255510793828263,
      "learning_rate": 3.414886209349615e-06,
      "loss": 1.0255,
      "mean_token_accuracy": 0.7329844427728794,
      "step": 50
    },
    {
      "epoch": 0.8239700374531835,
      "grad_norm": 0.31534489370179297,
      "learning_rate": 1.6668608091748495e-06,
      "loss": 1.0237,
      "mean_token_accuracy": 0.7332181534901535,
      "step": 55
    },
    {
      "epoch": 0.898876404494382,
      "grad_norm": 0.3119780254657091,
      "learning_rate": 5.060239153161872e-07,
      "loss": 1.024,
      "mean_token_accuracy": 0.7329648884764794,
      "step": 60
    },
    {
      "epoch": 0.9737827715355806,
      "grad_norm": 0.3095384399368516,
      "learning_rate": 1.4173043232380557e-08,
      "loss": 1.0252,
      "mean_token_accuracy": 0.7327120227338457,
      "step": 65
    },
    {
      "epoch": 0.9887640449438202,
      "mean_token_accuracy": 0.7354095790887372,
      "step": 66,
      "total_flos": 30026049257472.0,
      "train_loss": 1.1239717241489526,
      "train_runtime": 525.5519,
      "train_samples_per_second": 16.251,
      "train_steps_per_second": 0.126
    }
  ],
  "logging_steps": 5,
  "max_steps": 66,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": false,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 30026049257472.0,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
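The `learning_rate` values logged in `trainer_state.json` are consistent with a linear warmup followed by cosine decay to zero. The sketch below reproduces them under two inferred settings, a peak rate of 2e-5 and 7 warmup steps; neither value is stated in this commit (the actual arguments live in `training_args.bin`), so both are assumptions, as is the helper name `learning_rate`.

```python
import math

PEAK_LR = 2e-5       # assumed: inferred from the logged values
WARMUP_STEPS = 7     # assumed: inferred from the logged values
MAX_STEPS = 66       # "max_steps" in trainer_state.json

def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Logged values copied from log_history.
logged = {
    5: 1.4285714285714287e-05,
    10: 1.9872683547213446e-05,
    15: 1.9106347728549134e-05,
    20: 1.7698339834299064e-05,
    65: 1.4173043232380557e-08,
}
for step, value in logged.items():
    print(step, learning_rate(step), value)
```

Agreement between the two columns supports the inferred schedule but does not confirm the exact training arguments.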
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0906b9c1a8215bdf41e7d0ed4af8e30f0d511ce7c7a89ae00beb66a4675e3653
size 7352
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long