Initialize project; model provided by the ModelHub XC community

Model: Xtra-Computing/XtraGPT-1.5B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-16 01:09:39 +08:00
commit 9710a601ef
18 changed files with 152087 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

188
README.md Normal file

@@ -0,0 +1,188 @@
---
license: other
license_name: mg0-2.0
license_link: https://github.com/Xtra-Computing/ModelGo/blob/main/MG_licenses/V2/MG0-2.0.txt
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- chat
library_name: transformers
---
# XtraGPT: Context-Aware and Controllable Academic Paper Revision for Human-AI Collaboration
<p align="center">
<a href="https://arxiv.org/abs/2505.11336">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2505.11336-b31b1b.svg">
</a>
</p>
## Model Overview
**XtraGPT** is a family of open-source Large Language Models (LLMs) designed specifically for **human-AI collaborative academic paper revision**. Unlike general-purpose models that often perform surface-level polishing, XtraGPT is fine-tuned to **understand the full context** of a research paper and execute specific, **criteria-guided** revision instructions.
The models were trained on a dataset of 140,000 high-quality instruction-revision pairs derived from top-tier conference papers (ICLR).
**Key Features:**
* **Context-Aware:** Processes the full paper context to ensure revisions maintain consistency with the global narrative.
* **Controllable:** Follows specific user instructions aligned with 20 academic writing criteria across 6 sections (Abstract, Introduction, etc.).
* **Iterative Workflow:** Designed to support the "Human-AI Collaborative" (HAC) lifecycle where authors retain creative control.
**Available Model Sizes:**
* **1.5B** (Based on Qwen/Qwen2.5-1.5B-Instruct)
* **3B** (Based on meta-llama/Llama-3.2-3B-Instruct)
* **7B** (Based on Qwen/Qwen2.5-7B-Instruct)
* **14B** (Based on microsoft/phi-4)
---
## Inference with Transformers
To use XtraGPT with the standard Hugging Face `transformers` library, ensure you format your input using the specific tags `<PAPER_CONTENT>`, `<SELECTED_CONTENT>`, and `<QUESTION>`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Select the model size: "XtraGPT-1.5B", "XtraGPT-3B", "XtraGPT-7B", or "XtraGPT-14B"
model_name = "Xtra-Computing/XtraGPT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto"
)
# Define the Prompt Template tailored for XtraGPT
prompt_template = """Act as an expert model for improving articles **PAPER_CONTENT**.
The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.
<PAPER_CONTENT>
{paper_content}
</PAPER_CONTENT>
<SELECTED_CONTENT>
{selected_content}
</SELECTED_CONTENT>
<QUESTION>
{user_question}
</QUESTION>"""
# Example Data (from the "Attention Is All You Need" paper)
paper_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
selected_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
user_question = "help me make it more concise."
# Format the input
formatted_prompt = prompt_template.format(
paper_content=paper_content,
selected_content=selected_content,
user_question=user_question
)
messages = [
{"role": "user", "content": formatted_prompt}
]
# Apply chat template
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate
generated_ids = model.generate(
**model_inputs,
max_new_tokens=16384,
temperature=0.1
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
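The prompt template above can be wrapped in a small helper so that the three tags are always emitted in the same order. This is a minimal sketch, not code from the XtraGPT repository: the function name `build_xtragpt_prompt` and the check that the selected text occurs in the paper are assumptions made for illustration.

```python
# Hypothetical helper (not part of XtraGPT) that fills the README's prompt template.
INSTRUCTION = (
    "Act as an expert model for improving articles **PAPER_CONTENT**.\n"
    "The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. "
    "Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.\n"
    "Focus on clear, concise, and evidence-based improvements that align with "
    "the overall context of the paper.\n"
)


def build_xtragpt_prompt(paper_content: str, selected_content: str, user_question: str) -> str:
    """Return the full user prompt with the three XtraGPT tags filled in."""
    # Assumption: the selected span should be a verbatim excerpt of the paper.
    if selected_content not in paper_content:
        raise ValueError("selected_content must appear verbatim in paper_content")
    return (
        INSTRUCTION
        + f"<PAPER_CONTENT>\n{paper_content}\n</PAPER_CONTENT>\n"
        + f"<SELECTED_CONTENT>\n{selected_content}\n</SELECTED_CONTENT>\n"
        + f"<QUESTION>\n{user_question}\n</QUESTION>"
    )


example_prompt = build_xtragpt_prompt(
    paper_content="We propose the Transformer. It relies solely on attention.",
    selected_content="It relies solely on attention.",
    user_question="help me make it more concise.",
)
print(example_prompt)
```

The returned string can be used as the `content` of the single user message before calling `tokenizer.apply_chat_template`.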
-----
## Inference with vLLM
XtraGPT is compatible with vLLM for high-throughput inference.
### 1. Launch the Server
Replace `XtraGPT-14B` with your specific model variant.
```bash
python -m vllm.entrypoints.openai.api_server \
--port 8088 \
--model Xtra-Computing/XtraGPT-14B \
--served-model-name xtragpt \
--max-model-len 16384 \
--gpu-memory-utilization 0.95
```
### 2. Send a Request (Client Side)
```bash
curl http://127.0.0.1:8088/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "xtragpt",
"messages": [
{
"role": "user",
"content": "Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\nThe output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.\nFocus on clear, concise, and evidence-based improvements that align with the overall context of the paper.\n<PAPER_CONTENT>\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.\n</PAPER_CONTENT>\n<SELECTED_CONTENT>\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.\n</SELECTED_CONTENT>\n<QUESTION>\nhelp me make it more concise.\n</QUESTION>"
}
],
"temperature": 0.1,
"max_tokens": 4096,
"stream": false
}'
```
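The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_chat_payload` and `send_chat_request` are hypothetical, the URL and served model name are taken from the launch command above, and the `max_tokens` value of 4096 is an illustrative choice that leaves room for the prompt inside the 16384-token context. The OpenAI-compatible chat completions schema names the output limit `max_tokens`.

```python
import json
import urllib.request


def build_chat_payload(prompt: str, model: str = "xtragpt",
                       temperature: float = 0.1, max_tokens: int = 4096) -> dict:
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,  # assumed value; prompt + output must fit the context
        "stream": False,
    }


def send_chat_request(payload: dict,
                      url: str = "http://127.0.0.1:8088/v1/chat/completions") -> str:
    """POST the payload to a running server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload("<PAPER_CONTENT>...</PAPER_CONTENT>")
print(json.dumps(payload, indent=2))
# With the server from step 1 running: print(send_chat_request(payload))
```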
-----
## Model License
This model is released under the **ModelGo Zero License 2.0 (MG0-2.0)**.
MG0-2.0 is a highly permissive open model license designed to facilitate the widest possible adoption and collaboration. It allows for **unrestricted use**, reproduction, distribution, and the creation of derivative works including for commercial purposes, without requiring attribution or imposing copyleft restrictions.
For more details on the license terms, please visit [ModelGo.li](https://www.modelgo.li/) or refer to the `LICENSE` file in the repository.
-----
## Citation
If you use XtraGPT in your research, please cite our paper:
```
@inproceedings{chen2026xtragpt,
title={XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration},
author={Nuo Chen and Andre Lin HuiKai and Jiaying Wu and Junyi Hou and Zining Zhang and Qian Wang and Xidong Wang and Bingsheng He},
booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year={2026},
note={Available on arXiv:2505.11336}
}
```

24
added_tokens.json Normal file

@@ -0,0 +1,24 @@
{
"</tool_call>": 151658,
"<tool_call>": 151657,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

12
all_results.json Normal file

@@ -0,0 +1,12 @@
{
"epoch": 1.9777777777777779,
"eval_loss": 0.5940256714820862,
"eval_runtime": 18.0133,
"eval_samples_per_second": 5.551,
"eval_steps_per_second": 1.388,
"total_flos": 15337468788736.0,
"train_loss": 0.5112001879939011,
"train_runtime": 1525.5061,
"train_samples_per_second": 1.18,
"train_steps_per_second": 0.073
}
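As a rough consistency check, multiplying each reported rate by its runtime recovers approximate sample and step counts. The sketch below only does that arithmetic on the values above; because the rates are rounded to a few decimals, the results are estimates, and interpreting them as dataset sizes is an assumption.

```python
# Values copied from all_results.json.
results = {
    "eval_runtime": 18.0133,
    "eval_samples_per_second": 5.551,
    "eval_steps_per_second": 1.388,
    "train_runtime": 1525.5061,
    "train_samples_per_second": 1.18,
    "train_steps_per_second": 0.073,
}

# rate * runtime gives an approximate count (rates are rounded in the file).
train_samples = results["train_samples_per_second"] * results["train_runtime"]
train_steps = results["train_steps_per_second"] * results["train_runtime"]
eval_samples = results["eval_samples_per_second"] * results["eval_runtime"]
eval_steps = results["eval_steps_per_second"] * results["eval_runtime"]

print(f"train samples processed ~ {train_samples:.0f}")
print(f"train steps ~ {train_steps:.0f}")
print(f"eval samples ~ {eval_samples:.0f}")
print(f"eval steps ~ {eval_steps:.0f}")
print(f"eval samples per step ~ {eval_samples / eval_steps:.1f}")
```

The estimated step count lands slightly below the 112 steps recorded in the trainer logs, which is what one would expect from a rate rounded to three decimals.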

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"_name_or_path": "Qwen/Qwen2.5-1.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.3",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}
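A few derived quantities follow directly from this configuration. The sketch below assumes the usual Qwen2 convention that the per-head dimension is `hidden_size // num_attention_heads`, assumes 2 bytes per cached value (bfloat16, as in `torch_dtype`), and uses the 16384-token limit from `tokenizer_config.json` as the context length. The key/value cache figure is an estimate for a single sequence, not a measured number.

```python
# Values copied from config.json.
config = {
    "hidden_size": 1536,
    "num_attention_heads": 12,
    "num_hidden_layers": 28,
    "num_key_value_heads": 2,
}
BYTES_PER_VALUE = 2      # assumption: bfloat16 cache entries
CONTEXT_TOKENS = 16384   # model_max_length from tokenizer_config.json

# Per-head width and the grouped-query attention ratio.
head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Each token stores one key and one value vector per KV head per layer.
kv_bytes_per_token = (
    2 * config["num_hidden_layers"] * config["num_key_value_heads"]
    * head_dim * BYTES_PER_VALUE
)
kv_cache_mib = kv_bytes_per_token * CONTEXT_TOKENS / 2**20

print(f"head_dim = {head_dim}")
print(f"query heads per KV head = {queries_per_kv_head}")
print(f"KV cache per token = {kv_bytes_per_token} bytes")
print(f"KV cache at {CONTEXT_TOKENS} tokens = {kv_cache_mib:.0f} MiB")
```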

7
eval_results.json Normal file

@@ -0,0 +1,7 @@
{
"epoch": 1.9777777777777779,
"eval_loss": 0.5940256714820862,
"eval_runtime": 18.0133,
"eval_samples_per_second": 5.551,
"eval_steps_per_second": 1.388
}

14
generation_config.json Normal file

@@ -0,0 +1,14 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8,
"transformers_version": "4.48.3"
}
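To illustrate what `temperature`, `top_k`, and `top_p` do to a next-token distribution, here is a simplified pure-Python sketch. It is not the `transformers` implementation and it omits `repetition_penalty`; the function name `filter_probs` and the toy logits are assumptions for illustration only.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # Top-k: keep only the k most likely tokens, then renormalize.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in order)
    probs = {i: weights[i] / total for i in order}

    # Top-p: keep the smallest high-probability prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]
filtered = filter_probs(toy_logits)
print(filtered)
```

With these toy logits, the most likely token alone falls short of the 0.8 mass threshold, so the second token is kept as well and the remaining two are discarded.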

151388
merges.txt Normal file

File diff suppressed because it is too large

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fbb8d386714b1a34d2ca9cfbbd5f55ad56d34d7e924be4d6df07f2ccc05ade48
size 3087467144
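The pointer's `size` field gives a quick parameter-count estimate. The sketch below assumes every tensor is stored as bfloat16 (2 bytes, matching `torch_dtype` in `config.json`) and ignores the small safetensors header, so the result is approximate.

```python
# Size in bytes, copied from the Git LFS pointer for model.safetensors.
SAFETENSORS_BYTES = 3087467144
BYTES_PER_PARAMETER = 2  # assumption: all weights stored as bfloat16

estimated_parameters = SAFETENSORS_BYTES // BYTES_PER_PARAMETER
print(f"~{estimated_parameters / 1e9:.2f}B parameters")
print(f"~{SAFETENSORS_BYTES / 2**30:.2f} GiB on disk")
```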

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896

209
tokenizer_config.json Normal file

@@ -0,0 +1,209 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 16384,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
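For plain conversations without tools, the `chat_template` above reduces to a simple ChatML layout. The following sketch re-implements only that branch in plain Python to show the resulting string; the function name `render_chatml` is hypothetical, tool calls and tool responses are deliberately not handled, and in practice `tokenizer.apply_chat_template` should be used instead.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Render system/user/assistant messages the way the no-tools template branch does."""
    # A leading system message replaces the default system prompt.
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
    else:
        system_text = DEFAULT_SYSTEM
    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]

    for index, message in enumerate(messages):
        if index == 0 and message["role"] == "system":
            continue  # already emitted as the system block above
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")

    # The generation prompt opens the assistant turn for the model to complete.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chatml([{"role": "user", "content": "Hi"}])
print(rendered)
```

Note that when no system message is supplied, the template inserts the default Qwen system prompt ahead of the user turn.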

8
train_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 1.9777777777777779,
"total_flos": 15337468788736.0,
"train_loss": 0.5112001879939011,
"train_runtime": 1525.5061,
"train_samples_per_second": 1.18,
"train_steps_per_second": 0.073
}

12
trainer_log.jsonl Normal file

@@ -0,0 +1,12 @@
{"current_steps": 10, "total_steps": 112, "loss": 0.7523, "lr": 8.333333333333334e-06, "epoch": 0.17777777777777778, "percentage": 8.93, "elapsed_time": "0:03:02", "remaining_time": "0:31:06"}
{"current_steps": 20, "total_steps": 112, "loss": 0.7075, "lr": 9.842915805643156e-06, "epoch": 0.35555555555555557, "percentage": 17.86, "elapsed_time": "0:06:05", "remaining_time": "0:28:02"}
{"current_steps": 30, "total_steps": 112, "loss": 0.6473, "lr": 9.221639627510076e-06, "epoch": 0.5333333333333333, "percentage": 26.79, "elapsed_time": "0:08:04", "remaining_time": "0:22:03"}
{"current_steps": 40, "total_steps": 112, "loss": 0.6078, "lr": 8.18711994874345e-06, "epoch": 0.7111111111111111, "percentage": 35.71, "elapsed_time": "0:10:08", "remaining_time": "0:18:14"}
{"current_steps": 50, "total_steps": 112, "loss": 0.5699, "lr": 6.840622763423391e-06, "epoch": 0.8888888888888888, "percentage": 44.64, "elapsed_time": "0:12:14", "remaining_time": "0:15:10"}
{"current_steps": 60, "total_steps": 112, "loss": 0.5587, "lr": 5.3139525976465675e-06, "epoch": 1.0533333333333332, "percentage": 53.57, "elapsed_time": "0:14:13", "remaining_time": "0:12:19"}
{"current_steps": 70, "total_steps": 112, "loss": 0.382, "lr": 3.756550564175727e-06, "epoch": 1.231111111111111, "percentage": 62.5, "elapsed_time": "0:16:14", "remaining_time": "0:09:44"}
{"current_steps": 80, "total_steps": 112, "loss": 0.3424, "lr": 2.320866025105016e-06, "epoch": 1.4088888888888889, "percentage": 71.43, "elapsed_time": "0:18:20", "remaining_time": "0:07:20"}
{"current_steps": 90, "total_steps": 112, "loss": 0.3815, "lr": 1.1474337861210543e-06, "epoch": 1.5866666666666667, "percentage": 80.36, "elapsed_time": "0:20:31", "remaining_time": "0:05:01"}
{"current_steps": 100, "total_steps": 112, "loss": 0.3416, "lr": 3.511175705587433e-07, "epoch": 1.7644444444444445, "percentage": 89.29, "elapsed_time": "0:22:33", "remaining_time": "0:02:42"}
{"current_steps": 110, "total_steps": 112, "loss": 0.3558, "lr": 9.866357858642206e-09, "epoch": 1.942222222222222, "percentage": 98.21, "elapsed_time": "0:24:47", "remaining_time": "0:00:27"}
{"current_steps": 112, "total_steps": 112, "epoch": 1.9777777777777779, "percentage": 100.0, "elapsed_time": "0:25:24", "remaining_time": "0:00:00"}
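The logged `lr` values are consistent with a linear warmup followed by cosine decay. The sketch below reproduces them under inferred settings: a peak learning rate of 1e-5 and 12 warmup steps are assumptions fitted to the log, not values stated anywhere in this commit, and the function name `learning_rate` is hypothetical.

```python
import math

PEAK_LR = 1e-5      # assumption: inferred from the logged values
WARMUP_STEPS = 12   # assumption: inferred from the logged values
TOTAL_STEPS = 112   # "total_steps" in trainer_log.jsonl


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare the reconstruction with a few logged entries.
logged = {
    10: 8.333333333333334e-06,
    20: 9.842915805643156e-06,
    60: 5.3139525976465675e-06,
    110: 9.866357858642206e-09,
}
for step, expected in logged.items():
    print(step, learning_rate(step), expected)
```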

119
trainer_state.json Normal file

@@ -0,0 +1,119 @@
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 1.9777777777777779,
"eval_steps": 500,
"global_step": 112,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.17777777777777778,
"grad_norm": 1.1333891966940177,
"learning_rate": 8.333333333333334e-06,
"loss": 0.7523,
"step": 10
},
{
"epoch": 0.35555555555555557,
"grad_norm": 0.958010004815403,
"learning_rate": 9.842915805643156e-06,
"loss": 0.7075,
"step": 20
},
{
"epoch": 0.5333333333333333,
"grad_norm": 0.7853045645588168,
"learning_rate": 9.221639627510076e-06,
"loss": 0.6473,
"step": 30
},
{
"epoch": 0.7111111111111111,
"grad_norm": 0.8825698012004065,
"learning_rate": 8.18711994874345e-06,
"loss": 0.6078,
"step": 40
},
{
"epoch": 0.8888888888888888,
"grad_norm": 0.7151740400775302,
"learning_rate": 6.840622763423391e-06,
"loss": 0.5699,
"step": 50
},
{
"epoch": 1.0533333333333332,
"grad_norm": 0.7570773917821664,
"learning_rate": 5.3139525976465675e-06,
"loss": 0.5587,
"step": 60
},
{
"epoch": 1.231111111111111,
"grad_norm": 0.8230371816683825,
"learning_rate": 3.756550564175727e-06,
"loss": 0.382,
"step": 70
},
{
"epoch": 1.4088888888888889,
"grad_norm": 0.6621112845262395,
"learning_rate": 2.320866025105016e-06,
"loss": 0.3424,
"step": 80
},
{
"epoch": 1.5866666666666667,
"grad_norm": 0.8311851524519195,
"learning_rate": 1.1474337861210543e-06,
"loss": 0.3815,
"step": 90
},
{
"epoch": 1.7644444444444445,
"grad_norm": 0.7404603826818766,
"learning_rate": 3.511175705587433e-07,
"loss": 0.3416,
"step": 100
},
{
"epoch": 1.942222222222222,
"grad_norm": 0.7121753922595842,
"learning_rate": 9.866357858642206e-09,
"loss": 0.3558,
"step": 110
},
{
"epoch": 1.9777777777777779,
"step": 112,
"total_flos": 15337468788736.0,
"train_loss": 0.5112001879939011,
"train_runtime": 1525.5061,
"train_samples_per_second": 1.18,
"train_steps_per_second": 0.073
}
],
"logging_steps": 10,
"max_steps": 112,
"num_input_tokens_seen": 0,
"num_train_epochs": 2,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": true
},
"attributes": {}
}
},
"total_flos": 15337468788736.0,
"train_batch_size": 1,
"trial_name": null,
"trial_params": null
}
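The trainer state also allows a rough estimate of the effective batch size. The sketch below assumes that `train_samples_per_second` counts dataset samples across all epochs divided by the runtime, which is how the Hugging Face `Trainer` usually reports it; since `train_batch_size` is 1, the result would correspond to the number of devices times the gradient accumulation steps. The exact split between the two cannot be read from this file.

```python
# Values copied from trainer_state.json.
first_log_step = 10
first_log_epoch = 0.17777777777777778
num_train_epochs = 2
train_runtime = 1525.5061
train_samples_per_second = 1.18  # rounded in the file, so results are approximate

# Optimizer steps needed to cover one pass over the data.
steps_per_epoch = first_log_step / first_log_epoch

# Assumption: samples/second is (dataset size * epochs) / runtime.
samples_per_epoch = train_samples_per_second * train_runtime / num_train_epochs

effective_batch_size = samples_per_epoch / steps_per_epoch

print(f"steps per epoch ~ {steps_per_epoch:.2f}")
print(f"samples per epoch ~ {samples_per_epoch:.0f}")
print(f"effective batch size ~ {effective_batch_size:.1f}")
```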

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19ef446bce6ab5e2b1eac411b9c2c41cbf4ec56efe4dca405b4004a412b7ebfd
size 7416

BIN
training_loss.png Normal file

Binary file not shown (34 KiB).

1
vocab.json Normal file

File diff suppressed because one or more lines are too long