Initialize project; model provided by the ModelHub XC community
Model: iko-01/CosmoGPT2-Mini Source: Original Platform
.gitattributes (vendored, new file, 35 lines)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (new file, 70 lines)
@@ -0,0 +1,70 @@
---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---
# CosmoGPT2-Mini 🚀

## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model.
## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8, for an effective batch size of 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
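For reference, the list above can be sketched as a `transformers.TrainingArguments` configuration. This is a hypothetical reconstruction: the keyword names match the real `TrainingArguments` API, the values come from the list, but `output_dir` is an assumption of mine, not stated in the card.

```python
# Hypothetical reconstruction of the run configuration implied by the
# "Training Hyperparameters" list above. Keyword names match
# transformers.TrainingArguments; output_dir is an assumption.
training_kwargs = dict(
    output_dir="cosmogpt2-mini",
    num_train_epochs=1,
    max_steps=1000,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    optim="adamw_torch_fused",
    bf16=True,
)

# Effective batch size seen by each optimizer step:
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
print(effective_batch)  # 16
```

With transformers installed, `TrainingArguments(**training_kwargs)` would build the actual arguments object.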
## How to use
You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)

print(result[0]['generated_text'])
```
## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Because this is the small GPT-2 variant trained on a limited subset (30k samples), it may still generate hallucinations or repetitive text. It is not intended for production-level academic advice.
## Training Results
The model was trained on a T4 GPU (or equivalent) using optimized settings.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130

---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.
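The reported numbers are internally consistent with the hyperparameters, under one assumption of mine: the eval runtime and throughput recorded in trainer_state.json imply roughly 1,500 held-out samples, leaving about 28,500 for training. With that assumption, the final epoch fraction in trainer_state.json falls out exactly:

```python
# Cross-check of the final epoch fraction in trainer_state.json.
# Assumption (mine, not stated in the card): ~1,500 of the 30,000
# samples were held out for evaluation, leaving 28,500 for training.
steps = 1000
effective_batch = 2 * 8              # batch size 2, grad accumulation 8
train_samples = 30_000 - 1_500

samples_seen = steps * effective_batch   # 16,000 samples consumed
epoch = samples_seen / train_samples
print(epoch)  # ≈ 0.5614, the final "epoch" value in trainer_state.json
```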
config.json (new file, 41 lines)
@@ -0,0 +1,41 @@
{
  "activation_function": "gelu_new",
  "add_cross_attention": false,
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "dtype": "bfloat16",
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "pad_token_id": null,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "tie_word_embeddings": true,
  "transformers_version": "5.0.0",
  "use_cache": false,
  "vocab_size": 50257
}
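The config describes the standard GPT-2 small geometry (12 layers, 768-dim embeddings, 50,257-token vocabulary). As a sanity check (my arithmetic, not from the repo), the parameter count it implies matches the ~249 MB model.safetensors file at two bytes per bfloat16 parameter:

```python
# Parameter count implied by the GPT-2 small config above, to
# sanity-check the 248,894,656-byte model.safetensors at bfloat16.
n_embd, n_layer, n_ctx, vocab = 768, 12, 1024, 50257

embeddings = vocab * n_embd + n_ctx * n_embd          # wte + wpe
per_layer = (
    n_embd * 3 * n_embd + 3 * n_embd                  # attn c_attn (QKV)
    + n_embd * n_embd + n_embd                        # attn c_proj
    + n_embd * 4 * n_embd + 4 * n_embd                # mlp c_fc
    + 4 * n_embd * n_embd + n_embd                    # mlp c_proj
    + 4 * n_embd                                      # ln_1 + ln_2 (w, b)
)
total = embeddings + n_layer * per_layer + 2 * n_embd  # + final ln_f
print(total)      # 124,439,808 parameters (embeddings tied, so counted once)
print(total * 2)  # 248,879,616 bytes at 2 bytes/param; the small gap to
                  # the file size is the safetensors header
```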
generation_config.json (new file, 6 lines)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "5.0.0"
}
model.safetensors (LFS, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c493c9afdf24d504ae73a01e1f66a6c70fad231a2ccc3a23f00448f69c66d7d4
size 248894656
optimizer.pt (LFS, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:498858c6dc3fd7035f6eeb004f86cec42f09bdb3c1d85ae43e4bcfebf76deb58
size 497885643
rng_state.pth (LFS, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54d9a3585be330baac0fa76dc042507447958421f2bd245ba0cc3f3d33f6d724
size 14645
scheduler.pt (LFS, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8083cb28f9178670969e8954135f0609f0eac2102b31887abccd667da0fb9cf
size 1465
tokenizer.json (new file, 250320 lines)
File diff suppressed because it is too large.
tokenizer_config.json (new file, 12 lines)
@@ -0,0 +1,12 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
trainer_state.json (new file, 120 lines)
@@ -0,0 +1,120 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.5614035087719298,
  "eval_steps": 500,
  "global_step": 1000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.056140350877192984,
      "grad_norm": 3.546875,
      "learning_rate": 2.4750000000000002e-05,
      "loss": 3.1684613037109375,
      "step": 100
    },
    {
      "epoch": 0.11228070175438597,
      "grad_norm": 2.625,
      "learning_rate": 4.975e-05,
      "loss": 2.9828875732421873,
      "step": 200
    },
    {
      "epoch": 0.16842105263157894,
      "grad_norm": 2.75,
      "learning_rate": 4.38125e-05,
      "loss": 2.9039926147460937,
      "step": 300
    },
    {
      "epoch": 0.22456140350877193,
      "grad_norm": 2.703125,
      "learning_rate": 3.75625e-05,
      "loss": 2.862659912109375,
      "step": 400
    },
    {
      "epoch": 0.2807017543859649,
      "grad_norm": 2.46875,
      "learning_rate": 3.13125e-05,
      "loss": 2.8550177001953125,
      "step": 500
    },
    {
      "epoch": 0.2807017543859649,
      "eval_loss": 2.695997476577759,
      "eval_runtime": 112.3251,
      "eval_samples_per_second": 13.354,
      "eval_steps_per_second": 1.674,
      "step": 500
    },
    {
      "epoch": 0.3368421052631579,
      "grad_norm": 2.5625,
      "learning_rate": 2.50625e-05,
      "loss": 2.8371957397460936,
      "step": 600
    },
    {
      "epoch": 0.3929824561403509,
      "grad_norm": 2.453125,
      "learning_rate": 1.88125e-05,
      "loss": 2.83647705078125,
      "step": 700
    },
    {
      "epoch": 0.44912280701754387,
      "grad_norm": 2.546875,
      "learning_rate": 1.2562499999999999e-05,
      "loss": 2.8517510986328123,
      "step": 800
    },
    {
      "epoch": 0.5052631578947369,
      "grad_norm": 2.28125,
      "learning_rate": 6.3125e-06,
      "loss": 2.8428424072265623,
      "step": 900
    },
    {
      "epoch": 0.5614035087719298,
      "grad_norm": 2.421875,
      "learning_rate": 6.250000000000001e-08,
      "loss": 2.8378897094726563,
      "step": 1000
    },
    {
      "epoch": 0.5614035087719298,
      "eval_loss": 2.6861302852630615,
      "eval_runtime": 111.8875,
      "eval_samples_per_second": 13.406,
      "eval_steps_per_second": 1.68,
      "step": 1000
    }
  ],
  "logging_steps": 100,
  "max_steps": 1000,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 4180672512000000.0,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
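The learning-rate trace in this log is consistent with linear warmup to the 5e-5 peak over the first 200 steps, then linear decay to zero at step 1000. The warmup length of 200 is my inference from the logged values, not stated anywhere in the repo; the logged rate at global step s corresponds to the schedule at s - 1:

```python
# Sketch of the LR schedule inferred from the log entries above.
# Assumption: warmup_steps=200 (inferred, not stated in the repo).
peak, warmup, total = 5e-5, 200, 1000

def lr_at(step: int) -> float:
    s = step - 1  # Trainer logs the rate already consumed by the step
    if s < warmup:
        return peak * s / warmup                    # linear warmup
    return peak * (total - s) / (total - warmup)    # linear decay

print(lr_at(100))   # ≈ 2.475e-05 (step-100 log: 2.4750000000000002e-05)
print(lr_at(1000))  # ≈ 6.25e-08 (step-1000 log: 6.250000000000001e-08)
```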
training_args.bin (LFS, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:730d0150783c2222a6815ae0702bb5fa06d9cdde68d131f8f95848bb25435f9b
size 5201