Initialize project; model provided by the ModelHub XC community

Model: iko-01/CosmoGPT2-Mini
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-01 15:12:29 +08:00
commit c5cad29193
12 changed files with 250619 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

70
README.md Normal file

@@ -0,0 +1,70 @@
---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---
# CosmoGPT2-Mini 🚀
## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.
The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model.
## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
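As a quick sanity check on the numbers above (a sketch; `trainer_state.json` in this commit logs epoch ≈ 0.561 at step 1000, slightly above this estimate, which suggests part of the 30k samples was held out for evaluation):

```python
# Effective batch size and approximate data coverage for the run above.
per_device_batch = 2
grad_accum_steps = 8
effective_batch = per_device_batch * grad_accum_steps  # sequences per optimizer step

max_steps = 1000
sequences_seen = max_steps * effective_batch

train_samples = 30_000  # samples drawn from cosmopedia-v2
print(effective_batch)                            # 16
print(sequences_seen)                             # 16000
print(round(sequences_seen / train_samples, 3))   # 0.533 of one epoch
```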
## How to use
You can use this model directly with a pipeline for text generation:
```python
from transformers import pipeline
generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")
prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)
print(result[0]['generated_text'])
```
## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Because this is the small GPT-2 variant fine-tuned on a limited subset (30k samples), it may still hallucinate facts or produce repetitive text. It is not intended for production use or as a source of academic advice.
## Training Results
The model was trained on a T4 GPU (or equivalent) with the bf16 and fused-AdamW settings listed above.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.

41
config.json Normal file

@@ -0,0 +1,41 @@
{
"activation_function": "gelu_new",
"add_cross_attention": false,
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"dtype": "bfloat16",
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_inner": null,
"n_layer": 12,
"n_positions": 1024,
"pad_token_id": null,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"tie_word_embeddings": true,
"transformers_version": "5.0.0",
"use_cache": false,
"vocab_size": 50257
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "5.0.0"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c493c9afdf24d504ae73a01e1f66a6c70fad231a2ccc3a23f00448f69c66d7d4
size 248894656
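The checkpoint size is consistent with GPT-2 Small stored in bfloat16 (2 bytes per parameter, with `tie_word_embeddings` sharing the LM head with the token embeddings). A rough parameter count from the `config.json` values above (n_layer=12, n_embd=768, n_positions=1024, vocab_size=50257); the small remainder is the safetensors header:

```python
# GPT-2 Small parameter count from the config values in this repo.
vocab, n_pos, d, n_layer = 50257, 1024, 768, 12

embeddings = vocab * d + n_pos * d       # token + position embeddings
per_block = (
    2 * 2 * d                            # two LayerNorms (weight + bias)
    + (d * 3 * d + 3 * d)                # attention c_attn (fused QKV projection)
    + (d * d + d)                        # attention c_proj
    + (d * 4 * d + 4 * d)                # MLP c_fc
    + (4 * d * d + d)                    # MLP c_proj
)
final_ln = 2 * d
total = embeddings + n_layer * per_block + final_ln
print(total)        # 124439808
print(total * 2)    # 248879616 bytes in bf16, vs. 248894656 on disk
```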

3
optimizer.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:498858c6dc3fd7035f6eeb004f86cec42f09bdb3c1d85ae43e4bcfebf76deb58
size 497885643

3
rng_state.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54d9a3585be330baac0fa76dc042507447958421f2bd245ba0cc3f3d33f6d724
size 14645

3
scheduler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8083cb28f9178670969e8954135f0609f0eac2102b31887abccd667da0fb9cf
size 1465

250320
tokenizer.json Normal file

File diff suppressed because it is too large

12
tokenizer_config.json Normal file

@@ -0,0 +1,12 @@
{
"add_prefix_space": false,
"backend": "tokenizers",
"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"errors": "replace",
"is_local": false,
"model_max_length": 1024,
"pad_token": "<|endoftext|>",
"tokenizer_class": "GPT2Tokenizer",
"unk_token": "<|endoftext|>"
}

120
trainer_state.json Normal file

@@ -0,0 +1,120 @@
{
"best_global_step": null,
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.5614035087719298,
"eval_steps": 500,
"global_step": 1000,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.056140350877192984,
"grad_norm": 3.546875,
"learning_rate": 2.4750000000000002e-05,
"loss": 3.1684613037109375,
"step": 100
},
{
"epoch": 0.11228070175438597,
"grad_norm": 2.625,
"learning_rate": 4.975e-05,
"loss": 2.9828875732421873,
"step": 200
},
{
"epoch": 0.16842105263157894,
"grad_norm": 2.75,
"learning_rate": 4.38125e-05,
"loss": 2.9039926147460937,
"step": 300
},
{
"epoch": 0.22456140350877193,
"grad_norm": 2.703125,
"learning_rate": 3.75625e-05,
"loss": 2.862659912109375,
"step": 400
},
{
"epoch": 0.2807017543859649,
"grad_norm": 2.46875,
"learning_rate": 3.13125e-05,
"loss": 2.8550177001953125,
"step": 500
},
{
"epoch": 0.2807017543859649,
"eval_loss": 2.695997476577759,
"eval_runtime": 112.3251,
"eval_samples_per_second": 13.354,
"eval_steps_per_second": 1.674,
"step": 500
},
{
"epoch": 0.3368421052631579,
"grad_norm": 2.5625,
"learning_rate": 2.50625e-05,
"loss": 2.8371957397460936,
"step": 600
},
{
"epoch": 0.3929824561403509,
"grad_norm": 2.453125,
"learning_rate": 1.88125e-05,
"loss": 2.83647705078125,
"step": 700
},
{
"epoch": 0.44912280701754387,
"grad_norm": 2.546875,
"learning_rate": 1.2562499999999999e-05,
"loss": 2.8517510986328123,
"step": 800
},
{
"epoch": 0.5052631578947369,
"grad_norm": 2.28125,
"learning_rate": 6.3125e-06,
"loss": 2.8428424072265623,
"step": 900
},
{
"epoch": 0.5614035087719298,
"grad_norm": 2.421875,
"learning_rate": 6.250000000000001e-08,
"loss": 2.8378897094726563,
"step": 1000
},
{
"epoch": 0.5614035087719298,
"eval_loss": 2.6861302852630615,
"eval_runtime": 111.8875,
"eval_samples_per_second": 13.406,
"eval_steps_per_second": 1.68,
"step": 1000
}
],
"logging_steps": 100,
"max_steps": 1000,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 500,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": true
},
"attributes": {}
}
},
"total_flos": 4180672512000000.0,
"train_batch_size": 2,
"trial_name": null,
"trial_params": null
}
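The learning rates logged above match a linear schedule with roughly 200 warmup steps (the warmup setting is not stored in these files, so it is inferred from the log; the logged value at step N corresponds to the scheduler state after N−1 updates):

```python
def lr_at(step, base_lr=5e-5, warmup=200, total=1000):
    # Learning rate applied at optimizer step `step`, using the scheduler
    # state after step-1 updates (the convention the log above follows).
    s = step - 1
    if s < warmup:
        return base_lr * s / warmup                       # linear warmup
    return base_lr * (total - s) / (total - warmup)       # linear decay to 0

print(lr_at(100))   # ~2.475e-05  (logged at step 100)
print(lr_at(300))   # ~4.38125e-05 (logged at step 300)
print(lr_at(1000))  # ~6.25e-08   (logged at step 1000)
```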

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:730d0150783c2222a6815ae0702bb5fa06d9cdde68d131f8f95848bb25435f9b
size 5201