---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description

**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model does.

## Model Details

- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)

## Training Data

The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset, a collection of high-quality synthetic data covering a range of academic and general-knowledge topics.
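
The card does not say how the subset was drawn. Assuming it is simply the first 30,000 rows of the train split, it could be loaded with the `datasets` library like this:

```python
from datasets import load_dataset

# Assumption: the 30k-sample subset is the first 30,000 rows of the train split.
subset = load_dataset("Dhiraj45/cosmopedia-v2", split="train[:30000]")
print(subset)  # shows the columns and row count
```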

## Training Hyperparameters

- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8, i.e. an effective batch size of 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
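
These settings map directly onto `transformers.TrainingArguments`. The training script itself is not published, so this is only a sketch of what the configuration may have looked like (`output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="cosmogpt2-mini",      # hypothetical output path
    num_train_epochs=1,
    max_steps=1000,                   # overrides epochs when both are set
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # effective batch size: 2 x 8 = 16
    learning_rate=5e-5,
    optim="adamw_torch_fused",        # fused AdamW
    bf16=True,                        # bfloat16 mixed precision
)
# The 512-token max sequence length is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=512).
```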

## How to use

You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)

print(result[0]['generated_text'])
```
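
Small GPT-2 checkpoints are prone to repetitive output (see the limitations below), so it often helps to pass sampling options through the pipeline. The values here are illustrative, not tuned:

```python
result = generator(
    prompt,
    max_new_tokens=80,        # prefer max_new_tokens over max_length
    do_sample=True,
    temperature=0.8,          # illustrative, untuned values
    top_p=0.95,
    no_repeat_ngram_size=3,   # suppresses verbatim n-gram repetition
)
print(result[0]['generated_text'])
```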

## Intended Use & Limitations

- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** As a small model (GPT-2) trained on a limited subset (30k samples), it may still hallucinate or produce repetitive text. It is not intended for production use or as a source of reliable academic advice.

## Training Results

The model was trained on a T4 GPU (or equivalent) using the settings above.

- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
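
Assuming these are mean per-token cross-entropy losses (the usual causal-LM objective), they can be read as perplexities:

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming per-token loss in nats.
print(math.exp(2.837890))  # ≈ 17.1 (train)
print(math.exp(2.686130))  # ≈ 14.7 (eval)
```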

---

**Note:** This model is part of a training experiment using the Cosmopedia dataset.