---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description

**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
## Model Details

- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
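
The bfloat16 precision listed above can be matched at load time via the standard `transformers` `torch_dtype` argument. This is a sketch, not the author's script, and it assumes the repo id from this card's metadata is reachable via `from_pretrained`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint in bfloat16 to match the training precision noted above.
# The repo id is an assumption based on this card's metadata.
model = AutoModelForCausalLM.from_pretrained(
    "iko-01/CosmoGPT2-Mini", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("iko-01/CosmoGPT2-Mini")
```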

## Training Data

The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general-knowledge topics.

## Training Hyperparameters

- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with gradient accumulation steps: 8, for an effective batch size of 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
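
As a sketch (not the author's actual training script), these settings map onto Hugging Face `TrainingArguments` field names roughly as follows; the 512-token max sequence length is applied at tokenization time rather than here:

```python
# Hypothetical Trainer-style configuration mirroring the list above.
training_config = dict(
    num_train_epochs=1,
    max_steps=1000,                 # hard stop after 1000 optimizer steps
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    bf16=True,                      # bfloat16 mixed precision
    optim="adamw_torch_fused",      # fused AdamW
)

# Gradients from 8 micro-batches of 2 are accumulated before each update.
effective_batch = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```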

## How to use

You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)

print(result[0]["generated_text"])
```
## Intended Use & Limitations

- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Since this is a small model (GPT-2 Small) trained on a limited subset (30k samples), it may still generate hallucinations or repetitive text. It is not intended for production-level academic advice.
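
One common mitigation for the repetitive-text tendency noted above is to pass decoding penalties to `generate`. These are standard `transformers` generation parameters; the specific values here are illustrative, not tuned:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "iko-01/CosmoGPT2-Mini"  # repo id assumed from this card's metadata
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The concept of gravity can be explained as", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    top_p=0.9,
    no_repeat_ngram_size=3,               # block verbatim 3-gram repetition
    repetition_penalty=1.2,               # discourage reusing recent tokens
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
```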

## Training Results

The model was trained on a T4 GPU (or equivalent) using optimized settings.

- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
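
Since loss is the only reported metric, a quick derived number: for a causal language model, perplexity is simply the exponential of the cross-entropy loss:

```python
import math

# Perplexity of a causal LM is exp(cross-entropy loss).
eval_loss = 2.686130
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ≈ 14.67
```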

---

**Note:** This model is part of a training experiment using the Cosmopedia dataset.