---
license: mit
base_model: distilbert/distilgpt2
tags:
- fine-tuned
- eli5
- explain-like-im-5
- education
- simple-explanations
- distilgpt2
language:
- en
pipeline_tag: text-generation
---

# DistilGPT2-MyBrainHurts (Full Fine-tune)

## Overview

A **fully fine-tuned** version of DistilGPT2 (82M parameters) specialized in explaining complex topics in simple, child-friendly language ("Explain Like I'm 5" style). Unlike a LoRA adapter, which trains small low-rank matrices on top of frozen base weights, full fine-tuning updated ALL of the model's weights. The result is a standalone, fully specialized model that loads without a separate base checkpoint.

## Key Features

- **Ultra-small**: Only ~312 MB total
- **Specialized**: All 82M parameters tuned for simple explanations
- **25 topics**: Trained on science, nature, technology, and everyday phenomena
- **Child-friendly**: Uses analogies and simple vocabulary
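
The 82M and ~312 MB figures are consistent with each other if the weights are stored in float32. As a sanity check, the sketch below recomputes both from DistilGPT2's standard GPT-2 layout (6 blocks, hidden size 768, 50,257-token vocabulary, 1,024 positions). Float32 storage and an LM head tied to the token embedding are assumptions of this sketch, not details stated on this card.

```python
# Recompute DistilGPT2's parameter count from its architecture.
# Assumptions: GPT-2 layout with a tied LM head (no separate output matrix)
# and float32 storage (4 bytes per parameter).

VOCAB = 50257      # GPT-2 BPE vocabulary size
POSITIONS = 1024   # maximum sequence length
D = 768            # hidden size
LAYERS = 6         # DistilGPT2 keeps 6 of GPT-2 small's 12 blocks


def linear(d_in: int, d_out: int) -> int:
    """Weight matrix plus bias of a dense layer."""
    return d_in * d_out + d_out


def layer_norm(d: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d


def block_params(d: int) -> int:
    attn = linear(d, 3 * d) + linear(d, d)      # fused QKV + attention output projection
    mlp = linear(d, 4 * d) + linear(4 * d, d)   # feed-forward expansion + projection
    return layer_norm(d) + attn + layer_norm(d) + mlp


def total_params() -> int:
    embeddings = VOCAB * D + POSITIONS * D      # token + position embeddings
    return embeddings + LAYERS * block_params(D) + layer_norm(D)  # + final LayerNorm


n = total_params()
size_mib = n * 4 / 2**20  # float32 bytes -> MiB
print(f"{n:,} parameters, {size_mib:.1f} MiB in float32")
```

Under these assumptions the script reports 81,912,576 parameters and about 312.5 MiB (roughly 328 MB in decimal units), so the "~312 MB" figure reads as a binary-megabyte size of float32 weights.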

## Topics Covered

Gravity, Internet, Sky color, Photosynthesis, Electricity, Dinosaurs, Moon, Rain, Sleep, Magnets, Clouds, Leaf colors, Volcanoes, Oceans, Airplanes, Robots, Seasons, Sound, Stars, Computers, DNA, Bacteria, Rainbows, Ice cream melting, Thunder & Lightning

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ringkvist/DistilGPT2-MyBrainHurts"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain black holes like I'm 5:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,  # sampling must be on for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# The decoded text contains the prompt followed by the generated explanation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

- **Method**: Full fine-tuning (all parameters)
- **Base model**: [distilbert/distilgpt2](https://huggingface.co/distilbert/distilgpt2) (82M params)
- **Dataset**: 25 hand-crafted ELI5 explanations
- **Epochs**: 20
- **Learning rate**: 5e-5 with a cosine decay schedule
- **Batch size**: 2 per step × 4 gradient-accumulation steps = effective batch size of 8
- **Hardware**: Apple Silicon Mac (CPU/MPS)
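
The effective batch size and the cosine learning-rate decay can be checked with a few lines of arithmetic. This is a sketch, not the training script used for this model: it assumes a single device, no warmup, and that each epoch ends with an optimizer step even when the last accumulation window is only partly filled (so 25 examples give ceil(25 / 8) = 4 updates per epoch). The half-cosine formula is the same shape that `get_cosine_schedule_with_warmup` in Hugging Face Transformers produces with zero warmup steps and its default half cycle.

```python
import math

# Hyperparameters listed on this card
NUM_EXAMPLES = 25
EPOCHS = 20
BASE_LR = 5e-5
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4

# Assumption: single device, so effective batch = per-device batch x accumulation steps
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Assumption: a partly filled final accumulation window still triggers an update
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int, base_lr: float) -> float:
    """Half-cosine decay from base_lr down to 0, with no warmup."""
    progress = step / total
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {cosine_lr(step, total_steps, BASE_LR):.2e}")
```

Under these assumptions there are 80 optimizer steps in total, and the learning rate starts at 5e-5, falls to 2.5e-5 at step 40, and reaches 0 at step 80.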

## Full Fine-tune vs LoRA

| Aspect | Full Fine-tune | LoRA |
|--------|----------------|------|
| Trained params | ALL (82M) | ~0.5% |
| Upload size | Full model (~312 MB) | Small adapter (~1-2 MB) |
| Base model needed | No | Yes |
| Specialization | Deeper (every weight can change) | Shallower (low-rank updates only) |
| Training time | Longer | Shorter |
| Risk of forgetting | Higher | Lower |
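
The LoRA share of trainable parameters depends on the adapter's rank `r` and on which weight matrices it targets: each adapted `d_in × d_out` matrix gains `r × (d_in + d_out)` trainable parameters. The sketch below applies that formula to DistilGPT2's six blocks for two hypothetical configurations. Neither rank nor target set is stated on this card, and the module labels in `SHAPES` are descriptive names chosen for the sketch.

```python
# Trainable-parameter share of a LoRA adapter on DistilGPT2.
# Hypothetical configurations: ranks and target modules are illustrative only.

TOTAL_PARAMS = 81_912_576  # DistilGPT2 parameter count (tied LM head)
LAYERS = 6
D = 768

# (d_in, d_out) of the dense weight matrices in one GPT-2 block (labels are descriptive)
SHAPES = {
    "c_attn": (D, 3 * D),       # fused query/key/value projection
    "attn.c_proj": (D, D),      # attention output projection
    "c_fc": (D, 4 * D),         # feed-forward expansion
    "mlp.c_proj": (4 * D, D),   # feed-forward projection
}


def lora_params(rank: int, targets: list[str]) -> int:
    """A (d_in x r) plus B (r x d_out) per targeted matrix, summed over all blocks."""
    per_block = sum(rank * (SHAPES[t][0] + SHAPES[t][1]) for t in targets)
    return LAYERS * per_block


for rank, targets in [(8, ["c_attn"]), (8, list(SHAPES))]:
    n = lora_params(rank, targets)
    print(f"r={rank}, {len(targets)} module(s): {n:,} params "
          f"({100 * n / TOTAL_PARAMS:.2f}% of the model)")
```

With these illustrative settings the share is about 0.18% (query/key/value projection only) and about 0.72% (all four dense layers), which brackets the ~0.5% in the table.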

## Limitations

- Small model (82M params) limits output quality
- Trained on only 25 examples, so it may not generalize to topics outside the list above
- Full fine-tuning can reduce some base-model capabilities (catastrophic forgetting)
- Best used as a demonstration or educational project

## Base Model

- [distilbert/distilgpt2](https://huggingface.co/distilbert/distilgpt2): 82M-parameter distilled version of GPT-2