---
library_name: transformers
base_model:
- openai-community/gpt2
---

# Growing LLM Model Card

## Model Description

The **Growing LLM** is a GPT-2-based language model that implements neural plasticity-inspired dynamic growth during training. It starts from a pre-trained GPT-2 (124M parameters) and dynamically adds new transformer blocks while freezing the original parameters, allowing the model to acquire new knowledge without catastrophic forgetting.

### Key Features

- **Dynamic Growth**: Adds new transformer blocks during training
- **Knowledge Preservation**: Freezes original parameters to retain pre-trained knowledge
- **Flexible Triggers**: Supports fixed-schedule and plateau-detection growth triggers
- **Regularization Options**: Supports Knowledge Distillation and Elastic Weight Consolidation (EWC)
- **Comprehensive Metrics**: Tracks training, validation, growth events, and scaling analysis

## Training Details

### Training Data

- Dataset: WikiText-2-raw-v1
- Max sequence length: 128 tokens

### Training Configuration

- Base model: GPT-2 (124M parameters)
- Learning rate: 5e-5
- Batch size: 8
- Optimizer: AdamW with weight decay 0.01
- Max steps: 2000
- Growth frequency: Every 500 steps
- Maximum growth events: 3

### Growth Mechanism

1. **Fixed Schedule**: Grow every N training steps
2. **Plateau Detection**: Grow when validation loss shows no improvement for Y steps

### Regularization (Optional)

- **Knowledge Distillation**: Uses a teacher-student architecture with temperature scaling
- **Elastic Weight Consolidation (EWC)**: Penalizes changes to important parameters

## Model Architecture

- Base: GPT-2 (12 layers, 12 heads, 768 hidden dim)
- Growth: Added 3 new transformer blocks (one per growth event)
- Final: 15 layers, 145.7M total parameters

## Training Results

### Summary Metrics

| Metric | Initial | Final |
|--------|---------|-------|
| Training Loss | 7.16 | 1.95 |
| Validation Loss | 6.99 | 2.03 |
| Validation Perplexity | ~1000 | 7.58 |
| Total Parameters | 124.4M | 145.7M |

### Training Time

- Total time: ~60 minutes (3596 seconds)
- Best validation loss: 2.00
- Best validation perplexity: 7.42

### Growth Events

| Growth # | Step | Layers | Parameters Added | Val Loss Delta |
|----------|------|--------|------------------|----------------|
| 1 | 500 | 12 → 13 | +7.1M | +0.00003 |
| 2 | 1000 | 13 → 14 | +7.1M | +0.00002 |
| 3 | 1500 | 14 → 15 | +7.1M | +0.000001 |

### Results Summary

| Model | Perplexity | Loss |
|-------|------------|------|
| Base GPT-2 | 56.39 | 4.0323 |
| Growing LLM | 33.39 | 3.5082 |

Perplexity improvement: 40.8%

**Key Observation**: The validation loss delta at each growth event is minimal (at most ~0.00003), indicating that previously learned knowledge is retained when a block is added. The model continues to learn new capabilities without catastrophic forgetting.

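
The figures reported above are linked by simple arithmetic, and the short sketch below recomputes them as a sanity check. It assumes the reported losses are mean cross-entropy values in nats (so perplexity is `exp(loss)`) and that each added block has the standard GPT-2 layout at hidden size 768. The helper names are illustrative only and are not part of the training code.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(loss)


def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


def gpt2_block_params(hidden: int = 768) -> int:
    """Parameter count of one transformer block, assuming the standard GPT-2 layout:
    two LayerNorms, a fused QKV projection, an attention output projection,
    and a 4x-wide MLP, all with biases."""
    layer_norms = 2 * (2 * hidden)                 # weight + bias, two norms
    attention = hidden * 3 * hidden + 3 * hidden   # fused QKV projection
    attention += hidden * hidden + hidden          # attention output projection
    mlp = hidden * 4 * hidden + 4 * hidden         # MLP up projection
    mlp += 4 * hidden * hidden + hidden            # MLP down projection
    return layer_norms + attention + mlp


if __name__ == "__main__":
    # Losses taken from the Results Summary table.
    print(f"Base GPT-2 perplexity:  {perplexity(4.0323):.2f}")
    print(f"Growing LLM perplexity: {perplexity(3.5082):.2f}")
    print(f"Improvement: {100 * relative_improvement(56.39, 33.39):.1f}%")

    # Parameter growth: the reported 124.4M base plus three added blocks.
    per_block = gpt2_block_params()
    total = 124.4e6 + 3 * per_block
    print(f"Parameters per block: {per_block / 1e6:.1f}M")
    print(f"Total after growth:   {total / 1e6:.1f}M")
```

Under these assumptions the recomputed values agree with the tables: roughly 7.1M parameters per added block, 145.7M parameters after three growth events, and a 40.8% perplexity reduction.
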
## Usage

```python
from transformers import GPT2LMHeadModel, AutoTokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("aicinema69/gpt2-growing")
tokenizer = AutoTokenizer.from_pretrained("aicinema69/gpt2-growing")

# Generate text
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode the generated token IDs back to text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- Growth events may cause temporary performance dips that recover with continued training
- Requires sufficient training data to benefit from the additional parameters
- The added parameters increase memory and compute requirements

## License

This model is based on GPT-2, which is released under the [OpenAI GPT-2 License](https://github.com/openai/gpt-2/blob/master/LICENSE).

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{growing_llm,
  author = {Satyam Singh},
  title = {Growing LLM: Dynamic Model Growth for Continual Learning},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aicinema69/gpt2-growing}}
}
```

## Contact

For questions or issues, please open a GitHub issue or contact the model author.