Initialize project; model provided by the ModelHub XC community

Model: aicinema69/gpt2-growing Source: Original Platform
2026-06-05 14:29:26 +08:00
commit 09d516ad27
7 changed files with 250528 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,125 @@
+---
+library_name: transformers
+base_model:
+- openai-community/gpt2
+---
+# Growing LLM Model Card
+
+## Model Description
+
+The **Growing LLM** is a GPT-2-based language model that grows dynamically during training, an approach inspired by neural plasticity. It starts from pre-trained GPT-2 (124M parameters) and adds new transformer blocks while the original parameters stay frozen, so the model can acquire new knowledge without catastrophic forgetting.
+
+### Key Features
+
+- **Dynamic Growth**: Adds new transformer blocks during training
+- **Knowledge Preservation**: Freezes original parameters to retain pre-trained knowledge
+- **Flexible Triggers**: Supports fixed-schedule and plateau-detection growth triggers
+- **Regularization Options**: Supports Knowledge Distillation and Elastic Weight Consolidation (EWC)
+- **Comprehensive Metrics**: Tracks training, validation, growth events, and scaling analysis
+
+## Training Details
+
+### Training Data
+- Dataset: WikiText-2-raw-v1
+- Max sequence length: 128 tokens
+
+### Training Configuration
+- Base model: GPT-2 (124M parameters)
+- Learning rate: 5e-5
+- Batch size: 8
+- Optimizer: AdamW with weight decay 0.01
+- Max steps: 2000
+- Growth frequency: Every 500 steps
+- Maximum growth events: 3
+
+### Growth Mechanism
+1. **Fixed Schedule**: Grow every N training steps
+2. **Plateau Detection**: Grow when validation loss shows no improvement for Y steps
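+
+As a rough formalization of the two triggers, let $t$ be the current step, $g_t$ the number of growth events so far, $G_{\max}$ the maximum number of growth events, and $\mathcal{L}_{\text{val}}(s)$ the validation loss at step $s$. These symbols are introduced here for illustration and do not come from the training code, and applying the cap to both triggers is an assumption:
+
+$$
+\text{fixed schedule:}\quad t \bmod N = 0 \;\text{ and }\; g_t < G_{\max}
+$$
+
+$$
+\text{plateau:}\quad \min_{t-Y < s \le t} \mathcal{L}_{\text{val}}(s) \;\ge\; \min_{s \le t-Y} \mathcal{L}_{\text{val}}(s) \;\text{ and }\; g_t < G_{\max}
+$$
+
+Under this reading, the configuration above ($N = 500$, $G_{\max} = 3$, 2000 steps) fires the fixed schedule at steps 500, 1000, and 1500, and no further growth happens at step 2000 because the cap of three events has been reached.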
+
+### Regularization (Optional)
+- **Knowledge Distillation**: Uses teacher-student architecture with temperature scaling
+- **Elastic Weight Consolidation (EWC)**: Penalizes changes to important parameters
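+
+The model card does not state the exact loss terms, so the following are the standard textbook formulations, given only as a reference point. The weights $\alpha$ and $\lambda$, the temperature $T$, the logits $z_s$ and $z_t$, and the Fisher estimates $F_i$ are notation assumed here:
+
+$$
+\mathcal{L}_{\text{KD}} = \alpha\,\mathcal{L}_{\text{CE}} + (1-\alpha)\,T^2\,\mathrm{KL}\!\left(\mathrm{softmax}(z_t/T)\,\|\,\mathrm{softmax}(z_s/T)\right)
+$$
+
+$$
+\mathcal{L}_{\text{EWC}} = \mathcal{L}_{\text{task}} + \frac{\lambda}{2}\sum_i F_i\,(\theta_i - \theta_i^{*})^2
+$$
+
+Here $z_t$ and $z_s$ are the teacher and student logits, $\theta^{*}$ are the parameter values before the new training phase, and $F_i$ is the diagonal Fisher information estimate that marks which parameters are important.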
+
+## Model Architecture
+
+- Base: GPT-2 (12 layers, 12 heads, 768 hidden dim)
+- Growth: Added 3 new transformer blocks (one per growth event)
+- Final: 15 layers, 145.7M total parameters
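+
+The parameter counts can be checked by hand, assuming each new block uses the standard GPT-2 block layout (attention, a 4x-wide MLP, and two layer norms, all with biases) at hidden size $d = 768$:
+
+$$
+\underbrace{4d^2 + 4d}_{\text{attention}} + \underbrace{8d^2 + 5d}_{\text{MLP}} + \underbrace{4d}_{\text{layer norms}} = 12d^2 + 13d = 7{,}087{,}872 \approx 7.1\text{M}
+$$
+
+$$
+124{,}439{,}808 + 3 \times 7{,}087{,}872 = 145{,}703{,}424 \approx 145.7\text{M}
+$$
+
+The base figure of 124,439,808 is the usual GPT-2 small count with the output head tied to the token embeddings.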
+
+## Training Results
+
+### Summary Metrics
+| Metric | Initial | Final |
+|--------|---------|-------|
+| Training Loss | 7.16 | 1.95 |
+| Validation Loss | 6.99 | 2.03 |
+| Validation Perplexity | ~1000 | 7.58 |
+| Total Parameters | 124.4M | 145.7M |
+
+### Training Time
+- Total time: ~60 minutes (3596 seconds)
+- Best validation loss: 2.00
+- Best validation perplexity: 7.42
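+
+Assuming the losses are mean cross-entropy values in nats (the PyTorch default), perplexity is the exponential of the loss, so the figures above can be cross-checked:
+
+$$
+\mathrm{PPL} = \exp(\mathcal{L}), \qquad \exp(2.03) \approx 7.61, \qquad \exp(2.00) \approx 7.39
+$$
+
+These sit close to the reported 7.58 and 7.42. The small gaps are consistent with the losses having been rounded to two decimals ($\ln 7.58 \approx 2.026$, $\ln 7.42 \approx 2.004$).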
+
+### Growth Events
+| Growth # | Step | Layers | Parameters Added | Val Loss Delta |
+|---------|------|--------|-----------------|----------------|
+| 1 | 500 | 12 → 13 | +7.1M | +0.00003 |
+| 2 | 1000 | 13 → 14 | +7.1M | +0.00002 |
+| 3 | 1500 | 14 → 15 | +7.1M | +0.000001 |
+
+### Results Summary
+
+| Model | Perplexity | Loss |
+|-------|------------|------|
+| Base GPT-2 | 56.39 | 4.0323 |
+| Growing LLM | 33.39 | 3.5082 |
+
+Perplexity improvement over base GPT-2: 40.8%
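+
+The 40.8% figure is the relative perplexity reduction computed from the table, and it agrees with the gap between the two losses:
+
+$$
+\frac{56.39 - 33.39}{56.39} \approx 0.408, \qquad 1 - \exp\!\bigl(-(4.0323 - 3.5082)\bigr) = 1 - \exp(-0.5241) \approx 0.408
+$$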
+
+**Key Observation**: The validation loss changes by at most 0.00003 after each growth event (see the Growth Events table), which indicates that adding a block preserves what the model has already learned. The model continues to learn new capabilities without catastrophic forgetting.
+
+## Usage
+
+```python
+from transformers import GPT2LMHeadModel, AutoTokenizer
+
+# Load model and tokenizer
+model = GPT2LMHeadModel.from_pretrained("aicinema69/gpt2-growing")
+tokenizer = AutoTokenizer.from_pretrained("aicinema69/gpt2-growing")
+
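+# Generate text
+input_text = "Once upon a time"
+inputs = tokenizer(input_text, return_tensors="pt")
+# GPT-2 has no pad token by default, so reuse EOS to avoid a warning during generation
+outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
+# Decode the generated ids back to text, dropping special tokens such as EOS
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))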
+```
+
+## Limitations
+
+- Growth events may cause temporary performance dips that recover with continued training
+- Requires sufficient training data to benefit from additional parameters
+- Each added block increases memory and compute requirements for both training and inference
+
+## License
+
+This model is based on GPT-2, which is released under the [OpenAI GPT-2 License](https://github.com/openai/gpt-2/blob/master/LICENSE).
+
+## Citation
+
+If you use this model in your research, please cite:
+
+```bibtex
+@misc{growing_llm,
+  author = {Satyam Singh},
+  title = {Growing LLM: Dynamic Model Growth for Continual Learning},
+  year = {2026},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/aicinema69/gpt2-growing}}
+}
+```
+
+## Contact
+
+For questions or issues, please open a GitHub issue or contact the model author.