---
language: en
license: mit
base_model: BikoRiko/GPT-2.4-High-Pro
tags:
- gpt2
- math
- fine-tuned
- Pro
- Math
pipeline_tag: text-generation
---
# GPT-2.5-Math
GPT-2.5-Math is an upgraded version of **BikoRiko/GPT-2.4-High-Pro**, featuring an expanded architecture and specialized fine-tuning on mathematical reasoning.
## Model Details
- **Architecture:** GPT-2 with 6 additional transformer layers (approximately 0.2B parameters in total).
- **Training Hardware:** NVIDIA H100 (via Modal.com).
- **Dataset:** 5% subset of `microsoft/orca-math-word-problems-200k`.
- **Objective:** Fine-tuned to solve math word problems and logical queries.
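
As a rough sanity check on the parameter figure in the architecture line above, the sketch below counts the weights of a GPT-2 style decoder. It assumes the base model uses GPT-2 small dimensions (hidden size 768, vocabulary 50257, 1024 positions, 12 layers) with tied input and output embeddings; the model card does not confirm any of these, and the function name `gpt2_param_count` is made up for illustration.

```python
# Hypothetical sizing sketch. The dimensions below are those of GPT-2 small
# and are an ASSUMPTION; the model card does not state them.
D_MODEL = 768
VOCAB_SIZE = 50257
N_POSITIONS = 1024


def gpt2_param_count(n_layer, d_model=D_MODEL, vocab_size=VOCAB_SIZE,
                     n_positions=N_POSITIONS):
    """Count parameters of a GPT-2 style decoder with tied embeddings."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Each layer norm has a scale and a bias vector.
    layer_norm = 2 * d_model
    # Fused QKV projection plus the attention output projection (with biases).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer MLP with a 4x hidden expansion (with biases).
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    per_layer = 2 * layer_norm + attention + mlp
    # One extra layer norm is applied after the last block.
    return embeddings + n_layer * per_layer + layer_norm


base = gpt2_param_count(12)      # assumed depth of the base model
expanded = gpt2_param_count(18)  # 12 assumed layers + 6 additional layers
print(f"12 layers: {base:,}")
print(f"18 layers: {expanded:,} (~{expanded / 1e9:.2f}B)")
```

Under these assumptions the 18-layer count comes to about 0.17B, which is in line with the "~0.2B" stated above. A different base depth or hidden size would change the result.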
## Performance
The model is trained for mathematical reasoning. Although it has only about 0.2B parameters, it shows early signs of logical grounding on basic word problems.
## Training Details
- **Optimizer:** AdamW
- **Precision:** Mixed precision (`torch.amp`)
- **Epochs:** 3
- **Learning Rate:** 5e-5
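
To get a feel for the training budget these settings imply, the sketch below estimates the number of optimizer steps. The 5% subset and the 3 epochs come from this card; the nominal dataset size of 200,000 (read off the "200k" in the dataset name) and the batch size of 16 are assumptions, since neither is stated here. The helper `training_steps` is hypothetical.

```python
import math

# Values stated in the model card.
SUBSET_PERCENT = 5
EPOCHS = 3

# ASSUMPTIONS, not stated in the model card.
NOMINAL_DATASET_SIZE = 200_000  # taken from the "200k" in the dataset name
BATCH_SIZE = 16                 # hypothetical; the real batch size is unknown


def training_steps(dataset_size, subset_percent, batch_size, epochs):
    """Return (examples used, steps per epoch, total optimizer steps)."""
    # Integer arithmetic avoids floating-point rounding on the subset size.
    examples = dataset_size * subset_percent // 100
    # The last batch of an epoch may be partial, hence the ceiling.
    steps_per_epoch = math.ceil(examples / batch_size)
    return examples, steps_per_epoch, steps_per_epoch * epochs


examples, per_epoch, total = training_steps(
    NOMINAL_DATASET_SIZE, SUBSET_PERCENT, BATCH_SIZE, EPOCHS
)
print(f"{examples:,} examples, {per_epoch} steps per epoch, {total:,} steps in total")
```

With these assumed values the run covers 10,000 examples and 1,875 optimizer steps. Gradient accumulation or a different batch size would scale the step count accordingly.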