75 lines
2.3 KiB
Markdown
75 lines
2.3 KiB
Markdown
---
|
|
language:
|
|
- en
|
|
license: apache-2.0
|
|
tags:
|
|
- math
|
|
- reasoning
|
|
- sft
|
|
- instruction-tuning
|
|
- llama
|
|
- pytorch
|
|
- text-generation
|
|
datasets:
|
|
- HuggingFaceFW/fineweb-edu
|
|
- open-web-math/open-web-math
|
|
- hoskinson-center/proof-pile-v2
|
|
- TIGER-Lab/MathInstruct
|
|
- meta-math/MetaMathQA
|
|
- gsm8k
|
|
metrics:
|
|
- accuracy
|
|
---
|
|
|
|
# TinyMathReason-1B-sft
|
|
|
|
TinyMathReason-1B-sft is a 1.12 Billion parameter Llama-style decoder-only transformer trained from scratch specifically for mathematical reasoning. This is the **Supervised Fine-Tuned (SFT)** variant.
|
|
|
|
## Model Description
|
|
|
|
- **Developed by:** Himanshu Nakrani
|
|
- **Model type:** Decoder-only Transformer
|
|
- **Language(s):** English, Mathematics, Code
|
|
- **License:** Apache 2.0
|
|
- **Architecture:** 22 layers, 2048 hidden dimension, 16 Attention heads, 4 KV heads (GQA), SwiGLU activation (5632 intermediate dim).
|
|
- **Parameters:** 1.12B total
|
|
- **Context Length:** 4096 tokens
|
|
|
|
## Training Details
|
|
|
|
### Pretraining (Base Model)
|
|
The base model was trained from a random initialization on Google Cloud TPU v4-32 using the [MaxText](https://github.com/google/maxtext) framework.
|
|
- **Tokens:** ~300 Billion
|
|
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight_decay=0.1)
|
|
- **Learning Rate:** 3e-4 peak, cosine decay to 3e-5
|
|
|
|
### Supervised Fine-Tuning (SFT)
|
|
This variant was trained on ~600k instruction-following mathematical examples formatted in ChatML.
|
|
- **Hardware:** 1x A100 GPU using PyTorch + TRL
|
|
- **Learning Rate:** 2e-5 (Cosine schedule)
|
|
- **Epochs:** 2
|
|
|
|
## Intended Uses & Limitations
|
|
|
|
**Intended Uses:**
|
|
- Solving step-by-step grade-school to high-school level math problems.
|
|
- Educational assistance and logic-based chain-of-thought generation.
|
|
- As a foundation for further preference optimization (e.g., DPO, GRPO).
|
|
|
|
**Limitations:**
|
|
- Being a 1B parameter model, it lacks the broad general knowledge of larger models.
|
|
- Prone to arithmetic hallucination on very large numbers.
|
|
- May fail on complex topology or advanced undergraduate mathematics.
|
|
|
|
## Citation
|
|
|
|
```bibtex
|
|
@misc{tinymathreason2026,
|
|
author = {Himanshu Nakrani},
|
|
title = {TinyMathReason-1B: A 1 Billion Parameter Mathematical Reasoning LLM Built from Scratch on TPU v4-32},
|
|
year = {2026},
|
|
publisher = {GitHub},
|
|
howpublished = {\url{https://github.com/himanshu-nakrani/TinyMathReason-1B}}
|
|
}
|
|
```
|