初始化项目，由ModelHub XC社区提供模型

Model: himanshunakrani9/TinyMathReason-1B-sft Source: Original Platform
2026-05-30 06:33:20 +08:00
commit f77c65551f
127 changed files with 1778427 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,74 @@
+---
+language:
+- en
+license: apache-2.0
+tags:
+- math
+- reasoning
+- sft
+- instruction-tuning
+- llama
+- pytorch
+- text-generation
+datasets:
+- HuggingFaceFW/fineweb-edu
+- open-web-math/open-web-math
+- hoskinson-center/proof-pile-v2
+- TIGER-Lab/MathInstruct
+- meta-math/MetaMathQA
+- gsm8k
+metrics:
+- accuracy
+---
+
+# TinyMathReason-1B-sft
+
+TinyMathReason-1B-sft is a 1.12 Billion parameter Llama-style decoder-only transformer trained from scratch specifically for mathematical reasoning. This is the **Supervised Fine-Tuned (SFT)** variant.
+
+## Model Description
+
+- **Developed by:** Himanshu Nakrani
+- **Model type:** Decoder-only Transformer
+- **Language(s):** English, Mathematics, Code
+- **License:** Apache 2.0
+- **Architecture:** 22 layers, 2048 hidden dimension, 16 Attention heads, 4 KV heads (GQA), SwiGLU activation (5632 intermediate dim).
+- **Parameters:** 1.12B total
+- **Context Length:** 4096 tokens
+
+## Training Details
+
+### Pretraining (Base Model)
+The base model was trained from a random initialization on Google Cloud TPU v4-32 using the [MaxText](https://github.com/google/maxtext) framework.
+- **Tokens:** ~300 Billion
+- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight_decay=0.1)
+- **Learning Rate:** 3e-4 peak, cosine decay to 3e-5
+
+### Supervised Fine-Tuning (SFT)
+This variant was trained on ~600k instruction-following mathematical examples formatted in ChatML.
+- **Hardware:** 1x A100 GPU using PyTorch + TRL
+- **Learning Rate:** 2e-5 (Cosine schedule)
+- **Epochs:** 2
+
+## Intended Uses & Limitations
+
+**Intended Uses:**
+- Solving step-by-step grade-school to high-school level math problems.
+- Educational assistance and logic-based chain-of-thought generation.
+- As a foundation for further preference optimization (e.g., DPO, GRPO).
+
+**Limitations:**
+- Being a 1B parameter model, it lacks the broad general knowledge of larger models.
+- Prone to arithmetic hallucination on very large numbers.
+- May fail on complex topology or advanced undergraduate mathematics.
+
+## Citation
+
+```bibtex
+@misc{tinymathreason2026,
+  author = {Himanshu Nakrani},
+  title = {TinyMathReason-1B: A 1 Billion Parameter Mathematical Reasoning LLM Built from Scratch on TPU v4-32},
+  year = {2026},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/himanshu-nakrani/TinyMathReason-1B}}
+}
+```