
---
license: llama3.2
datasets:
- OctoThinker/MegaMath-Web-Pro-Max
- LLM360/MegaMath
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---
# [OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling](https://arxiv.org/abs/2506.20512)
## OctoThinker-3B-Long-Base
The OctoThinker family is built on carefully studied mid-training insights, starting from the Llama-3 family, to create a reinforcement learning-friendly base language model.
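As a quick start, the model can be loaded with the standard Hugging Face `transformers` text-generation API. The snippet below is a minimal sketch, assuming bfloat16 weights and `device_map="auto"` placement; it is not an official recipe from this card.

```python
# Minimal usage sketch (illustrative, not an official recipe):
# load OctoThinker-3B-Long-Base and generate a plain-text continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OctoThinker/OctoThinker-3B-Long-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a reasonable default for a 3B Llama-based model
    device_map="auto",
)

# This is a base (not chat-tuned) model, so prompt it as plain text continuation.
prompt = "Question: What is 12 * 7?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```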
### Training Recipe
<div style="display: flex; justify-content: left; gap: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/62cbeb2d72dfd24b86bdf977/fGkg_-5a2y8tI20025SOu.png" alt="Data Pipeline" style="width:90%;">
</div>
### Evaluation Results
Note that we adopt few-shot prompting when evaluating these base language models; a generic sketch of such a prompt appears after the results figure below.
<div style="display: flex; justify-content: left; gap: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/62cbeb2d72dfd24b86bdf977/UCZ9MahRYqLY0iKjiWMqS.png" alt="Data Pipeline" style="width:80%;">
</div>
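For illustration, here is a minimal sketch of how a few-shot prompt for a base model can be assembled: solved exemplars are concatenated before the test question. The template and exemplars are hypothetical placeholders, not the actual prompts used in our evaluation.

```python
# Generic few-shot prompt builder (illustrative only; the template and
# exemplars below are placeholders, not the ones used in our evaluation).
def build_few_shot_prompt(exemplars, question):
    """Concatenate solved (question, answer) exemplars before the test question."""
    parts = []
    for q, a in exemplars:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    # Leave the final answer open so the base model completes it.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

exemplars = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 10 - 4?", "10 - 4 = 6. The answer is 6."),
]
print(build_few_shot_prompt(exemplars, "What is 7 * 8?"))
```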
### More about OctoThinker
<div style="display: flex; justify-content: left; gap: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/62cbeb2d72dfd24b86bdf977/bn85CEB_DW6azJ7KJp11Q.png" alt="Data Pipeline" style="width:100%;">
</div>
## Citation
Check out our [paper](https://arxiv.org/abs/2506.20512) for more details. If you use our models or datasets, or find our work useful, please cite:
```bibtex
@article{wang2025octothinker,
  title={OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling},
  author={Wang, Zengzhi and Zhou, Fan and Li, Xuefeng and Liu, Pengfei},
  year={2025},
  journal={arXiv preprint arXiv:2506.20512},
  note={Preprint}
}
```