---
license: apache-2.0
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
language:
- en
base_model:
- Qwen/Qwen3-8B-Base
pipeline_tag: reinforcement-learning
---
### Description
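This repository contains the Qwen3-8B-GRPO model, fine-tuned from `Qwen/Qwen3-8B-Base` on `agentica-org/DeepScaleR-Preview-Dataset`. It was released with the paper [Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning](https://huggingface.co/papers/2510.03259).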

### Official Implementation
https://github.com/akatigre/MASA-RL

### Citation
```bibtex
@article{kim2025meta,
  title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
  author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
  journal={arXiv preprint arXiv:2510.03259},
  year={2025}
}
```