---
license: apache-2.0
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
language:
- en
base_model:
- Qwen/Qwen3-8B-Base
pipeline_tag: reinforcement-learning
---

### Description

This repository contains the model released with the paper [Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning](https://huggingface.co/papers/2510.03259). Its base model is Qwen/Qwen3-8B-Base.

### Official Implementation

https://github.com/akatigre/MASA-RL

### Citation

```bibtex
@article{kim2025meta,
  title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
  author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
  journal={arXiv preprint arXiv:2510.03259},
  year={2025}
}
```