ModelHub XC 82cd1a8b35 Initialize project; model provided by the ModelHub XC community
Model: jadohu/Qwen3-8B-GRPO
Source: Original Platform
2026-04-20 13:15:48 +08:00


---
license: apache-2.0
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
language:
- en
base_model:
- Qwen/Qwen3-8B-Base
pipeline_tag: reinforcement-learning
---
### Description
This repository contains the Qwen3-8B-GRPO checkpoint, fine-tuned from Qwen/Qwen3-8B-Base, released with the paper [Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning](https://huggingface.co/papers/2510.03259).
### Official Implementation
https://github.com/akatigre/MASA-RL
### Citation
```bibtex
@article{kim2025meta,
title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
journal={arXiv preprint arXiv:2510.03259},
year={2025}
}
```