ModelHub XC 82cd1a8b35 Initialize project; model provided by the ModelHub XC community
Model: jadohu/Qwen3-8B-GRPO
Source: Original Platform
2026-04-20 13:15:48 +08:00

license: apache-2.0
datasets: agentica-org/DeepScaleR-Preview-Dataset
language: en
base_model: Qwen/Qwen3-8B-Base
pipeline_tag: reinforcement-learning

Description

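This repository contains the model for the paper "Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning". According to the metadata above, it is based on Qwen/Qwen3-8B-Base and lists agentica-org/DeepScaleR-Preview-Dataset as its dataset.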
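
For readers who want to try the model, the following is a minimal loading sketch rather than an official usage recipe. It assumes the repository hosts weights in the standard Hugging Face `transformers` format under the id `jadohu/Qwen3-8B-GRPO`, and that `transformers`, `torch`, and `accelerate` are installed. The helper names `build_prompt` and `generate`, and the plain "Question/Answer" prompt, are hypothetical; this page does not document the prompt format used in training, so consult the official implementation for that.

```python
# Model id as listed on this page; that it loads as a causal LM is an assumption.
MODEL_ID = "jadohu/Qwen3-8B-GRPO"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the training prompt format is not given here."""
    return f"Question: {question}\nAnswer:"


def generate(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and return the generated continuation for one question."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" relies on the accelerate package being installed.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 12 * 13?")` downloads the checkpoint on first use, so it needs network access and enough memory for an 8B-parameter model.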

Official Implementation

https://github.com/akatigre/MASA-RL

Citation

@article{kim2025meta,
  title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
  author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
  journal={arXiv preprint arXiv:2510.03259},
  year={2025}
}