Initialize project; model provided by the ModelHub XC community
Model: jadohu/Qwen3-8B-GRPO Source: Original Platform
---
license: apache-2.0
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
language:
- en
base_model:
- Qwen/Qwen3-8B-Base
pipeline_tag: reinforcement-learning
---

### Description

This repository contains the model for [Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning](https://huggingface.co/papers/2510.03259).

### Official Implementation

https://github.com/akatigre/MASA-RL

### Citation

```bibtex
@article{kim2025meta,
  title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
  author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
  journal={arXiv preprint arXiv:2510.03259},
  year={2025}
}
```