---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
tags:
- reward-model
- rlhf
- qwen3
---

# PaTaRM-8B

[![arXiv](https://img.shields.io/badge/arXiv-2510.24235-b31b1b.svg)](https://arxiv.org/abs/2510.24235)
[![GitHub](https://img.shields.io/badge/GitHub-PaTaRM-black?logo=github)](https://github.com/JaneEyre0530/PaTaRM)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

This is the **PaTaRM-8B** model, part of the PaTaRM series.

For full details including overview, usage examples, training data, and citation, please refer to the main collection README:

👉 **[AIJian/PaTaRM — Main README](https://huggingface.co/AIJian/PaTaRM)**

## Models

| Model | Base | Link |
|-------|------|------|
| PaTaRM-8B | Qwen3-8B | [AIJian/PaTaRM-8B](https://huggingface.co/AIJian/PaTaRM-8B) |
| PaTaRM-14B | Qwen3-14B | [AIJian/PaTaRM-14B](https://huggingface.co/AIJian/PaTaRM-14B) |

## Citation

```bibtex
@misc{jian2026patarmbridgingpairwisepointwise,
      title={PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling},
      author={Ai Jian and Jingqing Ruan and Xing Ma and Dailin Li and Weipeng Zhang and Ke Zeng and Xunliang Cai},
      year={2026},
      eprint={2510.24235},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.24235},
}
```