---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
tags:
- reward-model
- rlhf
- qwen3
---

# PaTaRM-8B

[![arXiv](https://img.shields.io/badge/arXiv-2510.24235-b31b1b.svg)](https://arxiv.org/abs/2510.24235)
[![GitHub](https://img.shields.io/badge/GitHub-PaTaRM-black?logo=github)](https://github.com/JaneEyre0530/PaTaRM)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

This is the **PaTaRM-8B** model, part of the PaTaRM series. For full details including overview, usage examples, training data, and citation, please refer to the main collection README:

👉 **[AIJian/PaTaRM — Main README](https://huggingface.co/AIJian/PaTaRM)**

## Models

| Model | Base | Link |
|-------|------|------|
| PaTaRM-8B | Qwen3-8B | [AIJian/PaTaRM-8B](https://huggingface.co/AIJian/PaTaRM-8B) |
| PaTaRM-14B | Qwen3-14B | [AIJian/PaTaRM-14B](https://huggingface.co/AIJian/PaTaRM-14B) |

## Citation

```bibtex
@misc{jian2026patarmbridgingpairwisepointwise,
      title={PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling}, 
      author={Ai Jian and Jingqing Ruan and Xing Ma and Dailin Li and Weipeng Zhang and Ke Zeng and Xunliang Cai},
      year={2026},
      eprint={2510.24235},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.24235}, 
}
```