---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-4B
tags:
- social-reasoning
- Theory of Mind
- reinforcement-learning
- GRPO
- SIP
datasets:
- Jincenzi/ToMBench_Hard
pipeline_tag: text-generation
---

# SocialR1-4B

**SocialR1-4B** is a social reasoning model built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). It is trained with trajectory-level reinforcement learning (GRPO) using the **Social-R1** framework, which improves social reasoning by aligning the model's reasoning process with the stages of Social Information Processing (SIP) theory.

📄 **Paper**: [Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning](https://arxiv.org/abs/2603.09249)

## Highlights

- 🧠 **SIP-Guided Reasoning**: Enforces stage-consistent social inference that follows the SIP sequence Cue Encoding → Cue Interpretation → Goal Clarification → Response Generation.
- 🎯 **Multi-Dimensional Reward**: Combines a structural reward, a content reward, an inference-efficiency reward, and a format reward with curriculum-style weighting.
- 📊 **Strong Performance**: Enables a 4B-parameter model to match or outperform substantially larger baselines on static multiple-choice (MCQ) benchmarks, open-ended generation (FanToM), and interactive settings (SOTOPIA).

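
The Reward Design entry in Training Details lists the four reward terms as a sum. The following is a sketch of the trajectory-level reward under one assumption: each term is scaled by a step-dependent curriculum weight $w_k(t)$. The card does not specify the actual weights or schedule, so consult the paper for those.

$$
R(\tau) = w_\text{struct}(t)\,R_\text{struct}(\tau) + w_\text{cont}(t)\,R_\text{cont}(\tau) + w_\text{len}(t)\,R_\text{len}(\tau) + w_\text{fmt}(t)\,R_\text{fmt}(\tau)
$$

Here $\tau$ is a sampled reasoning trajectory and $t$ is the training step. With every $w_k(t) = 1$, the expression reduces to the plain sum listed under Training Details.
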
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jincenzi/SocialR1-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Reasoning-format instruction used by the model; append your social reasoning question to this content.
messages = [
    {"role": "user", "content": "You should first think about the reasoning process in the mind and then provide with the answer.The reasoning process and answer are enclosed within <think> </think> and <Answer> </Answer> tags, respectively."}
]

# Render the chat template as text and append the assistant-turn prefix.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

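
The prompt asks the model to put its reasoning in `<think> </think>` and its final answer in `<Answer> </Answer>`. Below is a minimal sketch for splitting a decoded completion into those two parts. `parse_response` is a hypothetical helper, not part of `transformers` or this repository. It assumes the model follows the requested tags and requires only the closing `</think>`, because the opening tag may be absent if the chat template already placed it in the prompt.

```python
import re
from typing import Optional, Tuple

# Final-answer tag requested by the prompt; case-insensitive in case the model writes <answer>.
ANSWER_RE = re.compile(r"<Answer>(.*?)</Answer>", re.DOTALL | re.IGNORECASE)


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: split a completion into (reasoning, answer).

    Either element is None when its tags are missing from the text.
    """
    reasoning = None
    if "</think>" in text:
        # Everything before the first </think> is reasoning; the opening <think> may be absent.
        head, _, text = text.partition("</think>")
        reasoning = head.replace("<think>", "", 1).strip()
    # Search for the answer only in the text that follows the reasoning (if any).
    match = ANSWER_RE.search(text)
    answer = match.group(1).strip() if match else None
    return reasoning, answer


if __name__ == "__main__":
    sample = (
        "<think>Sally did not see the marble move, so she still believes it is in the basket.</think>\n"
        "<Answer>A</Answer>"
    )
    reasoning, answer = parse_response(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

A `None` answer is a quick signal that a completion did not follow the requested format. This can be useful when scoring multiple-choice outputs.
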
## Training Details

- **Base Model**: Qwen3-4B
- **Training Method**: Group Relative Policy Optimization (GRPO)
- **Training Steps**: 600
- **Hardware**: 8× NVIDIA A100 (80GB)
- **Group Size**: 5
- **KL Coefficient**: 0.04
- **Learning Rate**: 5×10⁻⁷
- **Reward Design**: SIP structural reward ($R_\text{struct}$) + SIP content reward ($R_\text{cont}$) + inference efficiency ($R_\text{len}$) + format reward ($R_\text{fmt}$)

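
GRPO, as introduced in DeepSeekMath, works in three steps. First, it samples a group of responses for each prompt. Second, it normalizes each response's reward against the group's mean and standard deviation to obtain that response's advantage. Third, it penalizes drift from a reference model with a per-token KL estimate weighted by the KL coefficient.

The sketch below illustrates the second and third steps using the group size (5) and KL coefficient (0.04) listed above. The reward values are made up, and Social-R1's actual training code may differ, for example in how the standard deviation is computed.

```python
import math
from typing import List

GROUP_SIZE = 5   # responses sampled per prompt (Training Details)
KL_COEF = 0.04   # weight on the KL penalty (Training Details)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - group mean) / (group std + eps).

    Uses the population standard deviation; some implementations use the sample std.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def k3_kl(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimate from the GRPO objective: exp(d) - d - 1, d = log pi_ref - log pi_theta.

    It is zero when the two log-probabilities match and positive otherwise.
    """
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


if __name__ == "__main__":
    # Made-up trajectory-level rewards for one group of GROUP_SIZE sampled responses.
    rewards = [0.9, 0.4, 0.4, 0.1, 0.7]
    assert len(rewards) == GROUP_SIZE
    advantages = group_relative_advantages(rewards)
    print("advantages:", [round(a, 3) for a in advantages])
    # KL penalty for a single token, scaled by the KL coefficient as it would be in the loss.
    print("kl penalty:", KL_COEF * k3_kl(logp_policy=-1.2, logp_ref=-1.0))
```

Every token in response $i$ shares the same advantage, which matches the trajectory-level framing. Responses scoring above the group mean are reinforced, and those below it are discouraged. If all rewards in a group are equal, every advantage is zero and the group contributes no policy-gradient signal.
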
## Evaluation

SocialR1-4B is evaluated across three complementary settings:

- **Static MCQ**: ToMBench, ToMBench-Hard, SocialIQA, SimpleToM, EmoBench, MotiveBench, Hi-ToM, TactfulToM
- **Open-ended Generation**: FanToM
- **Interactive Social Intelligence**: SOTOPIA

## Related Resources

| Resource | Link |
|----------|------|
| Paper | [arXiv:2603.09249](https://arxiv.org/abs/2603.09249) |
| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |

## Citation

```BibTeX
@article{wu2026socialr1,
  title={Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning},
  author={Wu, Jincenzi and Lei, Yuxuan and Lian, Jianxun and Huang, Yitian and Zhou, Lexin and Li, Haotian and Yang, Deng and Xie, Xing and Meng, Helen},
  journal={arXiv preprint arXiv:2603.09249},
  year={2026}
}
```

## Contact

For questions or discussions, please contact [jincenziwu@gmail.com](mailto:jincenziwu@gmail.com).