ModelHub XC 2b04d5f063 Initialize project; model provided by the ModelHub XC community
Model: Jincenzi/SocialR1-4B
Source: Original Platform
2026-05-25 17:03:33 +08:00


---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-4B
tags:
- social-reasoning
- Theory of Mind
- reinforcement-learning
- GRPO
- SIP
datasets:
- Jincenzi/ToMBench_Hard
pipeline_tag: text-generation
---
# SocialR1-4B
**SocialR1-4B** is a social reasoning model built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) and trained with trajectory-level reinforcement learning (GRPO) using the **Social-R1** framework. It improves social reasoning by aligning the model's reasoning process with Social Information Processing (SIP) theory.
📄 **Paper**: [Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning](https://arxiv.org/abs/2603.09249)
## Highlights
- 🧠 **SIP-Guided Reasoning**: Enforces stage-consistent social inference: Cue Encoding → Cue Interpretation → Goal Clarification → Response Generation
- 🎯 **Multi-Dimensional Reward**: Combines structural reward, content reward, inference efficiency, and format reward with curriculum-style weighting
- 📊 **Strong Performance**: Enables a 4B-parameter model to match or outperform substantially larger baselines across static MCQ benchmarks, open-ended generation (FanToM), and interactive settings (SOTOPIA)
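As a rough illustration of what "stage-consistent" could mean, the sketch below checks whether a reasoning trace mentions the four SIP stages in the order listed above. This is a toy check, not the structural reward from the paper: it assumes the trace names each stage explicitly, and `SIP_STAGES` and `stages_in_order` are hypothetical names introduced here.

```python
# Stage names taken from the Highlights list; the matching rule is an assumption.
SIP_STAGES = [
    "Cue Encoding",
    "Cue Interpretation",
    "Goal Clarification",
    "Response Generation",
]

def stages_in_order(trace: str) -> bool:
    """Return True if every stage name occurs and first occurrences follow SIP order."""
    lowered = trace.lower()
    positions = [lowered.find(stage.lower()) for stage in SIP_STAGES]
    # find() returns -1 for a missing stage, which fails the first condition.
    return all(p >= 0 for p in positions) and positions == sorted(positions)

trace = (
    "Cue Encoding: Anna leaves the room. "
    "Cue Interpretation: she cannot see the swap. "
    "Goal Clarification: she wants her book. "
    "Response Generation: she looks on the shelf."
)
print(stages_in_order(trace))
```

A real reward would need to judge the content of each stage as well; this only checks the ordering of labels.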
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jincenzi/SocialR1-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Instruction prompt from the model card, kept verbatim.
# Append your own question or scenario to the "content" string.
messages = [
    {"role": "user", "content": "You should first think about the reasoning process in the mind and then provide with the answer.The reasoning process and answer are enclosed within <think> </think> and <Answer> </Answer> tags, respectively."}
]

# Render the chat template as text, then tokenize and move tensors to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)

# outputs[0] holds the prompt tokens followed by the generated tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
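The prompt asks for reasoning inside `<think> </think>` and the final answer inside `<Answer> </Answer>`. A minimal sketch of splitting decoded text into those two parts is shown below. The helper `split_response` is hypothetical and not part of the released code. Because the decoded text above also echoes the prompt, which itself contains the literal tags, the sketch takes the last match of each tag pair.

```python
import re

def _last_match(pattern: str, text: str):
    """Return the stripped content of the last match, or None if there is none."""
    matches = re.findall(pattern, text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

def split_response(decoded: str) -> dict:
    """Split decoded text into reasoning and answer (hypothetical helper)."""
    return {
        "reasoning": _last_match(r"<think>(.*?)</think>", decoded),
        "answer": _last_match(r"<Answer>(.*?)</Answer>", decoded),
    }

# Made-up text that mimics an echoed prompt followed by a completion.
sample = (
    "enclosed within <think> </think> and <Answer> </Answer> tags, respectively.\n"
    "<think>Sally did not see the move.</think><Answer>B</Answer>"
)
parts = split_response(sample)
print(parts["answer"])
```

If the model omits a tag, the corresponding field is `None`, so callers should handle that case before scoring an answer.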
## Training Details
- **Base Model**: Qwen3-4B
- **Training Method**: Group Relative Policy Optimization (GRPO)
- **Training Steps**: 600
- **Hardware**: 8× NVIDIA A100 (80GB)
- **Group Size**: 5
- **KL Coefficient**: 0.04
- **Learning Rate**: 5×10⁻⁷
- **Reward Design**: SIP structural reward ($R_{\text{struct}}$) + SIP content reward ($R_{\text{cont}}$) + inference efficiency ($R_{\text{len}}$) + format reward ($R_{\text{fmt}}$)
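To make the setup above more concrete: GRPO samples a group of completions per prompt (group size 5 here), scores each one, and normalizes the rewards within the group to obtain advantages. The sketch below is illustrative only. It assumes the four reward components are combined by a weighted sum, uses placeholder weights (this README mentions curriculum-style weighting but gives no values), and uses the population standard deviation for normalization, a detail that varies between implementations. All names and numbers are made up.

```python
from statistics import mean, pstdev

# Placeholder weights; the actual curriculum-style schedule is not given in this README.
WEIGHTS = {"struct": 0.25, "cont": 0.25, "len": 0.25, "fmt": 0.25}

def total_reward(components: dict) -> float:
    """Weighted sum of R_struct, R_cont, R_len, R_fmt (assumed combination rule)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of 5 completions for the same prompt, with made-up component scores.
group = [
    {"struct": 1.0, "cont": 0.8, "len": 0.5, "fmt": 1.0},
    {"struct": 0.5, "cont": 0.6, "len": 0.7, "fmt": 1.0},
    {"struct": 0.0, "cont": 0.2, "len": 0.9, "fmt": 0.0},
    {"struct": 1.0, "cont": 1.0, "len": 0.4, "fmt": 1.0},
    {"struct": 0.5, "cont": 0.4, "len": 0.6, "fmt": 1.0},
]
rewards = [total_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f} advantage={a:+.3f}")
```

Completions that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged. The KL coefficient listed above would enter the policy loss separately and is not modeled here.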
## Evaluation
SocialR1-4B is evaluated across three complementary settings:
- **Static MCQ**: ToMBench, ToMBench-Hard, SocialIQA, SimpleToM, EmoBench, MotiveBench, Hi-ToM, TactfulToM
- **Open-ended Generation**: FanToM
- **Interactive Social Intelligence**: SOTOPIA
## Related Resources
| Resource | Link |
|----------|------|
| Paper | [arXiv:2603.09249](https://arxiv.org/abs/2603.09249) |
| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
## Citation
```BibTeX
@article{wu2026socialr1,
  title={Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning},
  author={Wu, Jincenzi and Lei, Yuxuan and Lian, Jianxun and Huang, Yitian and Zhou, Lexin and Li, Haotian and Yang, Deng and Xie, Xing and Meng, Helen},
  journal={arXiv preprint arXiv:2603.09249},
  year={2026}
}
```
## Contact
For questions or discussions, please contact [jincenziwu@gmail.com](mailto:jincenziwu@gmail.com).