---
library_name: transformers
license: apache-2.0
license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
tags:
- math-tutor
- grpo
datasets:
- SynthLabsAI/Big-Math-RL-Verified
---

# TutorRL-7B

## Overview

**TutorRL-7B** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles using **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.

This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide through Socratic questioning, and withhold final solutions when beneficial for learning.

Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)
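For readers unfamiliar with GRPO, the sketch below illustrates its core idea: several responses are sampled for the same prompt, and each response's reward is normalized against the rest of its group. This is a generic illustration, not the project's training code. The function name, the reward values, the use of the population standard deviation, and the `eps` term are all assumptions made for the example; see the repository for the actual reward design and training loop.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper: `rewards` holds one scalar reward per sampled
    response to the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled tutor responses to one problem.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
# Responses rewarded above the group mean get a positive advantage,
# those below it get a negative one.
print(advantages)
```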

## Intended Use

This model is intended for use in:

* Interactive math tutoring
* Socratic dialogue generation
* Research on educational alignment of LLMs
* Safe and indirect teaching in problem-solving contexts

## Example Usage

```python
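from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B"

# Load the tokenizer and model; device_map="auto" places weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A single student turn; the model is expected to reply as a tutor.
messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]

# add_generation_prompt=True appends the assistant header so the model
# starts a new tutor turn instead of continuing the user message.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the tutor reply and print the decoded conversation.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))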
```

> Note: This model does **not** generate `<think>` blocks. For planning-based reasoning, use the [TutorRL-7B-think](https://huggingface.co/eth-nlped/TutorRL-7B-think) variant instead.
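
Because the model is trained in a multi-turn setting, tutoring usually spans several exchanges. The following is a minimal sketch of how a dialogue history could be carried across turns. The helper names `make_hf_generator` and `tutor_turn` are hypothetical and not part of the repository; they rely only on the `transformers` calls used in the example above, and the decoding step slices off the prompt tokens so that only the new tutor reply is stored.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def make_hf_generator(model, tokenizer, max_new_tokens: int = 512) -> Callable[[List[Message]], str]:
    """Wrap a loaded model/tokenizer pair as a `history -> reply` function (hypothetical helper)."""

    def generate_reply(messages: List[Message]) -> str:
        prompt = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the tokens generated after the prompt.
        new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    return generate_reply


def tutor_turn(
    history: List[Message],
    student_message: str,
    generate_reply: Callable[[List[Message]], str],
) -> List[Message]:
    """Append a student message and the tutor's reply; return the new history."""
    messages = history + [{"role": "user", "content": student_message}]
    reply = generate_reply(messages)
    return messages + [{"role": "assistant", "content": reply}]


# Usage, assuming `model` and `tokenizer` are loaded as in the example above:
#   generate_reply = make_hf_generator(model, tokenizer)
#   history = tutor_turn([], "Can you help me solve 3x + 5 = 20?", generate_reply)
#   history = tutor_turn(history, "I subtracted 5 and got 3x = 15.", generate_reply)
```

Passing the generator in as a callable keeps the history logic independent of the model, so it can be exercised with a stub function before loading the 7B weights.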

## Citation

If you use this model or build upon the training framework, please cite:

```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
  year={2025},
  eprint={2505.15607},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.15607}
}
```