Initialize project; model provided by the ModelHub XC community

Model: eth-nlped/TutorRL-7B Source: Original Platform
2026-05-10 06:09:17 +08:00
commit 7b20eb39ab
16 changed files with 152165 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,71 @@
+---
+library_name: transformers
+license: apache-2.0
+license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
+pipeline_tag: text-generation
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+tags:
+- math-tutor
+- grpo
+datasets:
+- SynthLabsAI/Big-Math-RL-Verified
+---
+
+# TutorRL-7B
+
+## Overview
+
+**TutorRL-7B** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles using **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.
+
+This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of answering questions directly, the model is optimized to scaffold the student's reasoning, guide the student through Socratic questioning, and withhold the final solution when doing so benefits learning.
+
+Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)
+
+## Intended Use
+
+This model is intended for use in:
+
+* Interactive math tutoring
+* Socratic dialogue generation
+* Research on educational alignment of LLMs
+* Safe and indirect teaching in problem-solving contexts
+
+## Example Usage
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_id = "eth-nlped/TutorRL-7B"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+messages = [
+    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
+]
+
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=512)
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))  # decode only the newly generated tokens
+```
+
+> Note: This model does **not** generate `<think>` blocks. If you want planning-based reasoning, refer to this model variant: [TutorRL-7B-think](https://huggingface.co/eth-nlped/TutorRL-7B-think)
+
+## Citation
+
+If you use this model or build upon the training framework, please cite:
+
+```bibtex
+@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
+  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
+  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
+  year={2025},
+  eprint={2505.15607},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2505.15607}
+}
+```
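
The README's usage example covers a single turn, while the model is described as trained in a multi-turn classroom setting. The following is a minimal sketch of how a multi-turn exchange might be driven, not the authors' evaluation code. The helper names `tutor_turn` and `make_hf_generate_fn` are hypothetical and exist in neither the PedagogicalRL repository nor `transformers`. The sketch assumes that `model` and `tokenizer` were already loaded as in the README example, and that the tokenizer's chat template accepts `add_generation_prompt=True`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
GenerateFn = Callable[[List[Message]], str]


def make_hf_generate_fn(model, tokenizer, max_new_tokens: int = 512) -> GenerateFn:
    """Wrap an already loaded transformers model/tokenizer pair (hypothetical helper)."""

    def generate(messages: List[Message]) -> str:
        # Render the whole conversation and open an assistant turn for the reply.
        prompt = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Drop the prompt tokens so only the new tutor reply is decoded.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


def tutor_turn(
    history: List[Message], student_message: str, generate_fn: GenerateFn
) -> List[Message]:
    """Append one student message and one tutor reply; the input list is not mutated."""
    messages = list(history) + [{"role": "user", "content": student_message}]
    reply = generate_fn(messages)
    return messages + [{"role": "assistant", "content": reply}]


# Assumed usage, with `model` and `tokenizer` loaded as in the README:
#   generate_fn = make_hf_generate_fn(model, tokenizer)
#   history = tutor_turn([], "Can you help me solve 3x + 5 = 20?", generate_fn)
#   history = tutor_turn(history, "Is it x = 5?", generate_fn)
```

Keeping the generation step behind a plain callable means the dialogue bookkeeping can be exercised with a stub function before loading the 7B checkpoint.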