Initialize project; model provided by the ModelHub XC community

Model: Jiangzs/MENTOR_Qwen_7B Source: Original Platform
2026-05-21 01:04:34 +08:00
commit 74984dda29
16 changed files with 152225 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,57 @@
+---
+license: apache-2.0
+tags:
+  - text-generation
+  - causal-lm
+  - reasoning
+library_name: transformers
+---
+
+
+
+# Introduction
+
+<div align="center"> 
+
+[![Paper](https://img.shields.io/badge/Paper-arXiv-b5212f.svg?logo=arxiv)](https://arxiv.org/abs/2510.04140)
+[![GitHub](https://img.shields.io/badge/Code-GitHub-181717.svg?logo=github)](https://github.com/Jiangzs1028/MENTOR)
+
+</div>
+
+MENTOR is a framework that enables LLMs to achieve effective and diverse exploration in reinforcement learning by providing expert guidance only at critical decision points, rather than imitating entire expert trajectories.
+
+## Key Highlights
+- **Selective Expert Guidance:** Injects expert signals only at critical decision points, avoiding full-trajectory imitation.  
+- **Effective & Diverse Exploration:** Balances expert guidance with autonomous exploration, preventing entropy collapse.  
+- **Absorb Essence, Remove Redundancy:** Captures essential expert strategies while discarding unnecessary patterns.  
+
+# Chat Template
+
+```python
+def build_MENTOR_chat_template(question, tokenizer):
+    system_prompt = (
+        "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
+        "You FIRST think about the reasoning process as an internal monologue and "
+        "then provide the final answer. The reasoning process MUST BE enclosed "
+        "within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
+    )
+    return tokenizer.apply_chat_template(
+        [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": question}
+        ],
+        tokenize=False,
+        add_generation_prompt=True
+    )
+```
+
+# Citation
+If you find our model useful, please cite our paper:
+```bibtex
+@article{jiang2025selective,
+  title={Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs},
+  author={Jiang, Zishang and Han, Jinyi and Li, Tingyun and Wang, Xinyi and Jiang, Sihang and Liang, Jiaqing and Dai, Zhaoqian and Ma, Shuguang and Yu, Fei and Xiao, Yanghua},
+  journal={arXiv preprint arXiv:2510.04140},
+  year={2025}
+}
+```
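
The system prompt in the Chat Template section of the README above asks the model to wrap its reasoning in `<think> </think>` tags and to put the final answer in `\boxed{}`. A downstream script will usually need to pull those two parts back out of the generated text. The following is a minimal sketch of such a parser, not code from the MENTOR repository: the helper names `extract_boxed` and `split_response` are hypothetical, and the sample string stands in for real model output.

```python
import re
from typing import Optional, Tuple


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: nested braces are preserved.
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces, e.g. a truncated generation


def split_response(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a response into (reasoning, final_answer).

    Assumes the format requested by the system prompt; either element
    is None when the corresponding part is missing.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else None
    # Only look for the answer after the reasoning block when one exists.
    tail = response[match.end():] if match else response
    return reasoning, extract_boxed(tail)


# Hand-written sample standing in for a model generation.
sample = "<think>Half of 1 is 1/2.</think> The answer is \\boxed{\\frac{1}{2}}."
reasoning, answer = split_response(sample)
print(reasoning)
print(answer)
```

Brace counting is used instead of a single regular expression so that answers containing nested LaTeX braces, such as fractions, are returned whole. Whether to fall back to a `\boxed{}` found inside the reasoning block is a design choice this sketch leaves out.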