---
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - reasoning
library_name: transformers
---


# Introduction

<div align="center"> 

[![Paper](https://img.shields.io/badge/Paper-arXiv-b5212f.svg?logo=arxiv)](https://arxiv.org/abs/2510.04140)
[![GitHub](https://img.shields.io/badge/Code-GitHub-181717.svg?logo=github)](https://github.com/Jiangzs1028/MENTOR)

</div>

MENTOR is a framework that enables LLMs to achieve effective and diverse exploration in reinforcement learning by providing expert guidance only at critical decision points, rather than imitating entire expert trajectories.

## Key Highlights
- **Selective Expert Guidance:** Injects expert signals only at critical decision points, avoiding full-trajectory imitation.  
- **Effective & Diverse Exploration:** Balances expert guidance with autonomous exploration, preventing entropy collapse.  
- **Absorb Essence, Remove Redundancy:** Captures essential expert strategies while discarding unnecessary patterns.  

# Chat Template

```python
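def build_MENTOR_chat_template(question, tokenizer):
    """Wrap a question in the MENTOR system prompt and chat format.

    `tokenizer` is the model's Hugging Face tokenizer. The returned value is
    a prompt string (not token ids) that ends with the generation prompt.
    """
    system_prompt = (
        "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
        "You FIRST think about the reasoning process as an internal monologue and "
        "then provide the final answer. The reasoning process MUST BE enclosed "
        "within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
    )
    return tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        tokenize=False,  # return formatted text rather than token ids
        add_generation_prompt=True,  # end the prompt where the assistant turn begins
    )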
```
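
The system prompt asks the model to enclose its reasoning in `<think> </think>` tags and to put the final answer in `\boxed{}`. The sketch below shows one possible way to run a prompt built with the template through `transformers` and then split the output into reasoning and answer. The Hub id `Jiangzs/MENTOR_Qwen_7B`, the helper names `generate_response` and `split_response`, and the `max_new_tokens` value are assumptions made for illustration; they are not part of the official release.

```python
import re


def generate_response(prompt, model_id="Jiangzs/MENTOR_Qwen_7B", max_new_tokens=2048):
    """Generate a completion for a prompt built with the chat template.

    model_id and max_new_tokens are assumed values; adjust them to your setup.
    """
    # Imported lazily so split_response below works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    # The chat template already inserted the special tokens, so add none here.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def split_response(response):
    """Return (reasoning, answer); each is None if its marker is missing."""
    think = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else None

    # Use the last \boxed{...} and match braces by hand so that nested
    # braces such as \frac{1}{2} stay intact.
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return reasoning, None
    begin = start + len(marker)
    i, depth = begin, 1
    while i < len(response) and depth > 0:
        if response[i] == "{":
            depth += 1
        elif response[i] == "}":
            depth -= 1
        i += 1
    # depth > 0 here means the closing brace was never found.
    answer = response[begin:i - 1] if depth == 0 else None
    return reasoning, answer
```

A prompt returned by `build_MENTOR_chat_template(question, tokenizer)` can be passed to `generate_response`, and the generated text to `split_response`. Matching braces by hand rather than with a regular expression keeps answers that contain nested LaTeX braces in one piece.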

# Citation
If you find our model useful, please cite our paper:
```bibtex
@article{jiang2025selective,
  title={Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs},
  author={Jiang, Zishang and Han, Jinyi and Li, Tingyun and Wang, Xinyi and Jiang, Sihang and Liang, Jiaqing and Dai, Zhaoqian and Ma, Shuguang and Yu, Fei and Xiao, Yanghua},
  journal={arXiv preprint arXiv:2510.04140},
  year={2025}
}
```