Model: Jiangzs/MENTOR_Qwen_7B Source: Original Platform
| license | tags | library_name |
|---|---|---|
| apache-2.0 | | transformers |
Introduction
MENTOR is a framework that enables LLMs to achieve effective and diverse exploration in reinforcement learning by providing expert guidance only at critical decision points, rather than imitating entire expert trajectories.
Key Highlights
- Selective Expert Guidance: Injects expert signals only at critical decision points, avoiding full-trajectory imitation.
- Effective & Diverse Exploration: Balances expert guidance with autonomous exploration, preventing entropy collapse.
- Absorb Essence, Remove Redundancy: Captures essential expert strategies while discarding unnecessary patterns.
Chat Template
def build_MENTOR_chat_template(question, tokenizer):
    """Return the prompt string for `question`, formatted with the tokenizer's chat template."""
    system_prompt = (
        "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
        "You FIRST think about the reasoning process as an internal monologue and "
        "then provide the final answer. The reasoning process MUST BE enclosed "
        "within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
    )
    return tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        tokenize=False,  # return the formatted prompt as a string, not token IDs
        add_generation_prompt=True,  # end the prompt where the assistant's turn begins
    )
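Since the system prompt asks for reasoning inside <think> </think> tags and the final answer inside \boxed{}, a response can be split into those two parts after generation. The following is a minimal sketch of such a post-processing step, not part of the released code: the helper names `split_reasoning` and `extract_boxed` are hypothetical, and it assumes the model actually follows the requested format (both tags present in the decoded response, one final boxed answer).

```python
import re
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, remainder); reasoning is None if no <think> block is found."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return None, response.strip()
    return match.group(1).strip(), response[match.end():].strip()


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no complete boxed answer


# Hypothetical response text, used only to illustrate the helpers.
sample = "<think>2 + 2 = 4</think>\nThe answer is \\boxed{4}."
reasoning, remainder = split_reasoning(sample)
answer = extract_boxed(remainder)
```

Brace counting is used instead of a single regular expression so that answers containing nested braces, such as fractions, are returned whole.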
Citation
If you find our model useful, please cite our paper:
@article{jiang2025selective,
  title={Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs},
  author={Jiang, Zishang and Han, Jinyi and Li, Tingyun and Wang, Xinyi and Jiang, Sihang and Liang, Jiaqing and Dai, Zhaoqian and Ma, Shuguang and Yu, Fei and Xiao, Yanghua},
  journal={arXiv preprint arXiv:2510.04140},
  year={2025}
}