ModelHub XC 74984dda29 Initialize project; model provided by the ModelHub XC community
Model: Jiangzs/MENTOR_Qwen_7B
Source: Original Platform
2026-05-21 01:04:34 +08:00

license: apache-2.0
tags: text-generation, causal-lm, reasoning
library_name: transformers

Introduction

Paper GitHub

MENTOR is a framework that enables LLMs to achieve effective and diverse exploration in reinforcement learning by providing expert guidance only at critical decision points, rather than imitating entire expert trajectories.

Key Highlights

  • Selective Expert Guidance: Injects expert signals only at critical decision points, avoiding full-trajectory imitation.
  • Effective & Diverse Exploration: Balances expert guidance with autonomous exploration, preventing entropy collapse.
  • Absorb Essence, Remove Redundancy: Captures essential expert strategies while discarding unnecessary patterns.

Chat Template

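def build_MENTOR_chat_template(question, tokenizer):
    """Build the MENTOR prompt for a single question.

    `tokenizer` is expected to be a Hugging Face tokenizer that ships a chat
    template; the returned value is the formatted prompt string (not token IDs).
    """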
    system_prompt = (
        "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
        "You FIRST think about the reasoning process as an internal monologue and "
        "then provide the final answer. The reasoning process MUST BE enclosed "
        "within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
    )
    return tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question}
        ],
        tokenize=False,
        add_generation_prompt=True
    )
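
Because the system prompt asks for reasoning inside <think> </think> tags and the final answer inside \boxed{}, downstream code usually needs to pull those two parts out of the generated text. The following is a minimal sketch of one way to do that, not part of the MENTOR release: the helper names `extract_boxed` and `split_response` are hypothetical, and the sketch assumes the model follows the requested format.

```python
import re
from typing import Optional, Tuple


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested LaTeX such as \\frac{1}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced, e.g. a truncated generation


def split_response(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a generated response into (reasoning, final_answer).

    Either element is None when the corresponding markup is missing.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else None
    # Search for the answer after the reasoning block when one is present,
    # so a \boxed{} written inside the monologue is not mistaken for it.
    tail = response[match.end():] if match else response
    return reasoning, extract_boxed(tail)
```

Calling `split_response` on the decoded generation yields the reasoning text and the boxed answer separately; checking for `None` is a simple way to detect outputs that were truncated or did not follow the template.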

Citation

If you find our model useful, please cite our paper:

@article{jiang2025selective,
  title={Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs},
  author={Jiang, Zishang and Han, Jinyi and Li, Tingyun and Wang, Xinyi and Jiang, Sihang and Liang, Jiaqing and Dai, Zhaoqian and Ma, Shuguang and Yu, Fei and Xiao, Yanghua},
  journal={arXiv preprint arXiv:2510.04140},
  year={2025}
}