---
base_model:
- Qwen/Qwen3-8B-Base
datasets:
- Elliott/Openr1-Math-46k-8192
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: transformers
---

[![arXiv](https://img.shields.io/badge/arXiv-2510.26109-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2510.26109)
[![GitHub](https://img.shields.io/badge/GitHub-LTE-blue?logo=github)](https://github.com/JamyDon/LTE)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?logo=apache)](LICENSE)

## Introduction

LTE is an RLVR approach that mitigates exploration stagnation in LMs by using their own previous mistakes as hints, without requiring any external expert guidance. LTE raises the performance upper bound of LMs and enhances both exploitation and exploration during training.

## Key Highlights

- **Self-generated Hints**: LTE uses the errors that the LMs themselves make during training as hints.
- **No External Expert Guidance**: LTE mitigates exploration stagnation without any external expert guidance.

## Inference

Here is an example of using LTE models for inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "JamyDohrn/LTE-Qwen3-8B-Base"
question = "which number is larger? 9.11 or 9.9?"

# Format the question with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model_path)
messages = [{"role": "user", "content": question}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model with vLLM and sample one response.
llm = LLM(model=model_path)
params = SamplingParams(temperature=0.6, max_tokens=32768)
outputs = llm.generate([chat], params)
print(outputs[0].outputs[0].text)
```

## Acknowledgements

LTE is built on the following repositories. We thank their teams for their valuable contributions to the community:

- [verl](https://github.com/volcengine/verl)
- [LUFFY](https://github.com/ElliottYan/LUFFY)
- [LIMO](https://github.com/GAIR-NLP/LIMO)

## Citation

If you find our work useful, feel free to cite our paper:

```bib
@misc{tang2026steprivertwicelearning,
      title={Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error},
      author={Chenming Tang and Hsiu-Yuan Huang and Weijie Liu and Clive Bai and Saiyong Yang and Yunfang Wu},
      year={2026},
      eprint={2510.26109},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.26109},
}
```