---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
## Introduction
LTE is an RLVR approach that mitigates the exploration stagnation of LMs by leveraging their own previous mistakes, without requiring any external expert guidance. LTE raises the performance upper bound of LMs and enhances both exploitation and exploration during training.
## Key Highlights
- Self-generated Hints: LTE uses the errors generated by the LMs themselves during training as hints.
- No External Expert Guidance: LTE does not require any external expert guidance to mitigate the exploration stagnation of LMs.
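To make the self-generated-hint idea concrete, here is a minimal sketch of how a model's own failed attempts might be folded back into the next prompt. This is an illustrative assumption, not the paper's exact implementation: the helper name `build_hinted_prompt` and the hint format are hypothetical.

```python
def build_hinted_prompt(question: str, failed_attempts: list[str]) -> str:
    """Compose a prompt that shows the model its own previous mistakes.

    Hypothetical LTE-style hint construction (assumed format): earlier
    incorrect rollouts are appended to the question as hints to avoid.
    """
    if not failed_attempts:
        # No prior errors yet: prompt is just the question.
        return question
    hints = "\n\n".join(
        f"Previous incorrect attempt {i + 1}:\n{attempt}"
        for i, attempt in enumerate(failed_attempts)
    )
    return (
        f"{question}\n\n{hints}\n\n"
        "Avoid repeating the mistakes above and try a different approach."
    )


# Example: a previous wrong rollout becomes a hint for the next one.
prompt = build_hinted_prompt(
    "Which number is larger, 9.11 or 9.9?",
    ["9.11 is larger because 11 > 9."],
)
print(prompt)
```

Since the hints come from the model's own rollouts, no external expert traces are needed at any point.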
## Inference
Here is an example of using LTE models for inference:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "JamyDohrn/LTE-Qwen3-8B-Base"
question = "Which number is larger, 9.11 or 9.9?"

# Format the question with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model_path)
messages = [{"role": "user", "content": question}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate with vLLM.
llm = LLM(model=model_path)
params = SamplingParams(temperature=0.6, max_tokens=32768)
outputs = llm.generate([chat], params)
print(outputs[0].outputs[0].text)
```
## Acknowledgements
LTE is built on the following repositories and we thank their teams for their valuable contributions to the community:
## Citation
If you find our work useful, feel free to cite our paper:
```bibtex
@misc{tang2026steprivertwicelearning,
  title={Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error},
  author={Chenming Tang and Hsiu-Yuan Huang and Weijie Liu and Clive Bai and Saiyong Yang and Yunfang Wu},
  year={2026},
  eprint={2510.26109},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.26109},
}
```