---
base_model:
- Qwen/Qwen3-8B-Base
datasets:
- Elliott/Openr1-Math-46k-8192
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: transformers
---

[](https://arxiv.org/abs/2510.26109)
[](https://github.com/JamyDon/LTE)
[](LICENSE)

## Introduction

LTE is an RLVR approach that mitigates the exploration stagnation of LMs by learning from their previously self-made mistakes, without requiring any external expert guidance. LTE raises the performance upper bound of LMs and enhances both exploitation and exploration during training.
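
As a rough illustration of the RLVR setting (reinforcement learning with verifiable rewards), a rule-based reward can compare the model's final answer against a reference. The `\boxed{}` extraction convention and the function below are illustrative assumptions for exposition, not LTE's actual reward code:

```python
import re

def verifiable_reward(response: str, reference: str) -> float:
    """Return 1.0 only when the final boxed answer matches the reference."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

print(verifiable_reward(r"So the answer is \boxed{9.9}.", "9.9"))   # 1.0
print(verifiable_reward(r"So the answer is \boxed{9.11}.", "9.9"))  # 0.0
```

Because the reward is binary and rule-checkable, no learned reward model or human preference data is needed during training.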

## Key Highlights

- **Self-generated Hints**: LTE uses the errors generated by the LMs themselves during training as hints.
- **No External Expert Guidance**: LTE does not require any external expert guidance to mitigate the exploration stagnation of LMs.
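
The first highlight can be sketched with a toy example: a previously incorrect rollout is folded back into the next prompt as a hint. The prompt wording and helper name below are hypothetical assumptions for exposition, not LTE's actual training code:

```python
from typing import Optional

def build_hinted_prompt(question: str, failed_attempt: Optional[str]) -> str:
    """Prepend the model's own earlier mistake as a hint, if one exists.

    Hypothetical sketch: the hint template is an assumption, not the
    exact format used by LTE.
    """
    if failed_attempt is None:
        return question  # first rollout: no hint available yet
    return (
        f"{question}\n\n"
        f"A previous attempt was incorrect:\n{failed_attempt}\n"
        "Avoid repeating this mistake and try a different approach."
    )

# First rollout: plain question. Later rollouts: the model's own wrong
# answer is included as a hint, steering exploration away from it.
print(build_hinted_prompt("Which number is larger, 9.11 or 9.9?", None))
print(build_hinted_prompt("Which number is larger, 9.11 or 9.9?", "9.11 is larger."))
```

Since the hints come from the model's own rollouts, the approach needs no stronger teacher model or human-written corrections.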

## Inference

Here is an example of using LTE models for inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "JamyDohrn/LTE-Qwen3-8B-Base"
question = "which number is larger? 9.11 or 9.9?"

tokenizer = AutoTokenizer.from_pretrained(model_path)
messages = [{"role": "user", "content": question}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_path)
params = SamplingParams(temperature=0.6, max_tokens=32768)
outputs = llm.generate([chat], params)
print(outputs[0].outputs[0].text)
```

## Acknowledgements

LTE is built on the following repositories, and we thank their teams for their valuable contributions to the community:

- [verl](https://github.com/volcengine/verl)
- [LUFFY](https://github.com/ElliottYan/LUFFY)
- [LIMO](https://github.com/GAIR-NLP/LIMO)

## Citation

If you find our work useful, feel free to cite our paper:

```bib
@misc{tang2026steprivertwicelearning,
      title={Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error},
      author={Chenming Tang and Hsiu-Yuan Huang and Weijie Liu and Clive Bai and Saiyong Yang and Yunfang Wu},
      year={2026},
      eprint={2510.26109},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.26109},
}
```