---
base_model:
- Qwen/Qwen3-8B-Base
datasets:
- Elliott/Openr1-Math-46k-8192
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-generation
library_name: transformers
---
# LTE-Qwen3-8B-Base

[](https://arxiv.org/abs/2510.26109)
[](https://github.com/JamyDon/LTE)
[](LICENSE)
## Introduction

LTE (Learning to reason from Trial and Error) is an RLVR (Reinforcement Learning with Verifiable Rewards) approach presented in the paper [Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error](https://huggingface.co/papers/2510.26109).

LTE mitigates the exploration stagnation of Language Models (LMs) by using their own previously made mistakes as hints, requiring no external expert guidance. It raises the performance upper bound of LMs and improves both exploitation and exploration during training.
## Key Highlights

- **Self-generated Hints**: LTE uses the errors generated by the LMs themselves during training as hints.
- **No External Expert Guidance**: LTE does not require any external expert guidance to mitigate the exploration stagnation of LMs.
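As an illustration of the trial-and-error idea, a failed earlier rollout can be fed back into the next prompt as a hint. The `build_retry_prompt` helper and its wording below are hypothetical, written only to sketch the mechanism; the actual hint format used during LTE training is defined by the paper and repository.

```python
def build_retry_prompt(question: str, failed_attempt: str) -> str:
    """Hypothetical helper: attach a previous incorrect attempt as a hint
    so the model can avoid stepping into the same river twice."""
    return (
        f"{question}\n\n"
        "A previous attempt at this problem was incorrect:\n"
        f"{failed_attempt}\n\n"
        "Avoid the mistakes above and solve the problem again."
    )


prompt = build_retry_prompt(
    "Which number is larger, 9.11 or 9.9?",
    "9.11 is larger because 11 > 9.",
)
print(prompt.splitlines()[0])  # Which number is larger, 9.11 or 9.9?
```

The key property is that the hint comes from the model's own earlier mistake, so no external expert demonstration is needed.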
## Inference

Here is an example of using LTE models for inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "JamyDohrn/LTE-Qwen3-8B-Base"

question = "Which number is larger, 9.11 or 9.9?"

# Build the prompt string using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model_path)
messages = [{"role": "user", "content": question}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate with vLLM.
llm = LLM(model=model_path)
params = SamplingParams(temperature=0.6, max_tokens=32768)
outputs = llm.generate([chat], params)
print(outputs[0].outputs[0].text)
```
## Acknowledgements

LTE is built on the following repositories, and we thank their teams for their valuable contributions to the community:

- [verl](https://github.com/volcengine/verl)
- [LUFFY](https://github.com/ElliottYan/LUFFY)
- [LIMO](https://github.com/GAIR-NLP/LIMO)
## Citation

If you find our work useful, feel free to cite our paper:

```bibtex
@misc{tang2026steprivertwicelearning,
    title={Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error},
    author={Chenming Tang and Hsiu-Yuan Huang and Weijie Liu and Clive Bai and Saiyong Yang and Yunfang Wu},
    year={2026},
    eprint={2510.26109},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2510.26109},
}
```