初始化项目，由ModelHub XC社区提供模型

Model: THU-KEG/LongTraceRL-4B Source: Original Platform
2026-06-10 00:53:26 +08:00
commit c3be4b5d07
11 changed files with 152184 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,64 @@
+---
+license: apache-2.0
+language:
+- en
+tags:
+- long-context
+- reinforcement-learning
+- reasoning
+- rubric-reward
+- qwen3
+base_model:
+- Qwen/Qwen3-4B
+---
+
+# LongTraceRL-4B
+
+[![Paper](https://img.shields.io/badge/Paper-arXiv:2605.31584-red)](https://arxiv.org/abs/2605.31584)
+[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/THU-KEG/LongTraceRL)
+
+## Model Description
+
+**LongTraceRL-4B** is a 4-billion parameter reasoning model trained with reinforcement learning on long-context multi-hop QA tasks using trajectory-based tiered distractors and entity-level rubric rewards.
+
+## Model Details
+
+- **Base Model**: Qwen3-4B-Thinking-2507
+- **Parameters**: 4B
+- **Architecture**: Qwen3 (36 layers, hidden size 2560, GQA with 8 KV groups)
+- **Training Method**: GRPO with entity-level rubric reward
+- **Context Length**: 128K prompt + 32K response
+- **Language**: English
+
+## Training Details
+
+- **Training Data**: 2,815 long-context multi-hop QA samples ([LongTraceRL Dataset](https://huggingface.co/datasets/THU-KEG/LongTraceRL))
+- **Training Steps**: 200
+- **Learning Rate**: 2e-6 (constant)
+- **Global Batch Size**: 128
+- **GRPO Group Size**: 8
+- **Rubric Reward Weight (η)**: 0.3
+- **Framework**: [Slime](https://github.com/THUDM/slime) (Megatron-LM + SGLang)
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("THU-KEG/LongTraceRL-4B")
+tokenizer = AutoTokenizer.from_pretrained("THU-KEG/LongTraceRL-4B")
+```
+
+## Citation
+
+```bibtex
+@misc{lin2026longtracerllearninglongcontextreasoning,
+      title={LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards}, 
+      author={Nianyi Lin and Jiajie Zhang and Lei Hou and Juanzi Li},
+      year={2026},
+      eprint={2605.31584},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2605.31584}, 
+}
+```