SWE-AGILE-RL-8B/README.md

---
tags:
- agent
pipeline_tag: text-generation
library_name: transformers
---

# SWE-AGILE

## 📣 News

[2026/02/23] SWE-AGILE has been accepted to the ACL 2026 Findings.

<font size=4><div align='center' > [[📖 Paper](https://huggingface.co/papers/2604.11716)] [[🤗 Checkpoints](https://huggingface.co/KDEGroup)] [[🤗 Daily Paper](https://huggingface.co/papers/2604.11716)] [[🚀 Github](https://github.com/KDEGroup/SWE-AGILE)]</div></font>

## 🔥 Overview

Prior approaches typically lack the **explicit System-2 reasoning** required for deep analysis. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to multi-turn tasks creates a dilemma: retaining full history leads to **context explosion**, while discarding it causes **redundant re-reasoning**.

We propose SWE-AGILE, a novel software agent framework designed to bridge the gap between reasoning depth, efficiency, and context constraints. SWE-AGILE introduces a Dynamic Reasoning Context strategy, maintaining a “sliding window” of detailed reasoning for immediate continuity to prevent redundant re-analyzing, while compressing historical reasoning content into concise Reasoning Digests via **Backfilling Data Synthesis**, **Trajectory Snapshot Training** and **Compression-Aware Optimization**.


While our current paradigm implicitly reduces redundant state reconstruction, a highly promising direction to strictly enforce this efficiency is to quantitatively monitor the reasoning content. By calculating the embedding similarity between consecutive reasoning steps or employing an LLM-as-a-Judge, future iterations can explicitly filter out repetitive SFT trajectories or design targeted RLVR penalties, pushing the boundary of cognitive efficiency even further.

![overview](https://cdn-uploads.huggingface.co/production/uploads/669f53549a21428ccda89fab/YQmyhamGa29mmAWAh1Qoe.png)

![swe-bench-verified](https://cdn-uploads.huggingface.co/production/uploads/669f53549a21428ccda89fab/YGIov1OkGhAcE4wsPBCs2.png)

## ⭐️ Citation

If you find this project useful, please cite our work:

```bibtex
@misc{lian2026sweagilesoftwareagentframework,
      title={SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context}, 
      author={Shuquan Lian and Juncheng Liu and Yazhe Chen and Yuhong Chen and Hui Li},
      year={2026},
      eprint={2604.11716},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.11716}, 
}
```

## 🤝 Acknowledgements

We sincerely thank the projects [R2E-Gym/R2E-Gym](https://github.com/R2E-Gym/R2E-Gym) and [rllm-org/rllm](https://github.com/rllm-org/rllm) for providing their open-source resources.
初始化项目，由ModelHub XC社区提供模型 Model: KDEGroup/SWE-AGILE-RL-8B Source: Original Platform 2026-05-05 06:59:45 +08:00			`---`
			`tags:`
			`- agent`
			`pipeline_tag: text-generation`
			`library_name: transformers`
			`---`

			`# SWE-AGILE`

			`## 📣 News`

			`[2026/02/23] SWE-AGILE has been accepted to the ACL 2026 Findings.`

			`<font size=4><div align='center' > [[📖 Paper](https://huggingface.co/papers/2604.11716)] [[🤗 Checkpoints](https://huggingface.co/KDEGroup)] [[🤗 Daily Paper](https://huggingface.co/papers/2604.11716)] [[🚀 Github](https://github.com/KDEGroup/SWE-AGILE)]</div></font>`

			`## 🔥 Overview`

			`Prior approaches typically lack the explicit System-2 reasoning required for deep analysis. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to multi-turn tasks creates a dilemma: retaining full history leads to context explosion, while discarding it causes redundant re-reasoning.`

			We propose SWE-AGILE, a novel software agent framework designed to bridge the gap between reasoning depth, efficiency, and context constraints. SWE-AGILE introduces a Dynamic Reasoning Context strategy, maintaining a “sliding window” of detailed reasoning for immediate continuity to prevent redundant re-analyzing, while compressing historical reasoning content into concise Reasoning Digests via Backfilling Data Synthesis, Trajectory Snapshot Training and Compression-Aware Optimization.



			`While our current paradigm implicitly reduces redundant state reconstruction, a highly promising direction to strictly enforce this efficiency is to quantitatively monitor the reasoning content. By calculating the embedding similarity between consecutive reasoning steps or employing an LLM-as-a-Judge, future iterations can explicitly filter out repetitive SFT trajectories or design targeted RLVR penalties, pushing the boundary of cognitive efficiency even further.`

			`![overview](https://cdn-uploads.huggingface.co/production/uploads/669f53549a21428ccda89fab/YQmyhamGa29mmAWAh1Qoe.png)`

			`![swe-bench-verified](https://cdn-uploads.huggingface.co/production/uploads/669f53549a21428ccda89fab/YGIov1OkGhAcE4wsPBCs2.png)`

			`## ⭐️ Citation`

			`If you find this project useful, please cite our work:`

			```bibtex
			`@misc{lian2026sweagilesoftwareagentframework,`
			`title={SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context},`
			`author={Shuquan Lian and Juncheng Liu and Yazhe Chen and Yuhong Chen and Hui Li},`
			`year={2026},`
			`eprint={2604.11716},`
			`archivePrefix={arXiv},`
			`primaryClass={cs.AI},`
			`url={https://arxiv.org/abs/2604.11716},`
			`}`
			```

			`## 🤝 Acknowledgements`

			`We sincerely thank the projects [R2E-Gym/R2E-Gym](https://github.com/R2E-Gym/R2E-Gym) and [rllm-org/rllm](https://github.com/rllm-org/rllm) for providing their open-source resources.`