ModelHub XC 99ccb9849e Initialize project; model provided by the ModelHub XC community

Model: OpenHands/CodeScout-1.7B
Source: Original Platform

2026-06-05 00:06:25 +08:00

CodeScout-1.7B

📄 Paper • 💻 Code • 🤗 Collection

Compact yet powerful — outperforms 8× larger Qwen3-14B using only a Unix terminal.

CodeScout-1.7B is part of the CodeScout family of open-source RL-trained code search agents. CodeScout models achieve state-of-the-art repository-level code localization using nothing more than a standard Unix terminal — no static analysis, no repository graphs, no language-specific tooling.

Key Highlights

Outperforms the 8× larger Qwen3-14B, with absolute F1 gains of 11–18 percentage points for files and 10–15 percentage points for functions
Competitive with 18× larger Qwen3-32B (Thinking), surpassing it by 3–6% in function F1
Matches RepoNavigator-7B performance while being 4× smaller
Demonstrates that RL + distillation can compress strong code search into a 1.7B model

Results

Performance on SWE-Bench code localization (instance-averaged F1 scores):

Benchmark	CodeScout-1.7B	CodeScout-4B	CodeScout-14B
SWE-Bench Verified — File F1	55.46	68.52	68.57
SWE-Bench Verified — Func F1	28.22	36.78	40.32
SWE-Bench Pro — File F1	40.96	51.77	53.63
SWE-Bench Pro — Func F1	18.24	29.03	28.74
SWE-Bench Lite — File F1	56.57	67.03	71.84
SWE-Bench Lite — Func F1	27.07	39.87	44.43

Figure: Code localization performance on SWE-Bench Verified. CodeScout (⭐) achieves superior or competitive results over larger open-source LLMs and narrows the gap with closed-source frontier models.
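
To make "instance-averaged F1" concrete, here is a minimal sketch: each instance is scored on its own as a set-overlap F1 between predicted and gold locations, and the per-instance scores are then averaged with equal weight. The function names, the example file names, and the handling of empty sets are assumptions made for illustration; the table above appears to report the same quantity scaled to 0–100.

```python
from statistics import mean


def set_f1(predicted: set[str], gold: set[str]) -> float:
    """F1 between a predicted and a gold set of locations for one instance."""
    if not predicted and not gold:
        return 1.0  # assumption: nothing to find and nothing predicted is perfect
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def instance_averaged_f1(instances: list[tuple[set[str], set[str]]]) -> float:
    """Score every instance separately, then take the unweighted mean."""
    return mean(set_f1(pred, gold) for pred, gold in instances)


# Two hypothetical instances at file granularity: (predicted, gold).
instances = [
    ({"a.py", "b.py"}, {"a.py"}),  # precision 0.5, recall 1.0 -> F1 = 2/3
    ({"c.py"}, {"c.py"}),          # exact match -> F1 = 1.0
]
score = instance_averaged_f1(instances)  # (2/3 + 1.0) / 2 = 5/6
```

The same helper applies unchanged at function granularity if the set members are function identifiers instead of file paths.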

Training

CodeScout-1.7B is trained in two stages:

Stage 1 — Rejection Fine-Tuning (RFT): Qwen3-1.7B is warm-started via supervised fine-tuning on 4K perfect-score trajectories (F1 = 1.0 at all granularities) sampled from CodeScout-14B, yielding the CodeScout-1.7B-RFT checkpoint.
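
The rejection step in Stage 1 can be pictured as a simple filter over sampled teacher trajectories. The sketch below is illustrative only: the `Trajectory` record and its field names are hypothetical, and it assumes "all granularities" means the file, module, and function levels named in the reward description.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record for one rollout sampled from the teacher model."""
    instance_id: str
    file_f1: float
    module_f1: float
    function_f1: float


def is_perfect(t: Trajectory, tol: float = 1e-9) -> bool:
    """Keep a trajectory only if every granularity reaches F1 = 1.0."""
    return min(t.file_f1, t.module_f1, t.function_f1) >= 1.0 - tol


def rejection_filter(trajectories: list[Trajectory]) -> list[Trajectory]:
    """Discard every trajectory that is not perfect at all levels."""
    return [t for t in trajectories if is_perfect(t)]


# Made-up scores for three sampled rollouts.
sampled = [
    Trajectory("issue-1", 1.0, 1.0, 1.0),
    Trajectory("issue-2", 1.0, 1.0, 0.5),  # misses a function -> rejected
    Trajectory("issue-3", 1.0, 0.0, 1.0),  # misses a module -> rejected
]
kept = rejection_filter(sampled)
```

The surviving trajectories would then serve as supervised fine-tuning targets for the smaller model.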

Stage 2 — RL Training: CodeScout-1.7B-RFT is further trained with GSPO reinforcement learning.

Training data (RL): 800 instances (disjoint from RFT data)
RL steps: 100
Batch size: 8, with 8 rollouts per instance
Max context length: 32K tokens
Max turns per episode: 4
Reward: Multi-level F1 (file + module + function)
Hardware: 8×H100 GPUs
Learning rate: 1e-6 (constant)
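
The settings above imply a small, easily checked training budget. The sketch below works through the arithmetic; the class and field names are illustrative, and it assumes that the batch size counts instances per step and that "32K" means 32,768 tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RLBudget:
    """Values copied from the list above; names are illustrative."""
    train_instances: int = 800
    steps: int = 100
    batch_size: int = 8                  # assumed: instances per step
    rollouts_per_instance: int = 8
    max_context_tokens: int = 32 * 1024  # "32K", assumed to mean 32,768
    max_turns: int = 4
    learning_rate: float = 1e-6          # constant schedule

    @property
    def rollouts_per_step(self) -> int:
        return self.batch_size * self.rollouts_per_instance

    @property
    def instances_consumed(self) -> int:
        return self.steps * self.batch_size

    @property
    def total_rollouts(self) -> int:
        return self.steps * self.rollouts_per_step

    @property
    def passes_over_data(self) -> float:
        return self.instances_consumed / self.train_instances


budget = RLBudget()
print(budget.rollouts_per_step)   # 64
print(budget.instances_consumed)  # 800
print(budget.total_rollouts)      # 6400
print(budget.passes_over_data)    # 1.0
```

Under these assumptions, 100 steps of 8 instances cover the 800 training instances exactly once, for 6,400 rollouts in total.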

How It Works

CodeScout uses the OpenHands-Bash scaffold — an agent equipped with only a Terminal tool (supporting standard Unix commands like rg, find, grep, ls) and a LocalizationFinish tool for structured output submission. The agent iteratively navigates the repository to identify relevant files, classes, and functions related to a given issue.
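
The two-tool loop described above can be sketched as follows. This is not the OpenHands-Bash API: `Localization`, `run_terminal`, `localize`, and the scripted policy standing in for the model are all hypothetical names, and the default turn limit of 4 is borrowed from the training configuration. The sketch assumes a Unix-like system with `grep` available.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass, field


@dataclass
class Localization:
    """Hypothetical payload for the final structured submission."""
    files: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    functions: list[str] = field(default_factory=list)


def run_terminal(command: str, repo_dir: str, timeout_s: float = 30.0) -> str:
    """Run one shell command inside the repository; return stdout plus stderr."""
    done = subprocess.run(
        command, shell=True, cwd=repo_dir,
        capture_output=True, text=True, timeout=timeout_s,
    )
    return done.stdout + done.stderr


def localize(issue: str, repo_dir: str, policy, max_turns: int = 4) -> Localization:
    """Alternate between policy decisions and terminal calls until a submission."""
    history: list[tuple[str, str]] = []  # (command, observed output)
    for _ in range(max_turns):
        action = policy(issue, history)
        if isinstance(action, Localization):
            return action
        history.append((action, run_terminal(action, repo_dir)))
    return Localization()  # turn budget exhausted without a submission


def scripted_policy(issue, history):
    """Stand-in for the model: search once, then submit what the search found."""
    if not history:
        return "grep -rl 'def parse' ."
    hits = [line.removeprefix("./") for line in history[-1][1].splitlines()]
    return Localization(files=sorted(hits))


# Tiny throwaway repository so the loop can be exercised end to end.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "parser.py"), "w") as fh:
        fh.write("def parse(text):\n    return text\n")
    with open(os.path.join(repo, "util.py"), "w") as fh:
        fh.write("def helper():\n    pass\n")
    result = localize("parse() returns its input unchanged", repo, scripted_policy)
```

In a real deployment the policy would be the language model, and the shell commands it emits should run in a sandbox rather than directly on the host.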

The model is trained with GSPO (Group Sequence Policy Optimization) using multi-level F1 rewards at the file, module, and function level.
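
A rough sketch of how these pieces could fit together follows. It assumes the published GSPO formulation (group-normalized advantages and a length-normalized, sequence-level importance ratio inside a clipped surrogate) rather than anything confirmed by this card. The unweighted sum used for the reward, the population standard deviation, the clipping value, and all function names are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def multi_level_reward(file_f1: float, module_f1: float, function_f1: float) -> float:
    """Assumption: an unweighted sum of the three F1 scores."""
    return file_f1 + module_f1 + function_f1


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of rollouts for the same issue."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(new_logprobs: list[float], old_logprobs: list[float]) -> float:
    """Length-normalized likelihood ratio of a whole response."""
    if len(new_logprobs) != len(old_logprobs) or not new_logprobs:
        raise ValueError("need two non-empty log-prob lists of equal length")
    log_ratio = sum(n - o for n, o in zip(new_logprobs, old_logprobs))
    return math.exp(log_ratio / len(new_logprobs))


def clipped_surrogate(ratios: list[float], advantages: list[float],
                      clip_eps: float) -> float:
    """Mean clipped objective over the group; one ratio per response."""
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * a, clipped * a))
    return mean(terms)


# A group of four rollouts for one issue (the card uses eight per instance).
rewards = [
    multi_level_reward(1.0, 1.0, 1.0),
    multi_level_reward(1.0, 0.5, 0.0),
    multi_level_reward(0.0, 0.0, 0.0),
    multi_level_reward(1.0, 1.0, 0.5),
]
advantages = group_advantages(rewards)
# Identical old and new log-probs give a ratio of exactly 1 for every response.
ratios = [sequence_ratio([-1.0, -2.0], [-1.0, -2.0]) for _ in rewards]
objective = clipped_surrogate(ratios, advantages, clip_eps=0.2)  # 0.2 is a placeholder
```

In a multi-turn agent the log-probabilities would presumably cover only model-generated tokens, with tool outputs masked out; that detail is likewise an assumption here.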

Intended Use

CodeScout-1.7B is designed for repository-level code localization: given a GitHub issue description and a code repository, it identifies the relevant files, classes, and functions that need to be modified. It is intended to be used as a localization subagent within larger coding agent pipelines.

Limitations

Trained and evaluated exclusively on Python repositories
Designed for code localization, not code editing or issue resolution
Performance may vary on repositories significantly different from the training distribution
Requires the OpenHands-Bash scaffold for optimal performance

Citation

@misc{sutawika2026codescouteffectiverecipereinforcement,
      title={CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents}, 
      author={Lintang Sutawika and Aditya Bharat Soni and Bharath Sriraam R R and Apurva Gandhi and Taha Yassine and Sanidhya Vijayvargiya and Yuchen Li and Xuhui Zhou and Yilin Zhang and Leander Melroy Maben and Graham Neubig},
      year={2026},
      eprint={2603.17829},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2603.17829}, 
}
