ModelHub XC e6dbf63386 Initialize project; model provided by the ModelHub XC community
Model: andrewlngdn/dsl-debug-7b-sft-rl
Source: Original Platform
2026-04-22 11:17:31 +08:00

license: mit
language: en
base_model: Qwen/Qwen2.5-7B-Instruct
tags: reinforcement-learning, grpo, tool-use, debugging, dsl
datasets: custom
pipeline_tag: text-generation
model-index:
  name: dsl-debug-7b-sft-rl
  results:
    task: text-generation, DSL Debugging (Standard)
      metric: accuracy = 86.1, Standard Bugs (481)
    task: text-generation, DSL Debugging (Nonlocal)
      metric: accuracy = 70.5, Nonlocal Bugs (200)
    task: text-generation, DSL Debugging (Intent-Mismatch)
      metric: accuracy = 28.2, Intent-Mismatch Bugs (177)

DSL Debug 7B: SFT then RL (Best Checkpoint)

Qwen2.5-7B-Instruct fine-tuned with SFT then GRPO reinforcement learning to debug programs in a custom dataflow DSL. This is the best-performing checkpoint (step 35 of 40 RL steps).

Blog post: Multi-Turn RL for Code Debugging
Code + environment: github.com/AndrewLngdn/dsl-debug

Results

Held-out test set, one-shot evaluation:

| Method | Standard (481) | Nonlocal (200) | Intent-Mismatch (177) |
|---|---|---|---|
| Prompt Engineering (base) | 50.5% | 12.0% | 0.6% |
| SFT (step 100) | 56.3% | 40.0% | 7.9% |
| RL-only (step 30) | 78.8% | 54.0% | 14.7% |
| SFT then RL (this model) | 86.1% | 70.5% | 28.2% |

Alignment Tax

| Benchmark | Base | This Model |
|---|---|---|
| MMLU (5-shot) | 74.6% | 74.5% |
| GSM8K (8-shot) | 84.9% | 84.1% |
| HumanEval (0-shot) | 65.9% | 62.2% |
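
"Alignment tax" here means the change in general-capability benchmarks after the task-specific training. As a quick illustration, the per-benchmark difference can be computed directly from the numbers in the table. The variable names are illustrative only.

```python
# Scores copied from the Alignment Tax table: (base, this model), in percent.
scores = {
    "MMLU (5-shot)": (74.6, 74.5),
    "GSM8K (8-shot)": (84.9, 84.1),
    "HumanEval (0-shot)": (65.9, 62.2),
}

# Change in percentage points; negative means the fine-tuned model scores lower.
deltas = {name: round(tuned - base, 1) for name, (base, tuned) in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} points")
```

The largest drop is on HumanEval (3.7 points), while MMLU is essentially unchanged (0.1 points).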

Training

Two-stage training on 2x A100-80GB using verl 0.7:

  1. SFT: 1,593 expert trajectories from GPT-5-mini, full parameter updates, LR=5e-6, 2 epochs (step 100)
  2. GRPO: 6,420 RL problems, LR=1e-5 cosine, batch 512 prompts x 8 rollouts, 40 steps, no KL penalty
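
GRPO scores each rollout relative to the other rollouts sampled for the same prompt. The following is a minimal sketch of the commonly used group-normalized advantage, assuming a binary reward and 8 rollouts per prompt as listed above. The function name `group_advantages` and the `eps` value are illustrative assumptions, not code from verl or from this project, and the exact normalization used in the training run may differ.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages for the rollouts of one prompt.

    Illustrative only: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, 8 rollouts, binary reward (1 = the submitted fix was correct).
rewards = [1, 0, 0, 1, 0, 0, 0, 0]
advantages = group_advantages(rewards)

# If every rollout in a group gets the same reward, all advantages are zero
# and that prompt contributes no learning signal for the step.
flat = group_advantages([0] * 8)

# Rollouts generated per RL step with the batch settings listed above.
rollouts_per_step = 512 * 8
```

Under this formulation, successful rollouts in a mostly failing group receive a large positive advantage, which is one reason problems that the policy solves only occasionally are the most informative ones.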

The Task

The model debugs programs in a custom pipe-based dataflow DSL. Each episode provides buggy code and expected output. The model has 8 turns and 4 tools:

  • run(code): Execute DSL code
  • inspect(node_name): View intermediate table output
  • read_docs(operation): Read DSL documentation
  • submit(code): Submit fix (ends episode, binary reward)
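
The episode structure described above can be sketched as a small loop. This is a toy stand-in, not the actual environment from the dsl-debug repository: the class `ToyEpisode`, the placeholder interpreter, the operation name `filter`, and the choice to give zero reward when the turn budget runs out without a submission are all assumptions made for illustration.

```python
MAX_TURNS = 8  # turn budget stated above

class ToyEpisode:
    """Hypothetical stand-in for the real environment (not the dsl-debug API)."""

    def __init__(self, expected_output):
        self.expected_output = expected_output
        self.done = False
        self.reward = 0.0

    def _execute(self, code):
        # Placeholder interpreter: the "program" is treated as its own output.
        return code.strip()

    def step(self, tool, arg):
        """Dispatch one tool call and return the observation text."""
        if tool == "run":
            return self._execute(arg)
        if tool == "inspect":
            return f"<table for node {arg}>"
        if tool == "read_docs":
            return f"<docs for {arg}>"
        if tool == "submit":
            # Submitting ends the episode with a binary reward.
            self.done = True
            self.reward = 1.0 if self._execute(arg) == self.expected_output else 0.0
            return "submitted"
        return f"unknown tool: {tool}"

def run_episode(env, policy):
    """Call tools until the policy submits or the turn budget runs out."""
    observation = "start"
    for _ in range(MAX_TURNS):
        tool, arg = policy(observation)
        observation = env.step(tool, arg)
        if env.done:
            break
    return env.reward

# Scripted policy standing in for the model: read docs, then submit a fix.
script = iter([("read_docs", "filter"), ("submit", "fixed")])
reward = run_episode(ToyEpisode("fixed"), lambda obs: next(script))
```

In this sketch a policy that keeps calling `run` or `inspect` and never submits ends with reward 0.0 after eight turns, so each exploratory tool call is spent from the same budget as the final submission.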

Usage

# With sglang (assumes an sglang server is already listening on localhost:30000)
from sglang import RuntimeEndpoint
import sglang as sgl

runtime = RuntimeEndpoint("http://localhost:30000")

# Or download and serve
from huggingface_hub import snapshot_download
snapshot_download("andrewlngdn/dsl-debug-7b-sft-rl", local_dir="./model")

# Using the dsl-debug CLI (run these in a shell, not in Python)
pip install dsl-debug
dsl-debug sglang                    # downloads and serves this model
dsl-debug eval --split standard     # evaluate on test set

| Model | Repo |
|---|---|
| SFT step 100 | andrewlngdn/dsl-debug-7b-sft-step100 |
| RL-only step 30 | andrewlngdn/dsl-debug-7b-rl-only-step30 |