Initialize project; model provided by the ModelHub XC community

Model: andrewlngdn/dsl-debug-7b-sft-step100 Source: Original Platform
2026-04-21 02:45:22 +08:00
commit bbcdb27b50
13 changed files with 455398 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,63 @@
+---
+license: mit
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+  - debugging
+  - tool-use
+  - multi-turn
+  - sft
+datasets:
+  - custom
+language:
+  - en
+pipeline_tag: text-generation
+---
+
+# DSL Debug 7B — SFT Step 100
+
+Qwen2.5-7B-Instruct fine-tuned on 1,593 debugging trajectories for the DSL Debug environment.
+
+- **Blog post:** [Multi-Turn RL for Code Debugging](https://andrewlngdn.github.io/dsl_debugger/)
+- **Code + environment:** [github.com/AndrewLngdn/dsl-debug](https://github.com/AndrewLngdn/dsl-debug)
+
+## Training
+
+- **Method**: Supervised fine-tuning (verl 0.7)
+- **Data**: 1,593 multi-turn trajectories with tool calls (run, inspect, read_docs, submit)
+- **Base model**: Qwen2.5-7B-Instruct
+- **Epochs**: 2 (step 100 checkpoint)
+- **LR**: 5e-6
+- **Hardware**: 2x A100-SXM4-80GB
+
+## Results (held-out test, one-shot)
+
+| Split | Base Model | This Model |
+|-------|:---:|:---:|
+| Standard (481) | 50.5% | **56.3%** |
+| Nonlocal (200) | 12.0% | **40.0%** |
+| Intent-Mismatch (177) | 0.6% | **7.9%** |
+
+## Alignment Tax
+
+| Benchmark | Base | This Model |
+|-----------|:---:|:---:|
+| MMLU (5-shot) | 74.6% | 74.6% |
+| GSM8K (8-shot) | 84.9% | 83.9% |
+| HumanEval (0-shot) | 65.9% | 62.2% |
+
+## Usage
+
+This checkpoint serves mainly as the starting point for the RL (GRPO) stage of the SFT-then-RL pipeline, which achieves the best results among the models listed under Related Models.
+
+```python
+from huggingface_hub import snapshot_download
+snapshot_download("andrewlngdn/dsl-debug-7b-sft-step100",  # repo id on the Hugging Face Hub
+    local_dir="/workspace/models/sft_7b_step100")  # local directory that receives the files
+```
+
+## Related Models
+
+| Model | Repo |
+|-------|------|
+| **SFT then RL step 35 (best)** | [andrewlngdn/dsl-debug-7b-sft-rl](https://huggingface.co/andrewlngdn/dsl-debug-7b-sft-rl) |
+| RL-only step 30 | [andrewlngdn/dsl-debug-7b-rl-only-step30](https://huggingface.co/andrewlngdn/dsl-debug-7b-rl-only-step30) |
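
The two results tables in the README above can be summarized with a little arithmetic. The sketch below is not part of the commit and assumes that the parenthesized numbers in the Split column (481, 200, 177) are per-split example counts. The pooled accuracy it computes is a derived figure, not one reported by the author, and the names `SPLITS`, `BENCHMARKS`, `pooled_accuracy`, and `benchmark_deltas` are hypothetical.

```python
# Figures copied from the README tables above.
# Assumption: the parenthesized numbers are per-split example counts.
SPLITS = {
    # split: (examples, base accuracy %, this model accuracy %)
    "Standard": (481, 50.5, 56.3),
    "Nonlocal": (200, 12.0, 40.0),
    "Intent-Mismatch": (177, 0.6, 7.9),
}
BENCHMARKS = {
    # benchmark: (base score %, this model score %)
    "MMLU (5-shot)": (74.6, 74.6),
    "GSM8K (8-shot)": (84.9, 83.9),
    "HumanEval (0-shot)": (65.9, 62.2),
}


def pooled_accuracy(splits, column):
    """Example-weighted mean accuracy in percent (column 0 = base, 1 = this model)."""
    total = sum(row[0] for row in splits.values())
    weighted = sum(row[0] * row[1 + column] for row in splits.values())
    return weighted / total


def benchmark_deltas(benchmarks):
    """Change in percentage points from the base model to this model."""
    return {name: round(tuned - base, 1) for name, (base, tuned) in benchmarks.items()}


if __name__ == "__main__":
    print(f"pooled base:       {pooled_accuracy(SPLITS, 0):.1f}%")
    print(f"pooled this model: {pooled_accuracy(SPLITS, 1):.1f}%")
    for name, delta in benchmark_deltas(BENCHMARKS).items():
        print(f"{name}: {delta:+.1f} points")
```

Under that assumption the 858 test examples pool to roughly 31.2% for the base model and 42.5% for this checkpoint, while the general benchmarks move by 0.0 (MMLU), -1.0 (GSM8K), and -3.7 (HumanEval) percentage points.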