Files
SimpleSD-4B-thinking/README.md
ModelHub XC 88bfdba243 初始化项目,由ModelHub XC社区提供模型
Model: apple/SimpleSD-4B-thinking
Source: Original Platform
2026-05-09 14:17:30 +08:00

69 lines
2.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: apple-amlr
base_model:
- Qwen/Qwen3-4B-Thinking-2507
tags:
- self-distillation
- code-generation
library_name: transformers
---
# SimpleSD-4B-thinking
This model is an example of the **Simple Self-Distillation (SimpleSD)** method that improves code generation by fine-tuning a language model on its own sampled outputs—without rewards, verifiers, teacher models, or reinforcement learning. Please see the paper below for more information. This uses Qwen for initialization.
- **Self-distillation sampling:** temperature=1.1, top_p=0.95, top_k=20
- **Evaluation sampling:** temperature=0.7, top_p=0.95, top_k=20
paper: https://arxiv.org/abs/2604.01193
code: https://github.com/apple/ml-ssd
## Notes
- These are research checkpoints for reproducibility.
- They are not optimized Qwen releases.
- They don't represent a broader open-source model strategy.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("apple/SimpleSD-4B-thinking")
tokenizer = AutoTokenizer.from_pretrained("apple/SimpleSD-4B-thinking")
```
## Method
SimpleSD samples solutions from the base model using non-unit temperature and top-k/top-p truncation, then fine-tunes on those samples via standard supervised learning. Despite its simplicity, SimpleSD yields large gains on competitive programming benchmarks, with improvements concentrating on harder problems. The mechanism traces to resolving a *precisionexploration conflict*: SimpleSD reshapes token distributions in a context-dependent way so that a single global decoding configuration becomes far more effective at evaluation time.
## Results
LiveCodeBench (%)
| Model | LCBv6 pass@1 | LCBv6 pass@5 | LCBv5 pass@1 | LCBv5 pass@5 |
|---|---|---|---|---|
| Qwen3-4B-Thinking-2507 (base) | 54.5 | 67.5 | 59.6 | 70.3 |
| **+ SimpleSD (this model)** | **57.8** (+3.3) | **71.4** (+3.9) | **63.1** (+3.5) | **74.7** (+4.4) |
## Paper
[**Embarrassingly Simple Self-Distillation Improves Code Generation**](https://arxiv.org/abs/2604.01193)
```bibtex
@misc{zhang2026embarrassinglysimpleselfdistillationimproves,
title={Embarrassingly Simple Self-Distillation Improves Code Generation},
author={Ruixiang Zhang and Richard He Bai and Huangjie Zheng and Navdeep Jaitly and Ronan Collobert and Yizhe Zhang},
year={2026},
eprint={2604.01193},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.01193},
}
```
## License
This model is released under the [Apple Machine Learning Research Model License](https://huggingface.co/apple/SimpleSD-4B-thinking/blob/main/LICENSE).