---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---

# Qwen2.5-0.5B Instruct IFEval Pure KD

This checkpoint was distilled from a half-epoch `Qwen2.5-1.5B-Instruct` teacher trained on `IFEvalSFTDataset`.

Distillation setup:

- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=1.0`
- `distill_temperature=2.0`

Observed local IFEval accuracy:

- `0.4050308008`
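
For readers unfamiliar with the `distill_alpha` and `distill_temperature` settings, the following is a minimal sketch of one common way such parameters enter a distillation loss. It is not the training code used for this checkpoint. It assumes that `distill_alpha` weights the soft-target (teacher) term against a hard-label cross-entropy term, and that the soft-target term is a KL divergence on temperature-softened distributions scaled by the squared temperature. The function names are hypothetical, and the sketch works on a single token's logits in plain Python.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is a common convention and an assumption here.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def cross_entropy(student_logits, target_index):
    """Hard-label cross-entropy at temperature 1."""
    return -log_softmax(student_logits)[target_index]


def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=1.0, temperature=2.0):
    """Assumed weighting: alpha * KD term + (1 - alpha) * hard-label term."""
    kd = kd_loss(student_logits, teacher_logits, temperature)
    ce = cross_entropy(student_logits, target_index)
    return alpha * kd + (1.0 - alpha) * ce


# Toy logits over a four-token vocabulary (illustrative values only).
student = [2.0, 0.5, -1.0, 0.1]
teacher = [1.5, 1.0, -0.5, 0.0]
loss = combined_loss(student, teacher, target_index=0,
                     alpha=1.0, temperature=2.0)
```

Under this assumed weighting, `alpha=1.0` removes the hard-label term entirely, so the student is trained only to match the teacher's softened distribution, which is consistent with the "Pure KD" name in the title.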