---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---

# Qwen2.5-0.5B Instruct IFEval Mixed KD (alpha=0.5)

This checkpoint was distilled from a half-epoch `Qwen2.5-1.5B-Instruct` teacher trained on `IFEvalSFTDataset`.

Distillation setup:

- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=0.5`
- `distill_temperature=2.0`
- effective loss mix: `CE 0.5 / KD 0.5`

Observed IFEval accuracy (local evaluation):

- `0.4137577002`
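The `CE 0.5 / KD 0.5` mix can be illustrated per token with the short sketch below. It is a minimal, dependency-free sketch, assuming the common Hinton-style formulation `alpha * CE + (1 - alpha) * T^2 * KL(teacher_T || student_T)`. The card does not say which term `distill_alpha` weights or whether the `T^2` factor is applied, and `mixed_kd_loss` is a hypothetical helper, not the training code used for this checkpoint. With `alpha=0.5`, either weighting convention gives the stated even split.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    alpha: float = 0.5,        # mirrors distill_alpha from the card
    temperature: float = 2.0,  # mirrors distill_temperature from the card
) -> float:
    """Hypothetical per-token loss: alpha * CE + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy on the untempered student distribution.
    ce = -math.log(softmax(student_logits)[target])

    # KL divergence between the temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )

    # Assumed T^2 scaling keeps soft-target gradients comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature**2 * kl
```

When student and teacher logits are identical, the KL term vanishes and the loss reduces to `alpha` times the cross-entropy. In real training this would be computed over the full vocabulary for every target token and averaged over the batch.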