---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---
# Qwen2.5-0.5B Instruct IFEval Mixed KD (alpha=0.5)
This checkpoint is a `Qwen2.5-0.5B-Instruct` student distilled from a `Qwen2.5-1.5B-Instruct` teacher that was itself trained for half an epoch on `IFEvalSFTDataset`.
Distillation setup:
- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=0.5`
- `distill_temperature=2.0`
- effective loss mix: `CE 0.5 / KD 0.5`
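
The loss mix above can be illustrated with a minimal, dependency-free sketch for a single token position. This is not the training code behind this checkpoint: the function names, the `(1 - alpha) * CE + alpha * KD` convention, the forward KL direction (teacher to student), and the `temperature ** 2` scaling are all assumptions borrowed from common distillation practice. Only `distill_alpha=0.5` and `distill_temperature=2.0` come from this card.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_kd_loss(student_logits, teacher_logits, target,
                  alpha=0.5, temperature=2.0):
    """Hypothetical mixed loss: (1 - alpha) * CE + alpha * KD.

    Returns (loss, ce, kd) for one token position.
    """
    # Hard-label cross-entropy on the student, at temperature 1.
    ce = -math.log(softmax(student_logits)[target])

    # Soft-label term: KL(teacher || student) on temperature-softened
    # distributions. The T^2 factor is the usual gradient-scale correction
    # (an assumption here, not confirmed by the card).
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kd = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(q_teacher, q_student) if t > 0)
    kd *= temperature ** 2

    return (1 - alpha) * ce + alpha * kd, ce, kd


# With alpha=0.5 both terms contribute equally, matching "CE 0.5 / KD 0.5".
loss, ce, kd = mixed_kd_loss([0.2, 1.5, -0.3], [0.1, 2.0, -1.0], target=1)
print(f"loss={loss:.4f} ce={ce:.4f} kd={kd:.4f}")
```

Because `alpha=0.5` is symmetric, the sketch gives the same total whichever term `alpha` is taken to weight; for any other value the convention would matter and should be checked against the actual training script.
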
Observed local IFEval accuracy:
- `0.4137577002`