language: en
library_name: transformers
tags: qwen, distillation, ifeval, retaining-by-doing

Qwen2.5-0.5B Instruct IFEval Mixed KD (alpha=0.5)

This checkpoint is a Qwen2.5-0.5B-Instruct student distilled from a Qwen2.5-1.5B-Instruct teacher that was itself trained for half an epoch on IFEvalSFTDataset.

Distillation setup:

  • student: Qwen2.5-0.5B-Instruct
  • teacher: half-epoch Qwen2.5-1.5B-Instruct
  • num_train_datapoints=4064
  • num_epochs=1
  • distill_alpha=0.5
  • distill_temperature=2.0
  • effective loss mix: CE 0.5 / KD 0.5
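
The loss mix above can be illustrated with a minimal sketch for a single token position. It is not the training code used for this checkpoint. It assumes the common formulation in which alpha weights the hard-label cross-entropy, (1 - alpha) weights the KL divergence between temperature-softened teacher and student distributions, and the KL term is scaled by the squared temperature. The function names `softmax` and `mixed_kd_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_kd_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Return alpha * CE + (1 - alpha) * KD for one token position.

    Assumption: alpha weights the CE term and the KD term carries a
    temperature**2 factor; the actual training script may differ.
    """
    # Hard-label cross-entropy on the unsoftened student distribution.
    ce = -math.log(softmax(student_logits)[target_index])

    # KL(teacher || student) on the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    kd *= temperature ** 2  # assumed scaling, keeps gradient magnitudes comparable

    return alpha * ce + (1.0 - alpha) * kd


# Toy example with a three-token vocabulary (values are illustrative only).
loss = mixed_kd_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.5], target_index=2)
print(loss)
```

With distill_alpha=0.5 the two terms contribute equally, so in this setting it does not matter which term alpha is attached to.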

Observed local IFEval accuracy:

  • 0.4137577002
Model synced from source: SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05