---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---

# Qwen2.5-0.5B Instruct IFEval Mixed KD (alpha=0.5)

This `Qwen2.5-0.5B-Instruct` checkpoint was distilled from a `Qwen2.5-1.5B-Instruct` teacher that had been trained on `IFEvalSFTDataset` for half an epoch.

Distillation setup:
- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=0.5`
- `distill_temperature=2.0`
- effective loss mix: `CE 0.5 / KD 0.5`

Observed local IFEval accuracy:
- `0.4137577002`
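
The setup above reports the loss only as a `CE 0.5 / KD 0.5` mix with `distill_alpha=0.5` and `distill_temperature=2.0`. Below is a minimal sketch of what such a mix commonly looks like, assuming the standard Hinton-style formulation: cross-entropy on the ground-truth token weighted by `1 - alpha`, plus a forward KL divergence between temperature-softened teacher and student distributions weighted by `alpha` and scaled by `T^2`. The helpers `log_softmax` and `mixed_kd_loss`, the KL direction, and the `T^2` factor are illustrative assumptions, not the training code used for this checkpoint; the sketch also works on a single token's logits in plain Python rather than on batched tensors.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def mixed_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    alpha: float = 0.5,        # distill_alpha; assumed to weight the KD term
    temperature: float = 2.0,  # distill_temperature
) -> float:
    """Hypothetical per-token loss: (1 - alpha) * CE + alpha * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the student's unsoftened (T = 1) distribution.
    ce = -log_softmax(student_logits)[target]
    # Forward KL between temperature-softened teacher and student distributions.
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # The T^2 factor (an assumption here) keeps soft-target gradients on the same scale as CE.
    return (1.0 - alpha) * ce + alpha * (temperature ** 2) * kl


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]  # toy logits over a 3-token vocabulary
    teacher = [1.5, 1.0, -0.5]
    print(mixed_kd_loss(student, teacher, target=0))
```

With `alpha=0.5` the CE and KD terms receive equal weight regardless of which term `alpha` is attached to, matching the reported 0.5/0.5 split. Note that with `distill_temperature=2.0` the assumed `T^2` factor multiplies the KL term by 4; the card does not state whether the original run applied it.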