Initialize project; model provided by the ModelHub XC community
Model: SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05 Source: Original Platform
26
README.md
Normal file
@@ -0,0 +1,26 @@
---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---

# Qwen2.5-0.5B Instruct IFEval Mixed KD (alpha=0.5)

This checkpoint was distilled from a `Qwen2.5-1.5B-Instruct` teacher that had been trained for half an epoch on `IFEvalSFTDataset`.

Distillation setup:
- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=0.5`
- `distill_temperature=2.0`
- effective loss mix: `CE 0.5 / KD 0.5`

Observed local IFEval accuracy:
- `0.4137577002`
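
For readers unfamiliar with the `CE 0.5 / KD 0.5` mix listed under the distillation setup, the following is a minimal, framework-free sketch of how such a loss is commonly formed for a single token position. It is an illustration under stated assumptions, not the training code used for this checkpoint. The function names (`log_softmax`, `mixed_kd_loss`), the KL direction (teacher to student), and the `temperature ** 2` scaling are assumptions borrowed from the usual knowledge-distillation convention. With `alpha=0.5` the result is the same whichever term `alpha` is attached to.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def mixed_kd_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Return (loss, ce, kd) for one token position.

    Hypothetical helper: the weighting and scaling conventions below
    are assumptions, not taken from the original training script.
    """
    # Hard-label cross-entropy on the student's unscaled distribution.
    ce = -log_softmax(student_logits)[target_index]

    # KL(teacher || student) on temperature-softened distributions.
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kd = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))

    # Assumed convention: rescale the soft-target term by T^2.
    kd *= temperature ** 2

    loss = (1.0 - alpha) * ce + alpha * kd
    return loss, ce, kd


if __name__ == "__main__":
    loss, ce, kd = mixed_kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], 0)
    print(f"loss={loss:.4f} ce={ce:.4f} kd={kd:.4f}")
```

When the student and teacher logits coincide, the KD term vanishes and the loss reduces to `0.5 * CE`, which is a quick sanity check for any implementation of this mix.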