Initialize project; model provided by the ModelHub XC community
Model: SeongryongJung/qwen2.5-0.5b-ifeval-pure-kd Source: Original Platform
25
README.md
Normal file
@@ -0,0 +1,25 @@
---
language:
- en
library_name: transformers
tags:
- qwen
- distillation
- ifeval
- retaining-by-doing
---

# Qwen2.5-0.5B Instruct IFEval Pure KD

This checkpoint was distilled from a half-epoch `Qwen2.5-1.5B-Instruct` teacher trained on `IFEvalSFTDataset`.

Distillation setup:
- student: `Qwen2.5-0.5B-Instruct`
- teacher: half-epoch `Qwen2.5-1.5B-Instruct`
- `num_train_datapoints=4064`
- `num_epochs=1`
- `distill_alpha=1.0`
- `distill_temperature=2.0`

Observed local IFEval accuracy:
- `0.4050308008`
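The `distill_alpha` and `distill_temperature` settings in the distillation setup can be read as weights in a standard knowledge-distillation objective. The sketch below is an illustration only, not the training code used for this checkpoint. It assumes the common convention where the loss is `alpha * KD + (1 - alpha) * CE`, the KD term is the KL divergence between temperature-softened teacher and student distributions, and that term is scaled by `T^2`. The function names `log_softmax` and `kd_loss` are hypothetical, and the code works on a single token's logits in plain Python for clarity.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kd_loss(student_logits, teacher_logits, label, alpha=1.0, temperature=2.0):
    """Blend of soft-target KL loss and hard-label cross-entropy.

    Assumed convention (not confirmed by the model card):
        loss = alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL(teacher || student) over the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    soft = kl * temperature ** 2
    # Hard-label cross-entropy on the unscaled student logits.
    hard = -log_softmax(student_logits)[label]
    return alpha * soft + (1.0 - alpha) * hard
```

Under this convention, `alpha=1.0` gives the cross-entropy term zero weight, so the student is trained only to match the teacher's softened distribution. That reading would be consistent with the "Pure KD" in the model name, though the card itself does not spell out the formula.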