ModelHub XC bc796b7b06 初始化项目,由ModelHub XC社区提供模型
Model: huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-8B
Source: Original Platform
2026-06-08 18:17:02 +08:00

library_name, license, base_model, tags, model-index
library_name license base_model tags model-index
transformers apache-2.0 Qwen/Qwen3-8B
llama-factory
full
generated_from_trainer
name results
appworld_distillation_sft_v2-SFT-Qwen3-8B

appworld_distillation_sft_v2-SFT-Qwen3-8B

This model is a fine-tuned version of Qwen/Qwen3-8B on the appworld_distillation_sft_v2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6342

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25.0

Training results

Training Loss Epoch Step Validation Loss
1.1697 1.0 2 1.2944
1.0781 2.0 4 1.1407
0.9806 3.0 6 1.0041
0.9093 4.0 8 0.9051
0.8595 5.0 10 0.8866
0.7688 6.0 12 0.8013
0.7223 7.0 14 0.7614
0.689 8.0 16 0.7272
0.6641 9.0 18 0.7127
0.5795 10.0 20 0.6720
0.5451 11.0 22 0.6551
0.5059 12.0 24 0.6409
0.5035 13.0 26 0.6352
0.484 14.0 28 0.6281
0.4436 15.0 30 0.6252
0.4347 16.0 32 0.6250
0.4139 17.0 34 0.6253
0.4108 18.0 36 0.6265
0.3969 19.0 38 0.6287
0.3825 20.0 40 0.6303
0.3839 21.0 42 0.6313
0.3699 22.0 44 0.6326
0.3871 23.0 46 0.6336
0.382 24.0 48 0.6338
0.3879 25.0 50 0.6342

Framework versions

  • Transformers 4.52.4
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
Description
Model synced from source: huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-8B
Readme 13 MiB
Languages
Jinja 100%