  • library_name: transformers
  • license: apache-2.0
  • base_model: Qwen/Qwen3-0.6B
  • tags: generated_from_trainer
  • model-index: name sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4 (no results listed)

sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4

This model is a fine-tuned version of Qwen/Qwen3-0.6B. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 1.9505
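
A minimal sketch of loading the checkpoint for text generation, assuming it is still published under the repository ID given in this card's source line and that it loads with the standard Transformers causal-LM classes. The helper name `generate` and the default `max_new_tokens` are illustrative choices, not part of the original card.

```python
# Repository ID taken from the "Model synced from source" line of this card.
MODEL_ID = "TarhanE/sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: load the model and complete a prompt."""
    # Imported lazily so the module can be inspected without Transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors and run generation with default settings.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Hello")` downloads the weights on first use, so it needs network access and enough memory for a 0.6B-parameter model.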

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 5
  • num_epochs: 4
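
The relationship between these values can be sketched in plain Python. The snippet reproduces total_train_batch_size from the per-device batch size and gradient accumulation, and evaluates a linear-warmup, cosine-decay learning-rate curve of the kind the cosine scheduler type usually denotes. The device count, the steps-per-epoch figure (inferred from the results table, where step 200 falls at epoch 0.2899), and the single half-cosine decay to zero are assumptions rather than facts stated in the card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 5
NUM_EPOCHS = 4

# Assumptions, not stated in the card.
NUM_DEVICES = 1        # single-device training
STEPS_PER_EPOCH = 690  # inferred from the results table (200 steps at epoch 0.2899)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size())
for step in (0, 5, 1380, 2760):
    print(step, f"{lr_at(step):.2e}")
```

With only 5 warmup steps, the learning rate reaches its peak almost immediately and then decays for the rest of training.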

Training results

Training Loss   Epoch    Step   Validation Loss
2.7934          0.2899   200    1.3768
2.6787          0.5797   400    1.3584
2.7074          0.8696   600    1.3443
1.8508          1.1594   800    1.3934
1.9016          1.4493   1000   1.4017
1.8603          1.7391   1200   1.4073
1.7469          2.0290   1400   1.6987
0.9924          2.3188   1600   1.7187
1.0118          2.6087   1800   1.7246
0.9845          2.8986   2000   1.7222
0.5651          3.1884   2200   1.9391
0.5605          3.4783   2400   1.9573
0.553           3.7681   2600   1.9505
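
The logged values can be inspected with a short script. The sketch below parses the rows above, finds the evaluation step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the last row. In these logs the validation loss is lowest early in training and rises afterward while the training loss keeps falling, a pattern that often indicates overfitting. The card does not say which checkpoint was kept, so the script only describes the numbers.

```python
# Rows copied from the training results table:
# training loss, epoch, step, validation loss.
ROWS = """
2.7934 0.2899 200 1.3768
2.6787 0.5797 400 1.3584
2.7074 0.8696 600 1.3443
1.8508 1.1594 800 1.3934
1.9016 1.4493 1000 1.4017
1.8603 1.7391 1200 1.4073
1.7469 2.0290 1400 1.6987
0.9924 2.3188 1600 1.7187
1.0118 2.6087 1800 1.7246
0.9845 2.8986 2000 1.7222
0.5651 3.1884 2200 1.9391
0.5605 3.4783 2400 1.9573
0.553 3.7681 2600 1.9505
""".strip().splitlines()

records = []
for line in ROWS:
    train_loss, epoch, step, val_loss = line.split()
    records.append(
        {
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        }
    )

# Evaluation point with the lowest validation loss.
best = min(records, key=lambda r: r["val_loss"])
# Last logged evaluation, which matches the loss reported at the top of the card.
last = records[-1]
# Rough steps-per-epoch estimate from the final row.
steps_per_epoch = round(last["step"] / last["epoch"])

print("best step:", best["step"], "val loss:", best["val_loss"])
print("final step:", last["step"], "val loss:", last["val_loss"])
print("estimated steps per epoch:", steps_per_epoch)
```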

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
Description
Model synced from source: TarhanE/sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4