ModelHub XC 1926711e5d Initialize project; model provided by the ModelHub XC community
Model: boradorish/qwen3-4b-base-prompt
Source: Original Platform
2026-05-28 15:44:34 +08:00

library_name: transformers
license: other
base_model: Qwen/Qwen3-4B
tags: llama-factory, full, generated_from_trainer
model-index: name sft_base (results empty)

sft_base

This model is a fine-tuned version of Qwen/Qwen3-4B on the sunny_reasoning dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0087
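As a starting point for trying the checkpoint, here is a minimal sketch that loads it with the Transformers `Auto*` classes. It assumes the repo id `boradorish/qwen3-4b-base-prompt` resolves on the Hugging Face Hub (or is replaced by a local path). This card does not document a prompt format, so the helper tokenizes the prompt as plain text. The name `generate` and the default `max_new_tokens` are illustrative choices, not part of the original card.

```python
# Repo id taken from this card; availability on the Hub is an assumption.
MODEL_ID = "boradorish/qwen3-4b-base-prompt"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and return the generated continuation of `prompt`."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Plain-text prompt: the card does not specify a chat or prompt template.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use. For repeated calls it would be better to load the tokenizer and model once and reuse them.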

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • total_eval_batch_size: 2
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
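The batch-size totals in the list above follow from the per-device values, and the learning rate follows a warmup-then-cosine curve. The sketch below reproduces the arithmetic and approximates the schedule with a linear warmup followed by a half-cosine decay to zero. The helper names are hypothetical, and `total_steps=1000` in the usage lines is a placeholder, not the run's real step count. Rounding the warmup length up with `ceil` is an assumption about how the ratio is converted to steps.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of examples that contribute to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2e-05, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    # Assumption: warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list: 1 per device, 2 devices, 16 accumulation steps.
train_total = total_batch_size(1, 2, 16)   # 1 * 2 * 16 = 32
eval_total = total_batch_size(1, 2)        # 1 * 2 = 2

# Placeholder horizon of 1000 steps, purely for illustration.
peak = cosine_lr(100, 1000)      # end of warmup: the full base learning rate
halfway = cosine_lr(550, 1000)   # midpoint of the decay: about half of base_lr
```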

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.0056        | 0.1698 | 92   | 0.0098          |
| 0.0115        | 0.3397 | 184  | 0.0107          |
| 0.015         | 0.5095 | 276  | 0.0094          |
| 0.0082        | 0.6794 | 368  | 0.0104          |
| 0.0094        | 0.8492 | 460  | 0.0095          |
| 0.0038        | 1.0185 | 552  | 0.0086          |
| 0.0029        | 1.1883 | 644  | 0.0095          |
| 0.01          | 1.3581 | 736  | 0.0082          |
| 0.0019        | 1.5280 | 828  | 0.0081          |
| 0.0045        | 1.6978 | 920  | 0.0080          |
| 0.0091        | 1.8677 | 1012 | 0.0077          |
| 0.0057        | 2.0369 | 1104 | 0.0081          |
| 0.0006        | 2.2068 | 1196 | 0.0086          |
| 0.0075        | 2.3766 | 1288 | 0.0088          |
| 0.0065        | 2.5464 | 1380 | 0.0087          |
| 0.0084        | 2.7163 | 1472 | 0.0087          |
| 0.0027        | 2.8861 | 1564 | 0.0087          |
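To read the results more easily, the short sketch below copies the logged rows into a list and finds the evaluation step with the lowest validation loss. It also estimates the number of optimizer steps per epoch from the first row. That estimate is only approximate, since the logged epoch values are rounded. The variable names are illustrative.

```python
# (training_loss, epoch, step, validation_loss), copied from the results above.
ROWS = [
    (0.0056, 0.1698, 92, 0.0098),
    (0.0115, 0.3397, 184, 0.0107),
    (0.015, 0.5095, 276, 0.0094),
    (0.0082, 0.6794, 368, 0.0104),
    (0.0094, 0.8492, 460, 0.0095),
    (0.0038, 1.0185, 552, 0.0086),
    (0.0029, 1.1883, 644, 0.0095),
    (0.01, 1.3581, 736, 0.0082),
    (0.0019, 1.5280, 828, 0.0081),
    (0.0045, 1.6978, 920, 0.0080),
    (0.0091, 1.8677, 1012, 0.0077),
    (0.0057, 2.0369, 1104, 0.0081),
    (0.0006, 2.2068, 1196, 0.0086),
    (0.0075, 2.3766, 1288, 0.0088),
    (0.0065, 2.5464, 1380, 0.0087),
    (0.0084, 2.7163, 1472, 0.0087),
    (0.0027, 2.8861, 1564, 0.0087),
]

# Evaluation point with the lowest validation loss, and the last logged one.
best = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]

# Rough steps-per-epoch estimate from the first row (step / epoch fraction).
steps_per_epoch = ROWS[0][2] / ROWS[0][1]

print(f"best validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"final logged validation loss {final[3]} at step {final[2]}")
print(f"approx. optimizer steps per epoch: {steps_per_epoch:.0f}")
```

The lowest logged validation loss (0.0077) is at step 1012, late in the second epoch. The final logged value (0.0087) matches the evaluation loss reported at the top of the card.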

Framework versions

  • Transformers 4.56.2
  • PyTorch 2.11.0+cu128
  • Datasets 3.0.0
  • Tokenizers 0.22.2
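To compare a local environment against the versions listed above, a small check based on `importlib.metadata` can help. This is a sketch: the helper name `version_mismatches` is hypothetical, and it assumes the PyTorch distribution is installed under the name `torch` and that its reported version includes the `+cu128` local tag.

```python
from importlib import metadata

# Versions listed in this card, keyed by the assumed pip distribution names.
EXPECTED = {
    "transformers": "4.56.2",
    "torch": "2.11.0+cu128",
    "datasets": "3.0.0",
    "tokenizers": "0.22.2",
}


def version_mismatches(expected, installed=None):
    """Return {package: (wanted, found)} for every package that differs.

    If `installed` is None, versions are looked up in the current
    environment; a missing package is reported with found=None.
    """
    mismatches = {}
    for name, want in expected.items():
        if installed is not None:
            have = installed.get(name)
        else:
            try:
                have = metadata.version(name)
            except metadata.PackageNotFoundError:
                have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches
```

Calling `version_mismatches(EXPECTED)` with no second argument inspects the active environment. An empty dictionary means every listed version matches exactly.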