ModelHub XC 7ceef409c6 Initialize project; model provided by the ModelHub XC community
Model: rbelanec/train_mrpc_42_1776331557
Source: Original Platform
2026-05-03 10:20:24 +08:00

library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags: peft-factory, full, llama-factory, generated_from_trainer
model-index: name train_mrpc_42_1776331557, results (none listed)

train_mrpc_42_1776331557

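This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set (the loss equals the lowest validation loss in the training results, logged at step 416; the token count is the total for the run):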

  • Loss: 0.1084
  • Num Input Tokens Seen: 1780000

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
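To make the scheduler settings above concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup, using the listed peak learning rate (5e-06) and warmup ratio (0.1). The total step count is an assumption: 413 optimizer steps per epoch is inferred from the training results (step 416 falls at epoch 1.0073), multiplied by 5 epochs. The function `lr_at` is a hypothetical helper written for illustration, not part of any library, and the formula follows the common half-cosine decay rather than code confirmed to be what this run executed.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 5e-06
WARMUP_RATIO = 0.1

# Assumption: 413 optimizer steps per epoch (inferred from the training
# results, where step 416 falls at epoch 1.0073) times 5 epochs.
TOTAL_STEPS = 413 * 5
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, 416, 1976, TOTAL_STEPS):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks roughly half an epoch in and decays toward zero by the last optimizer step.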

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.1552        | 0.2518 | 104  | 0.1485          | 89600             |
| 0.2178        | 0.5036 | 208  | 0.1320          | 178688            |
| 0.1165        | 0.7554 | 312  | 0.1130          | 267968            |
| 0.1193        | 1.0073 | 416  | 0.1084          | 357488            |
| 0.0685        | 1.2591 | 520  | 0.1903          | 446896            |
| 0.0801        | 1.5109 | 624  | 0.1982          | 536176            |
| 0.2066        | 1.7627 | 728  | 0.1449          | 626992            |
| 0.0011        | 2.0145 | 832  | 0.2068          | 716344            |
| 0.0059        | 2.2663 | 936  | 0.2691          | 806712            |
| 0.0756        | 2.5182 | 1040 | 0.2895          | 895736            |
| 0.0001        | 2.7700 | 1144 | 0.2260          | 985592            |
| 0.0           | 3.0218 | 1248 | 0.2253          | 1074624           |
| 0.0           | 3.2736 | 1352 | 0.2578          | 1164544           |
| 0.0           | 3.5254 | 1456 | 0.2580          | 1253248           |
| 0.0           | 3.7772 | 1560 | 0.2703          | 1344000           |
| 0.0           | 4.0291 | 1664 | 0.2502          | 1432880           |
| 0.0001        | 4.2809 | 1768 | 0.2504          | 1522544           |
| 0.0           | 4.5327 | 1872 | 0.2489          | 1611760           |
| 0.0           | 4.7845 | 1976 | 0.2508          | 1702832           |
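The evaluation log above can be scanned programmatically to find the step with the lowest validation loss. The sketch below copies the (step, validation loss) pairs from the training results; `EVAL_LOG` and `best_checkpoint` are hypothetical names introduced for illustration, and nothing here is confirmed to be how the reported checkpoint was actually selected.

```python
# (step, validation_loss) pairs copied from the training results above.
EVAL_LOG = [
    (104, 0.1485), (208, 0.1320), (312, 0.1130), (416, 0.1084),
    (520, 0.1903), (624, 0.1982), (728, 0.1449), (832, 0.2068),
    (936, 0.2691), (1040, 0.2895), (1144, 0.2260), (1248, 0.2253),
    (1352, 0.2578), (1456, 0.2580), (1560, 0.2703), (1664, 0.2502),
    (1768, 0.2504), (1872, 0.2489), (1976, 0.2508),
]


def best_checkpoint(log):
    """Return the (step, validation_loss) pair with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


best_step, best_loss = best_checkpoint(EVAL_LOG)
final_step, final_loss = EVAL_LOG[-1]

if __name__ == "__main__":
    print(f"best:  step {best_step}, validation loss {best_loss:.4f}")
    print(f"final: step {final_step}, validation loss {final_loss:.4f}")
```

The minimum falls at step 416 (about one epoch), which matches the evaluation loss of 0.1084 reported at the top of the card. After that point the training loss drops toward 0.0 while the validation loss stays above 0.14, a pattern consistent with overfitting in the later epochs.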

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4