ModelHub XC aa0222e4a1: Initialize project; model provided by the ModelHub XC community
Model: rbelanec/train_cola_42_1776331560
Source: Original Platform
2026-05-03 10:17:08 +08:00

library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags: peft-factory, full, llama-factory, generated_from_trainer
model-index name: train_cola_42_1776331560

train_cola_42_1776331560

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1763
  • Num Input Tokens Seen: 1932608
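
Assuming the reported loss is a mean per-token cross-entropy in nats (the usual convention in transformers evaluation output), it maps directly to an evaluation perplexity via exp(loss):

```python
import math

eval_loss = 0.1763                # validation loss reported above
perplexity = math.exp(eval_loss)  # perplexity = e^loss for a loss in nats
print(f"eval perplexity ~= {perplexity:.4f}")  # ~1.1928
```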

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
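
The learning-rate schedule above (cosine decay with a 0.1 warmup ratio) can be sketched in plain Python. Note the total step count is not stated in the card; it is inferred from the training table (~962 steps per epoch over 5 epochs), and this sketch mirrors, but is not, the transformers `get_cosine_schedule_with_warmup` implementation:

```python
import math

# Hyperparameters from the card; step counts inferred from the training table.
LEARNING_RATE = 5e-06
TOTAL_STEPS = 962 * 5                  # ~962 steps/epoch (step 4579 ~= epoch 4.7599)
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # lr_scheduler_warmup_ratio: 0.1

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```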

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.2021        | 0.2505 | 241  | 0.2780          | 97664             |
| 0.2402        | 0.5010 | 482  | 0.2002          | 194560            |
| 0.1906        | 0.7516 | 723  | 0.2094          | 291712            |
| 0.2397        | 1.0021 | 964  | 0.1763          | 387464            |
| 0.0622        | 1.2526 | 1205 | 0.2676          | 485192            |
| 0.0911        | 1.5031 | 1446 | 0.3146          | 581704            |
| 0.1042        | 1.7536 | 1687 | 0.2114          | 677576            |
| 0.096         | 2.0042 | 1928 | 0.3562          | 775312            |
| 0.0094        | 2.2547 | 2169 | 0.3035          | 873104            |
| 0.0894        | 2.5052 | 2410 | 0.3649          | 969360            |
| 0.0705        | 2.7557 | 2651 | 0.3061          | 1065232           |
| 0.0016        | 3.0062 | 2892 | 0.2698          | 1162016           |
| 0.0469        | 3.2568 | 3133 | 0.3603          | 1259168           |
| 0.0682        | 3.5073 | 3374 | 0.4128          | 1355552           |
| 0.0128        | 3.7578 | 3615 | 0.3697          | 1453088           |
| 0.0238        | 4.0083 | 3856 | 0.3716          | 1549360           |
| 0.0           | 4.2588 | 4097 | 0.4492          | 1645808           |
| 0.0202        | 4.5094 | 4338 | 0.4368          | 1742960           |
| 0.0001        | 4.7599 | 4579 | 0.4381          | 1839344           |
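
As a sanity check, the evaluation loss reported at the top of the card is the minimum validation loss in the table above, reached at step 964 (just past epoch 1); validation loss trends upward afterwards, consistent with the reported number coming from an early checkpoint:

```python
# (validation_loss, step) pairs transcribed from the training results table.
val_loss = [
    (0.2780, 241), (0.2002, 482), (0.2094, 723), (0.1763, 964),
    (0.2676, 1205), (0.3146, 1446), (0.2114, 1687), (0.3562, 1928),
    (0.3035, 2169), (0.3649, 2410), (0.3061, 2651), (0.2698, 2892),
    (0.3603, 3133), (0.4128, 3374), (0.3697, 3615), (0.3716, 3856),
    (0.4492, 4097), (0.4368, 4338), (0.4381, 4579),
]
best_loss, best_step = min(val_loss)
print(best_loss, best_step)  # prints: 0.1763 964
```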

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4