mistral-7B-v0.3-finetuned/README.md at 5673709be0318fd1879a7b63d312389a0b315f91

ModelHub XC 5673709be0 Initialize project; model provided by the ModelHub XC community

Model: formalmathatepfl/mistral-7B-v0.3-finetuned
Source: Original Platform

2026-04-24 16:45:01 +08:00

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 32
total_eval_batch_size: 16
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
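
The totals in the list above follow from the per-device values, and the cosine schedule with a 5% warmup can be sketched in a few lines. The snippet below is a minimal illustration rather than the training code that was actually used: the function names are hypothetical, the scheduler is a plain re-implementation of the usual "linear warmup, then cosine decay to zero" shape, and the total step count passed in the usage note is an estimate inferred from the results table, not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-5
WARMUP_RATIO = 0.05


def effective_batch_sizes():
    """Return the global (train, eval) batch sizes.

    Training multiplies by the gradient accumulation steps;
    evaluation does not accumulate gradients, so it only scales
    with the number of devices.
    """
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Illustrative re-implementation (assumption), not library code.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Calling `effective_batch_sizes()` reproduces the `total_train_batch_size` and `total_eval_batch_size` entries. For the schedule, something like `cosine_lr(3000, 6124)` gives the learning rate near the midpoint of the run, where `6124` is only an assumed total step count.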

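| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.0817        | 0.1633 | 1000 | 0.0849          |
| 0.0766        | 0.3266 | 2000 | 0.0762          |
| 0.0628        | 0.4900 | 3000 | 0.0654          |
| 0.0594        | 0.6533 | 4000 | 0.0602          |
| 0.0547        | 0.8166 | 5000 | 0.0559          |
| 0.0537        | 0.9799 | 6000 | 0.0542          |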
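
The logged rows also allow a few rough back-of-the-envelope estimates. The sketch below, with hypothetical helper names, derives the approximate number of optimizer steps per epoch from the last row and the relative drop in validation loss between the first and last evaluation. These are estimates from rounded log values, not figures reported by the authors.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (0.0817, 0.1633, 1000, 0.0849),
    (0.0766, 0.3266, 2000, 0.0762),
    (0.0628, 0.4900, 3000, 0.0654),
    (0.0594, 0.6533, 4000, 0.0602),
    (0.0547, 0.8166, 5000, 0.0559),
    (0.0537, 0.9799, 6000, 0.0542),
]


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


def relative_val_loss_drop(rows):
    """Fractional decrease in validation loss from first to last evaluation."""
    first = rows[0][3]
    last = rows[-1][3]
    return (first - last) / first


def train_val_gaps(rows):
    """Validation loss minus training loss at each logged step."""
    return [round(val - train, 4) for train, _, _, val in rows]


if __name__ == "__main__":
    print(f"steps per epoch (estimate): {steps_per_epoch(ROWS):.0f}")
    print(f"validation loss drop: {relative_val_loss_drop(ROWS):.1%}")
    print(f"validation minus training loss: {train_val_gaps(ROWS)}")
```

With these rows the estimate comes out at roughly 6,123 steps per epoch and a validation loss reduction of about 36% over the run. Multiplying the step estimate by the total training batch size of 32 would give an approximate training set size, assuming every step consumed a full batch.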