---
library_name: transformers
license: apache-2.0
base_model: Tifin-Sage/magnifi-classifier-01-05-search-agent-3-epochs-3k-unknown-errors
tags:
- axolotl
- generated_from_trainer
datasets:
- Tifin-Sage/magnifi-module-classifier-04-17-relabelled-upsampled
model-index:
- name: magnifi-module-classifier-04-17-relabelled-upsampled
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.16.0.dev0`

```yaml
base_model: Tifin-Sage/magnifi-classifier-01-05-search-agent-3-epochs-3k-unknown-errors
hub_model_id: Tifin-Sage/magnifi-module-classifier-04-17-relabelled-upsampled

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: qwen3
datasets:
  - path: Tifin-Sage/magnifi-module-classifier-04-17-relabelled-upsampled
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
val_set_size: 0.1
output_dir: /workspace/data/outputs/qwen3-4B/fft_magnifi-module-classifier-04-17-relabelled-upsampled/
dataset_prepared_path: /workspace/data/datasets_prepared/magnifi-module-classifier-04-17-relabelled-upsampled

sequence_len: 16000
sample_packing: true
eval_sample_packing: true

wandb_project: sage-classifier
wandb_entity:
wandb_watch:
wandb_name: magnifi-module-classifier-04-17-relabelled-upsampled
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

bf16: auto
tf32: true

resume_from_checkpoint:
logging_steps: 1
evals_per_epoch: 2
saves_per_epoch: 1

warmup_ratio: 0.1
weight_decay: 0.0

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_version: 2
  fsdp_offload_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_reshard_after_forward: true
  fsdp_activation_checkpointing: true

special_tokens:
```

</details>
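For orientation, here is a minimal sketch of what a single training record in the `chat_template` dataset format above might look like. The `messages` field name and the `role`/`content` keys follow the `message_property_mappings` in the config; the content strings are hypothetical placeholders, not samples from the actual dataset.

```python
# Hypothetical example of one record in the chat_template dataset format
# configured above (field_messages: messages; role -> role, content -> content).
# The content values are illustrative placeholders only.
example_record = {
    "messages": [
        {"role": "user", "content": "<user query to be routed to a module>"},
        {"role": "assistant", "content": "<module label>"},
    ]
}
```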

# magnifi-module-classifier-04-17-relabelled-upsampled

This model is a fine-tuned version of [Tifin-Sage/magnifi-classifier-01-05-search-agent-3-epochs-3k-unknown-errors](https://huggingface.co/Tifin-Sage/magnifi-classifier-01-05-search-agent-3-epochs-3k-unknown-errors) on the Tifin-Sage/magnifi-module-classifier-04-17-relabelled-upsampled dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2227
- Ppl: 1.2494
- Memory/max Active (GiB): 34.91
- Memory/max Allocated (GiB): 34.91
- Memory/device Reserved (GiB): 57.25

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 2
- total_eval_batch_size: 2
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 47
- training_steps: 478

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Ppl    | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:------:|:------------:|:---------------:|:--------------:|
| No log        | 0      | 0    | 0.2049          | 1.2275 | 27.41        | 27.41           | 30.62          |
| 0.2339        | 0.5    | 120  | 0.2288          | 1.2571 | 34.91        | 34.91           | 59.04          |
| 0.2290        | 1.0    | 240  | 0.2166          | 1.2419 | 34.91        | 34.91           | 57.54          |
| 0.0898        | 1.5    | 360  | 0.2251          | 1.2524 | 34.91        | 34.91           | 57.54          |
| 0.1331        | 1.9917 | 478  | 0.2227          | 1.2494 | 34.91        | 34.91           | 57.25          |

### Framework versions

- Transformers 5.5.4
- PyTorch 2.10.0+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2
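## Usage

A minimal inference sketch with `transformers`, assuming the standard chat workflow for a Qwen3-based model (the config trains with `chat_template: qwen3`, so the tokenizer's bundled chat template should apply). The prompt content below is a placeholder; the model's expected inputs and output labels depend on the training data, which is not documented here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tifin-Sage/magnifi-module-classifier-04-17-relabelled-upsampled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt; the placeholder stands in for a real query.
messages = [{"role": "user", "content": "<query to classify>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Assuming the classifier emits a short module label, a small generation
# budget should suffice.
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```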