---
library_name: transformers
license: other
base_model: LiquidAI/LFM2-350M-ENJP-MT
tags:
- generated_from_trainer
datasets:
- chotto-20251010.sft.jsonl
model-index:
- name: data/outputs/shisa-v2.1c-lfm2-350m-sft2
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.13.0.dev0`

```yaml
base_model: LiquidAI/LFM2-350M-ENJP-MT
chunked_cross_entropy: true

eot_tokens:
  - "<|im_end|>"

datasets:
  - path: chotto-20251010.sft.jsonl
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content
    roles:
      system:
        - system
      assistant:
        - assistant
        - gpt
        - model
      user:
        - user
        - human
    roles_to_train: ["assistant"]

dataset_prepared_path: last_run_prepared_sft
output_dir: /data/outputs/shisa-v2.1c-lfm2-350m-sft2

sequence_len: 8192
sample_packing: true
flash_attention: true
pad_to_sequence_len: true

neftune_noise_alpha: 5

use_wandb: true
wandb_entity: augmxnt
wandb_project: liquid-hackathon-tokyo
wandb_name: "shisa-v2.1c-lfm2-350m-sft2"

# GBS = 128 / 8 GPU / 16 MBS / 1 GAS
gradient_accumulation_steps: 1
micro_batch_size: 16
num_epochs: 4
optimizer: adamw_torch_4bit
lr_scheduler: cosine
learning_rate: 6e-5  # 4.78 @ GBS=128

train_on_inputs: false
group_by_length: false
bf16: true
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

logging_steps: 1
warmup_ratio: 0.03
saves_per_epoch: 1

deepspeed: zero3_bf16.json
weight_decay: 1e-4
```

</details><br>
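The batch-size comment in the config can be checked with simple arithmetic. Below is a short sketch in plain Python, using values copied from the YAML above and from the trainer summary further down (8 devices, 2332 optimizer steps); it is illustrative only.

```python
# Sanity-check the effective batch size and warmup schedule implied by the config.
micro_batch_size = 16            # per-GPU batch size (micro_batch_size)
num_devices = 8                  # GPUs reported in the trainer summary
gradient_accumulation_steps = 1

global_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
assert global_batch_size == 128  # matches the "GBS = 128" comment in the config

training_steps = 2332            # total optimizer steps reported by the trainer
warmup_ratio = 0.03
warmup_steps = int(training_steps * warmup_ratio)
print(global_batch_size, warmup_steps)  # 128 69
```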

# data/outputs/shisa-v2.1c-lfm2-350m-sft2

This model is a fine-tuned version of [LiquidAI/LFM2-350M-ENJP-MT](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT) on the chotto-20251010.sft.jsonl dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: `adamw_torch_4bit` (AdamW with 4-bit optimizer states), betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 69
- training_steps: 2332

### Training results

### Framework versions

- Transformers 4.57.0
- PyTorch 2.8.0+rocm6.4
- Datasets 4.1.1
- Tokenizers 0.22.1
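
## Inference example

Since the sections above are placeholders, here is a minimal inference sketch. It assumes the checkpoint written to the `output_dir` above loads with the standard `transformers` auto classes and that the tokenizer ships a ChatML-style chat template whose turns end with `<|im_end|>` (matching `eot_tokens` in the config); the local path and the Japanese prompt are illustrative, not part of the training setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the trained checkpoint (output_dir from the config); adjust as needed.
model_path = "/data/outputs/shisa-v2.1c-lfm2-350m-sft2"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="bfloat16")

# Build a chat prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "こんにちは。自己紹介してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```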