--- base_model: Qwen/Qwen2.5-7B-Instruct datasets: - u-10bei/sft_alfworld_trajectory_dataset_v5 - u-10bei/dbbench_sft_dataset_react_v4 language: - en license: apache-2.0 pipeline_tag: text-generation tags: - lora - merged - agent - tool-use - alfworld - dbbench --- # Qwen2.5-7B-Instruct-SDFT-2ep-fp16 This repository provides a fine-tuned model based on **Qwen/Qwen2.5-7B-Instruct**. The model was initially trained using **LoRA + Unsloth** and has been **merged with the base model**. The weights in this repository are saved in **fp16** format, so you can load and use it directly without needing to load the base model and adapter separately. ## Training Objective This model is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations). Loss is applied to **all assistant turns** in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors. ## Training Configuration - Base model: Qwen/Qwen2.5-7B-Instruct - Method: LoRA (merged into base model) - Precision: fp16 - **Experimental Methods:** SDFT & Epiplexity *(Note: Implementation is still a work in progress)* - Max sequence length: 4096 - Epochs: 2 - Learning rate: 2e-06 - LoRA: r=64, alpha=128 ## Experimental Features This version incorporates experimental training techniques, specifically **SDFT** and **Epiplexity**. However, the integration of these methods is not yet fully completed. We are still evaluating their impact on the model's reasoning capabilities and plan to refine them in future updates. ## Usage You can load this model directly using `AutoModelForCausalLM`. ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_id = "aolans/Qwen2.5-7B-Instruct-SDFT-2ep-fp16" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.float16, device_map="auto", ) ``` ## References The experimental training methods (SDFT and Epiplexity) applied in this model are based on the following research: * [Self-Distillation Enables Continual Learning](https://arxiv.org/abs/2601.19897) * [From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence](https://arxiv.org/abs/2601.03220) ## Sources & Terms (IMPORTANT) Training data: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/dbbench_sft_dataset_react_v4 Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.