Files

ModelHub XC f595d64212 初始化项目，由ModelHub XC社区提供模型

Model: plotMaker/qwen25-7b-sft-merged-v5v6-a50
Source: Original Platform

2026-05-19 13:57:23 +08:00

3.0 KiB

Raw Permalink Blame History

base_model, datasets, language, license, library_name, pipeline_tag, tags

base_model

datasets

language

license

library_name

pipeline_tag

qwen25-7b-sft-merged-v5v6-a50

This repository provides a fully merged model fine-tuned from Qwen2.5-7B-Instruct using QLoRA + Unsloth.

Two SFT models (v5 and v6) were trained independently, then combined via weight interpolation (alpha=0.5). This is a complete model — no adapters or additional weights are needed.

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.

Training Configuration

Base model: Qwen/Qwen2.5-7B-Instruct
Method: QLoRA (4-bit) + Unsloth, merged into base model
Max sequence length: 2048
Epochs: 2
Learning rate: 5e-5
LoRA: r=32, alpha=64
Post-training: weight interpolation of v5 and v6 (alpha=0.5)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "plotMaker/qwen25-7b-sft-merged-v5v6-a50"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

References

Model Soups (Wortsman et al., 2022) — Weight interpolation of fine-tuned models
LoRA (Hu et al., 2021) — Low-Rank Adaptation
NEFTune (Jain et al., 2024) — Noisy embedding fine-tuning
rsLoRA (Kalajdzievski, 2023) — Rank-stabilized LoRA scaling
ALFWorld (Shridhar et al., 2021) — Interactive text-world environments
ReAct (Yao et al., 2023) — Reasoning and acting in LLMs

Sources & Terms (IMPORTANT)

Training data:

u-10bei/sft_alfworld_trajectory_dataset_v2 ~ v5
u-10bei/dbbench_sft_dataset_react ~ v4

Base model: Qwen/Qwen2.5-7B-Instruct

This repository does NOT redistribute the dataset. Users must comply with the dataset license and base model terms.

3.0 KiB Raw Permalink Blame History