Files

ModelHub XC 8a72afe94c 初始化项目，由ModelHub XC社区提供模型

Model: koguma-ai/dbbench-combined-baseline0301
Source: Original Platform

2026-06-05 01:31:17 +08:00

2.3 KiB

Raw Blame History

base_model, datasets, language, license, pipeline_tag, tags

base_model

datasets

language

license

pipeline_tag

Qwen2.5-7B DB Bench Combined SFT (v1-v4)

This repository provides a merged full-weight model fine-tuned from Qwen2.5-7B-Instruct using LoRA + Unsloth, then merged to 16bit.

Training Objective

This model is trained to improve DB Bench (database operation) performance on the AgentBench evaluation benchmark. ALFWorld performance relies entirely on the base model's inherent capability (no ALFWorld training data used).

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn SQL generation, action selection, and error recovery.

Training Data

DB Bench v1 (u-10bei/dbbench_sft_dataset_react): ~750 samples
DB Bench v2 (u-10bei/dbbench_sft_dataset_react_v2): ~750 samples
DB Bench v3 (u-10bei/dbbench_sft_dataset_react_v3): ~750 samples
DB Bench v4 (u-10bei/dbbench_sft_dataset_react_v4): ~750 samples
Total: ~3,000 samples
ALFWorld data intentionally excluded to preserve base model performance

Training Configuration

Base model: Qwen/Qwen2.5-7B-Instruct
Method: LoRA → merged to 16bit
Max sequence length: 2048
Epochs: 2
Learning rate: 2e-6
LoRA: r=64, alpha=128
Batch size: 2, Gradient accumulation: 4 (effective batch 8)
Optimizer: AdamW (cosine scheduler)
Framework: Unsloth

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "koguma-ai/dbbench-combined-baseline0301"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Sources & Terms

Training data: u-10bei/dbbench_sft_dataset_react (v1-v4)

Dataset License: Apache-2.0. Users must comply with the Apache-2.0 license and the base model's original terms of use.

Limitations

Optimized for DB Bench tasks only
ALFWorld performance relies on base model capability
Weak categories: aggregation-MAX (16.7%), INSERT (33.3%)

2.3 KiB Raw Blame History