---
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- ShogoMu/dbbench_u-10bei_sft_dataset_modified_v2
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
---

# qwen25_7b_lora_agentbench_v6_e4

This repository provides a **merged model** fine-tuned from
**Qwen/Qwen2.5-7B-Instruct**. Fine-tuning used **LoRA with Unsloth**, and the resulting adapter was merged back into the base model weights.

The repository contains the **full model weights**, so it can be used for inference
without loading a separate adapter.
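
To illustrate what merging means here, the following is a minimal sketch on toy matrices, not the procedure actually used to produce this repository. It assumes the usual LoRA formulation, in which the adapter contributes a low-rank update `B @ A` multiplied by a scaling factor. The helpers `matmul`, `merge_lora`, and `matvec`, and all numeric values, are hypothetical and chosen only for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, scale):
    """Return W + scale * (B @ A), i.e. the base weight with the adapter folded in."""
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def matvec(M, x):
    """Multiply a matrix by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


# Toy values (assumptions for illustration only).
W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, shape (out=2, in=2)
A = [[1.0, 2.0]]              # LoRA A, shape (r=1, in=2)
B = [[0.5], [-1.0]]           # LoRA B, shape (out=2, r=1)
scale = 2.0                   # LoRA scaling factor
x = [1.0, 2.0]                # example input

# Path 1: merge once, then use a single matrix at inference time.
W_merged = merge_lora(W, A, B, scale)
merged_out = matvec(W_merged, x)

# Path 2: keep the adapter separate and add its contribution on every call.
adapter_out = [
    base + scale * update
    for base, update in zip(matvec(W, x), matvec(B, matvec(A, x)))
]
```

Both paths produce the same output, which is why a merged checkpoint can be loaded like an ordinary model with no adapter-loading step.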

## Training Objective

This model is optimized for **multi-turn agent tasks**, specifically
ALFWorld (household navigation and interaction) and DBBench (database operations).

Loss was applied to **all assistant turns** in the multi-turn
trajectories, so the model learns not just final answers but also
intermediate reasoning (Thought), processing of environment observations,
action selection, and error recovery.
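
One common way to apply loss only to assistant turns is to mask every other token in the label sequence. The sketch below shows that idea under stated assumptions: the actual training script is not part of this card, `build_labels` is a hypothetical helper, and the token IDs are made up. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_labels(turns):
    """Concatenate turns and keep labels only for assistant tokens.

    `turns` is a list of (role, token_ids) pairs in conversation order.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            # Assistant tokens contribute to the loss.
            labels.extend(token_ids)
        else:
            # System, user, and environment tokens are masked out.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Made-up token IDs for a two-round trajectory.
turns = [
    ("user", [11, 12]),
    ("assistant", [21, 22, 23]),
    ("user", [31]),
    ("assistant", [41, 42]),
]
input_ids, labels = build_labels(turns)
```

Every assistant turn keeps its labels, not only the last one, so intermediate thoughts and actions are trained on as well.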

## Training Configuration

- **Base model:** Qwen/Qwen2.5-7B-Instruct
- **Method:** LoRA (merged post-training)
- **Max sequence length:** 2048
- **Epochs:** 4
- **Learning rate:** 2e-06
- **LoRA Parameters:** r=64, alpha=128
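
For reference, the values above can be collected in a small config object. This is a sketch only: `TrainingConfig` and its field names are hypothetical and do not come from the original training script. The `lora_scaling` property assumes the standard LoRA convention of scaling the update by `alpha / r`. Variants such as rank-stabilized LoRA scale differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters listed in this model card (field names are illustrative)."""

    base_model: str = "Qwen/Qwen2.5-7B-Instruct"
    max_seq_length: int = 2048
    num_epochs: int = 4
    learning_rate: float = 2e-6
    lora_r: int = 64
    lora_alpha: int = 128

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scaling factor (assumption: plain LoRA, not rsLoRA).
        return self.lora_alpha / self.lora_r


config = TrainingConfig()
print(f"LoRA scaling factor: {config.lora_scaling}")
```

With `r=64` and `alpha=128`, the adapter update is weighted by a factor of 2 under that convention.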

## Usage

This model can be loaded using the standard `transformers` library or
deployed with `vLLM` (recommended for evaluation).

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Placeholder: replace with this repository's Hugging Face ID.
model_id = "your_hf_id/your_repo_name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The weights are already merged, so no adapter needs to be loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)