---
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
- unsloth
- agentbench
---

# Qwen3-4B AgentBench "023-Jinja-Heuristics" LoRA

This repository provides a **merged model** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** with LoRA.
It targets **AgentBench**, specifically ALFWorld and DBBench, and is engineered to improve performance there by mitigating the catastrophic forgetting and format collisions that arise when one model is fine-tuned on several agent tasks at once.

The LoRA weights are already merged into the base model, so no separate base model or adapter loading is needed.

## Key Innovation: Jinja2 Contextual Routing & Heuristics Injection

Much of this model's behavior comes not only from its weights but from its **custom `tokenizer_config.json`**.
We replaced the default `chat_template` with a custom Jinja2 template that serves two roles: an "Absolute Defense Shield" that guards the output format against collisions, and a "Dynamic Heuristics Injector" that adds task-specific guidance.

Depending on the user's prompt, the chat template injects a task-specific system prompt (a "cheat sheet") when the conversation is rendered by `apply_chat_template`, that is, *just before* inference:

### 1. DB Bench (MySQL) Mode
When `MySQL` or `SQL` is detected in the prompt, the model is forced into a DB Agent persona with the following injected rules:
- **Error Recovery:** "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. Use `Action: Operation` to execute `DESCRIBE table_name;` and check the correct schema before retrying."
- **Loop Prevention:** "Never repeat the exact same invalid SQL."
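
The DB rules are delivered as prompt text, not code, but the checks they describe are easy to state precisely. The sketch below expresses them from the evaluation-harness side. It assumes the agent places its SQL in a fenced `sql` block after `Action: Operation`, as in AgentBench's DBBench prompt format; `extract_sql`, `normalize`, `is_repeated_failure`, and `recovery_action` are hypothetical helpers, not part of this repository.

```python
import re
from typing import Optional, Set

# Assumption: SQL follows "Action: Operation" inside a fenced sql block,
# as in AgentBench's DBBench prompt format.
FENCE = "`" * 3
SQL_BLOCK = re.compile(
    r"Action:\s*Operation\s*" + FENCE + r"sql\s*(.*?)" + FENCE,
    re.DOTALL,
)


def extract_sql(reply: str) -> Optional[str]:
    """Return the SQL from an `Action: Operation` block, or None."""
    match = SQL_BLOCK.search(reply)
    return match.group(1).strip() if match else None


def normalize(sql: str) -> str:
    """Collapse whitespace and drop a trailing ';' so an exact repeat is
    still detected after trivial reformatting."""
    return " ".join(sql.split()).rstrip(";").strip()


def is_repeated_failure(sql: str, failed: Set[str]) -> bool:
    """Loop prevention: True if this SQL (normalized) already failed."""
    return normalize(sql) in failed


def recovery_action(error_message: str, table: str) -> Optional[str]:
    """Error recovery: on 'Unknown column', inspect the schema first."""
    if "Unknown column" in error_message:
        return f"DESCRIBE {table};"
    return None


# Usage: one failed query, then a retry of the same SQL that gets flagged.
failed: Set[str] = set()
first = extract_sql("Action: Operation\n" + FENCE + "sql\nSELECT nme FROM users;\n" + FENCE)
failed.add(normalize(first))
print(recovery_action("Unknown column 'nme' in 'field list'", "users"))
print(is_repeated_failure("SELECT nme  FROM users", failed))
```

Normalizing whitespace and the trailing semicolon keeps the check faithful to "the exact same invalid SQL" while ignoring purely cosmetic differences; case is deliberately preserved because string literals in SQL are case-sensitive.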

### 2. ALFWorld (Household) Mode
When `household` or `Interact with a` is detected, the model is forced into an ALFWorld Agent persona:
- **Format Override:** Ignores the `THOUGHT:`/`ACTION:` format requested by the evaluation prompt (a source of format collisions) and strictly enforces the more stable `Think:`/`Act:` format.
- **Exploration Logic:** "If an action fails (`Nothing happened`), analyze why in your `Think:` step and choose a DIFFERENT action."
- **Efficiency:** "If you search a receptacle and do not find the target object, DO NOT search it again. Move to a different location."
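
The routing described above can be re-expressed in Python to make the triggers concrete. This is a sketch, not the repository's template: the actual Jinja2 source lives in `tokenizer_config.json`, and the cheat-sheet strings here are abbreviated. The names `route` and `inject`, case-sensitive matching, checking the DB keywords first, keying on the first user message, and prepending (rather than replacing) a system message are all assumptions.

```python
from typing import Dict, List, Optional

# Abbreviated cheat sheets for illustration; the real text lives in the
# chat_template inside this repository's tokenizer_config.json.
DB_RULES = (
    "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. "
    "Use `Action: Operation` to execute `DESCRIBE table_name;` and check the "
    "correct schema before retrying. Never repeat the exact same invalid SQL."
)
ALFWORLD_RULES = (
    "Use the Think:/Act: format. If an action fails (Nothing happened), "
    "choose a DIFFERENT action. Never re-search a receptacle that lacked the target."
)


def route(prompt: str) -> Optional[str]:
    """Choose the cheat sheet to inject, mirroring the documented triggers.

    Assumptions: case-sensitive substring matching, DB check first.
    "MySQL" contains "SQL", so a single test covers both DB keywords.
    """
    if "SQL" in prompt:
        return DB_RULES
    if "household" in prompt or "Interact with a" in prompt:
        return ALFWORLD_RULES
    return None


def inject(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Prepend the routed system prompt, keyed on the first user message."""
    first_user = next((m["content"] for m in messages if m["role"] == "user"), "")
    rules = route(first_user)
    if rules is None:
        return list(messages)
    return [{"role": "system", "content": rules}] + list(messages)


# Usage: a DB-style prompt gains a system message in front of it.
routed = inject([{"role": "user", "content": "You are a specialized MySQL database agent..."}])
print(routed[0]["role"])
```

Prompts that match neither trigger pass through unchanged, so ordinary chat keeps the base model's behavior.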

## Training Configuration (The "Golden Ratio")

To strengthen reasoning while staying within the capacity of a 4B model, we trained on a small, curated "Golden Ratio" dataset:
- **Dataset:** ALFWorld v5 trajectories + distilled DBBench trajectories (494 high-quality, noise-filtered trajectories).
- **Method:** LoRA on a full-precision (non-quantized) base model, trained with Unsloth.
- **Loss Strategy:** Loss is computed on **every assistant turn** in each multi-turn trajectory; user and system tokens are masked out.
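
Assistant-only loss is usually implemented by masking labels. A minimal sketch, assuming each trajectory has already been tokenized into `(role, token_ids)` segments and using the Hugging Face convention that a label of `-100` is ignored by the loss. The `build_labels` helper and the token IDs are hypothetical, and the card does not say whether role-header tokens are masked.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # PyTorch cross-entropy skips this label by default


def build_labels(segments: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate (role, token_ids) segments into input_ids and labels.

    Assistant tokens keep their ids as labels, so every assistant turn
    contributes loss; system and user tokens get IGNORE_INDEX instead.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels


# Usage with made-up token ids for a two-turn trajectory.
ids, labels = build_labels([
    ("system", [1, 2]), ("user", [3]), ("assistant", [4, 5]),
    ("user", [6]), ("assistant", [7]),
])
print(labels)
```

Both assistant turns keep their labels, which is the difference from "last turn only" masking.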

**Hyperparameters:**
- Max sequence length: 8192
- Epochs: 2
- Learning rate: 1e-6
- LoRA Rank (r): 64
- LoRA Alpha: 128
- Target Modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
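
For reference, the same hyperparameters can be gathered in one place along with two derived quantities. This sketch assumes standard LoRA scaling (`alpha / r`, not rsLoRA), which gives a factor of 128 / 64 = 2. The `LoraHyperparams` class is hypothetical, and the 1024 x 1024 layer size in the usage line is illustrative, not taken from Qwen3-4B's actual dimensions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoraHyperparams:
    """The hyperparameters listed above, gathered in one place."""
    max_seq_length: int = 8192
    epochs: int = 2
    learning_rate: float = 1e-6
    r: int = 64
    alpha: int = 128
    target_modules: List[str] = field(default_factory=lambda: [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ])

    @property
    def scaling(self) -> float:
        # Standard LoRA: the low-rank update B @ A is multiplied by alpha / r.
        return self.alpha / self.r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A is (r x d_in) and B is (d_out x r): r * (d_in + d_out) trainable weights.
        return self.r * (d_in + d_out)


hp = LoraHyperparams()
print(hp.scaling, hp.adapter_params(1024, 1024))
```

Keeping `alpha = 2 * r` means the effective update magnitude stays at twice the raw low-rank product regardless of the rank chosen.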

## Usage

Because the routing logic lives in the Jinja2 `chat_template`, you **must** load the tokenizer from this repository to see the performance gains; a stock Qwen3 tokenizer will not inject the task-specific system prompts.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_huggingface_id/your_model_name"  # Change this to your actual repo ID

# 1. Load the customized tokenizer (CRITICAL: it carries the routing chat_template)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load the merged model directly (no separate base model or adapter needed)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 3. Standard inference (the Jinja2 template handles the routing automatically)
messages = [
    {"role": "user", "content": "You are a specialized MySQL database agent..."}
]
# return_dict=True returns input_ids and attention_mask, so **inputs works below
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```