---
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
- unsloth
- agentbench
---

# Qwen3-4B AgentBench "023-Jinja-Heuristics" LoRA

This repository provides a **merged model** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**. It is tuned for **AgentBench**, specifically the ALFWorld and DBBench tasks, and is designed to address the catastrophic forgetting and format collisions that arise in multi-task agent fine-tuning.

The LoRA adapter has been merged into the base weights (base + LoRA merged), so no separate base model or adapter needs to be loaded.

## Key Innovation: Jinja2 Contextual Routing & Heuristics Injection

Much of this model's behavior comes not only from its weights but from its **custom `tokenizer_config.json`**. We replaced the default `chat_template` with a Jinja2 template that acts as an "Absolute Defense Shield" and a "Dynamic Heuristics Injector". When a conversation is rendered with `apply_chat_template`, the template checks the user's prompt for task keywords and injects a task-specific system prompt (a "cheat sheet") *just before* inference:

### 1. DBBench (MySQL) Mode

When `MySQL` or `SQL` is detected in the prompt, the template switches the model into a DB agent persona with the following injected rules:

- **Error Recovery:** "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. Use `Action: Operation` to execute `DESCRIBE table_name;` and check the correct schema before retrying."
- **Loop Prevention:** "Never repeat the exact same invalid SQL."

### 2. ALFWorld (Household) Mode

When `household` or `Interact with a` is detected, the template switches the model into an ALFWorld agent persona:

- **Format Override:** Overrides the `THOUGHT:`/`ACTION:` format requested by the evaluation prompt (a trap for this model) and strictly enforces the more stable `Think:`/`Act:` format.
- **Exploration Logic:** "If an action fails (`Nothing happened`), analyze why in your `Think:` step and choose a DIFFERENT action."
- **Efficiency:** "If you search a receptacle and do not find the target object, DO NOT search it again. Move to a different location."

## Training Configuration (The "Golden Ratio")

To maximize reasoning ability without exceeding the capacity of a 4B model, we trained on a carefully curated "Golden Ratio" dataset:

- **Dataset:** ALFWorld v5 trajectories + distilled DBBench trajectories (494 high-quality, noise-free trajectories).
- **Method:** LoRA on a full-precision base model via Unsloth.
- **Loss Strategy:** Loss is computed on **all assistant turns** of each multi-turn trajectory; user and system tokens are masked out of the loss.

**Hyperparameters:**

- Max sequence length: 8192
- Epochs: 2
- Learning rate: 1e-6
- LoRA Rank (r): 64
- LoRA Alpha: 128
- Target Modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`

## Usage

Because the routing logic lives in the Jinja2 `chat_template`, you **must** load this repository's tokenizer and format prompts with `apply_chat_template`; otherwise the heuristics are never injected and the performance gains will not appear.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_huggingface_id/your_model_name"  # Change this to your actual repo ID

# 1. Load the customized tokenizer (CRITICAL: it carries the routing chat_template)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load the merged model directly (no separate base model or adapter needed)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 3. Standard inference (the Jinja2 template handles the routing automatically)
messages = [
    {"role": "user", "content": "You are a specialized MySQL database agent..."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids + attention_mask so **inputs works
    return_tensors="pt",
).to(model.device)  # follow wherever device_map placed the model

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
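To make the keyword routing described under Key Innovation easier to inspect, here is a plain-Python sketch of the same decision logic. It is not the shipped template. The heuristic strings are abbreviated paraphrases of the rules above, and the helper names `select_heuristics` and `route_messages`, case-sensitive substring matching, checking DB keywords before ALFWorld keywords, scanning only user messages, and injecting the rules as a leading system message are all assumptions. To see exactly what the real template produces, render it with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and print the resulting string.

```python
from typing import Optional

# Hypothetical, abbreviated heuristic texts; the real wording lives in the
# Jinja2 chat_template inside this repository's tokenizer_config.json.
DB_HEURISTICS = (
    "You are a DB agent. If you encounter an SQL error (e.g., 'Unknown column'), "
    "use `Action: Operation` to execute `DESCRIBE table_name;` before retrying. "
    "Never repeat the exact same invalid SQL."
)
ALFWORLD_HEURISTICS = (
    "You are an ALFWorld agent. Use the `Think:`/`Act:` format. "
    "If an action fails (`Nothing happened`), choose a DIFFERENT action. "
    "Do not search the same receptacle twice."
)


def select_heuristics(prompt: str) -> Optional[str]:
    """Return the system prompt to inject for a prompt, or None (hypothetical helper).

    Assumptions: case-sensitive substring matching, DB keywords checked first.
    """
    # "SQL" is a substring of "MySQL", so one check covers both DB keywords.
    if "SQL" in prompt:
        return DB_HEURISTICS
    if "household" in prompt or "Interact with a" in prompt:
        return ALFWORLD_HEURISTICS
    return None


def route_messages(messages: list[dict]) -> list[dict]:
    """Prepend the selected heuristics as a system message (hypothetical helper)."""
    # Assumption: only user-authored text is scanned for keywords.
    user_text = " ".join(m["content"] for m in messages if m["role"] == "user")
    heuristics = select_heuristics(user_text)
    if heuristics is None:
        return list(messages)  # no task detected: pass the conversation through
    return [{"role": "system", "content": heuristics}] + list(messages)


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "You are a specialized MySQL database agent..."}]
    for m in route_messages(msgs):
        print(m["role"], "->", m["content"][:40])
```

Doing this inside the chat template rather than in client code means any inference stack that honors the tokenizer's `chat_template` gets the same routing without extra code, at the cost of the logic being less visible to users.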