Initialize project; model provided by the ModelHub XC community

Model: kamaboko2007/llm_advance_024_enhanced_rules
Source: Original Platform
ModelHub XC
2026-06-04 14:30:35 +08:00
commit fba48b555d
14 changed files with 152369 additions and 0 deletions

84
README.md Normal file

@@ -0,0 +1,84 @@
---
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
- unsloth
- agentbench
---
# Qwen3-4B AgentBench "023-Jinja-Heuristics" LoRA
This repository provides a **fully merged model** (base weights with the LoRA adapter merged in), fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**. No separate base model needs to be loaded.
It is tuned for **AgentBench**, specifically ALFWorld and DBBench, and targets the catastrophic forgetting and format-collision problems that arise in multi-task agent fine-tuning.
## Key Innovation: Jinja2 Contextual Routing & Heuristics Injection
Much of this model's behavior comes not only from its weights but also from its **custom `tokenizer_config.json`**.
The default `chat_template` is completely overridden with a Jinja2 template that acts as an "Absolute Defense Shield" and a "Dynamic Heuristics Injector".
Depending on the content of the user's prompt, the chat template injects a task-specific system prompt (a "cheat sheet") when the conversation is rendered, *just before* inference:
### 1. DB Bench (MySQL) Mode
When `MySQL` or `SQL` is detected in the prompt, the model is forced into a DB Agent persona with the following injected rules:
- **Error Recovery:** "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. Use `Action: Operation` to execute `DESCRIBE table_name;` and check the correct schema before retrying."
- **Loop Prevention:** "Never repeat the exact same invalid SQL."
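The two rules above are instructions to the model, but the same policy can be mirrored on the harness side. The following is a minimal sketch under stated assumptions: `SqlRetryPolicy` and its methods are hypothetical names that do not come from this repository, and the whitespace normalization is an added convenience beyond the "exact same" wording of the rule.

```python
from typing import Optional, Set


class SqlRetryPolicy:
    """Illustrative policy mirroring the two injected rules (hypothetical helper)."""

    def __init__(self) -> None:
        # Normalized statements that have already produced an error.
        self.failed: Set[str] = set()

    @staticmethod
    def _normalize(sql: str) -> str:
        # Collapse whitespace and drop a trailing semicolon so trivially
        # reformatted copies of a failed statement still count as repeats.
        return " ".join(sql.split()).rstrip(";").strip()

    def record_failure(self, sql: str) -> None:
        self.failed.add(self._normalize(sql))

    def is_repeat(self, sql: str) -> bool:
        # Loop prevention: the same invalid SQL must not be sent again.
        return self._normalize(sql) in self.failed

    def recovery_statement(self, error: str, table: str) -> Optional[str]:
        # Error recovery: on an 'Unknown column' error, inspect the schema first.
        if "Unknown column" in error:
            return f"DESCRIBE {table};"
        return None
```

A harness could call `record_failure` after each error, reject candidates for which `is_repeat` is true, and run the `recovery_statement` result before the next attempt.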
### 2. ALFWorld (Household) Mode
When `household` or `Interact with a` is detected, the model is forced into an ALFWorld Agent persona:
- **Format Override:** Completely ignores the evaluation system's trap (`THOUGHT:`/`ACTION:`) and strictly enforces the stable `Think:`/`Act:` format.
- **Exploration Logic:** "If an action fails (`Nothing happened`), analyze why in your `Think:` step and choose a DIFFERENT action."
- **Efficiency:** "If you search a receptacle and do not find the target object, DO NOT search it again. Move to a different location."
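The routing described in this section can be pictured as plain keyword matching followed by a system-prompt injection. The sketch below expresses that idea in Python rather than Jinja2; it is not the actual template shipped in `tokenizer_config.json`. The function names, the placeholder rule strings, the case-sensitive matching, the choice to inspect the first user turn, and the order in which the two modes are checked are all assumptions.

```python
from typing import Dict, List

# Placeholder rule text: the real injected prompts live in the repository's
# tokenizer_config.json and are not reproduced here.
DB_RULES = "<DB agent rules>"
ALFWORLD_RULES = "<ALFWorld agent rules>"


def route_mode(prompt: str) -> str:
    """Return 'db', 'alfworld', or 'default' using the trigger strings from this README."""
    # Checking the DB triggers first is an assumption; the README does not
    # say which mode wins when both sets of triggers appear.
    if "MySQL" in prompt or "SQL" in prompt:
        return "db"
    if "household" in prompt or "Interact with a" in prompt:
        return "alfworld"
    return "default"


def inject_rules(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Prepend a task-specific system message based on the first user turn."""
    first_user = next((m["content"] for m in messages if m["role"] == "user"), "")
    rules = {"db": DB_RULES, "alfworld": ALFWORLD_RULES}.get(route_mode(first_user))
    if rules is None:
        # No trigger matched: leave the conversation unchanged.
        return list(messages)
    return [{"role": "system", "content": rules}] + list(messages)
```

Because the real logic runs inside the chat template, it applies whenever `apply_chat_template` is called with this repository's tokenizer, with no extra code on the caller's side.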
## Training Configuration (The "Golden Ratio")
To maximize reasoning capabilities without exceeding the 4B model's capacity, we used a highly curated "Golden Ratio" dataset:
- **Dataset:** ALFWorld v5 Trajectories + DBBench Distilled (494 high-quality, noise-free trajectories).
- **Method:** LoRA (full precision base) via Unsloth.
- **Loss Strategy:** Loss is applied strictly to **all assistant turns** in the multi-turn trajectory, ignoring user/system prompts.
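Restricting the loss to assistant turns is commonly done by setting the labels of all other tokens to `-100`, the default ignore index of PyTorch's cross-entropy loss. The training code for this model is not shown here, so the following is only a sketch of that general technique; `build_labels` and the pre-tokenized `(role, token_ids)` input format are hypothetical.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized turns and mask every non-assistant token in the labels."""
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            # Assistant tokens contribute to the loss.
            labels.extend(token_ids)
        else:
            # System and user tokens are seen as context but never trained on.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels
```

In a multi-turn trajectory this keeps every assistant turn in the loss, not only the final one, which matches the strategy described above.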
**Hyperparameters:**
- Max sequence length: 8192
- Epochs: 2
- Learning rate: 1e-6
- LoRA Rank (r): 64
- LoRA Alpha: 128
- Target Modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
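For reference, the values above can be collected into one configuration object. This is a dependency-free sketch, not the actual Unsloth training script; the `LoraHyperparams` class is hypothetical. The `scaling` property assumes the standard LoRA scaling of alpha divided by rank (that is, no rank-stabilized variant), which this README does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoraHyperparams:
    """Hyperparameters as listed in this README (hypothetical container)."""

    max_seq_length: int = 8192
    epochs: int = 2
    learning_rate: float = 1e-6
    r: int = 64
    alpha: int = 128
    target_modules: List[str] = field(
        default_factory=lambda: [
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ]
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r (assumed here).
        return self.alpha / self.r
```

Under that assumption, rank 64 with alpha 128 scales the adapter update by a factor of 2.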
## Usage
Because the routing logic is embedded in the Jinja2 `chat_template`, you **must** load the tokenizer from this repository to see the performance gains.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_huggingface_id/your_model_name"  # Change this to your actual repo ID

# 1. Load the customized tokenizer (CRITICAL: it carries the Jinja2 chat template)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load the merged model directly (no separate base model or adapter)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 3. Standard inference (the Jinja2 template handles the routing automatically)
messages = [
    {"role": "user", "content": "You are a specialized MySQL database agent..."}
]
# return_dict=True yields input_ids and attention_mask, so **inputs can be unpacked below
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device_map placement instead of hard-coding "cuda"

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
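To confirm that the custom template is actually being applied, it can help to render the prompt as text before generating and check that the injected system prompt is present. The helper below is a small sketch; `render_prompt` is a hypothetical name, and it relies only on the `apply_chat_template` method with `tokenize=False`.

```python
def render_prompt(tokenizer, messages):
    """Return the fully templated prompt string without tokenizing it."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return text instead of token IDs
        add_generation_prompt=True,  # include the assistant turn header
    )


# Example, reusing `tokenizer` and `messages` from the usage snippet:
# print(render_prompt(tokenizer, messages))
```

If the rendered text contains no DB or ALFWorld rules for a prompt that should trigger them, the tokenizer was probably loaded from a different repository.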