Initialize project; model provided by the ModelHub XC community
Model: kamaboko2007/llm_advance_024_enhanced_rules Source: Original Platform
84
README.md
Normal file
@@ -0,0 +1,84 @@
---
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
- unsloth
- agentbench
---

# Qwen3-4B AgentBench "023-Jinja-Heuristics" LoRA

This repository provides a **fully merged model** (base + LoRA merged) fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**. No separate base model loading is needed.

It is engineered for **AgentBench** (specifically ALFWorld and DBBench), targeting the catastrophic forgetting and format-collision problems inherent in multi-task agent fine-tuning.

## Key Innovation: Jinja2 Contextual Routing & Heuristics Injection

This model's behavior depends not only on its weights but also on its **custom `tokenizer_config.json`**.
We replaced the default `chat_template` with a Jinja2 template that acts as both an "Absolute Defense Shield" and a "Dynamic Heuristics Injector".

When the chat template is applied, it inspects the user's prompt and injects a task-specific system prompt (a "cheat sheet") *just before* inference:

### 1. DB Bench (MySQL) Mode

When `MySQL` or `SQL` is detected in the prompt, the template assigns a DB agent persona and injects the following rules:

- **Error Recovery:** "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. Use `Action: Operation` to execute `DESCRIBE table_name;` and check the correct schema before retrying."
- **Loop Prevention:** "Never repeat the exact same invalid SQL."

### 2. ALFWorld (Household) Mode

When `household` or `Interact with a` is detected, the template assigns an ALFWorld agent persona and injects the following rules:

- **Format Override:** Ignores the `THOUGHT:`/`ACTION:` format that the evaluation system asks for and strictly enforces the stable `Think:`/`Act:` format instead.
- **Exploration Logic:** "If an action fails (`Nothing happened`), analyze why in your `Think:` step and choose a DIFFERENT action."
- **Efficiency:** "If you search a receptacle and do not find the target object, DO NOT search it again. Move to a different location."

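The routing described for the two modes can be sketched in plain Python. This is only an illustration of the keyword checks, not the actual Jinja2 template shipped in `tokenizer_config.json`: the function names, the rule strings, the choice to inspect the first user message, and the order in which the two modes are tested are all assumptions.

```python
# Hypothetical stand-ins for the injected system prompts; the real text lives
# in the chat_template inside tokenizer_config.json.
DB_RULES = (
    "If you encounter an SQL error (e.g., 'Unknown column'), DO NOT panic. "
    "Use `Action: Operation` to execute `DESCRIBE table_name;` and check the "
    "correct schema before retrying. Never repeat the exact same invalid SQL."
)
ALFWORLD_RULES = (
    "Use the Think:/Act: format. "
    "If an action fails (`Nothing happened`), analyze why in your `Think:` "
    "step and choose a DIFFERENT action. "
    "If you search a receptacle and do not find the target object, DO NOT "
    "search it again. Move to a different location."
)


def select_system_prompt(user_text):
    """Return the rules to inject for this prompt, or None for no injection."""
    # "SQL" is a substring of "MySQL", so one case-sensitive check covers both.
    if "SQL" in user_text:
        return DB_RULES
    if "household" in user_text or "Interact with a" in user_text:
        return ALFWORLD_RULES
    return None


def route(messages):
    """Prepend the selected rules as a system message (assumed behavior)."""
    first_user = next(
        (m["content"] for m in messages if m["role"] == "user"), ""
    )
    rules = select_system_prompt(first_user)
    if rules is None:
        return list(messages)
    return [{"role": "system", "content": rules}] + list(messages)
```

Calling `route([{"role": "user", "content": "You are a specialized MySQL database agent..."}])` would return the conversation with the DB rules prepended, while a prompt that matches neither keyword set passes through unchanged.
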
## Training Configuration (The "Golden Ratio")

To improve reasoning without exceeding the capacity of a 4B model, we trained on a small, curated "Golden Ratio" dataset:

- **Dataset:** ALFWorld v5 Trajectories + DBBench Distilled (494 high-quality, noise-free trajectories).
- **Method:** LoRA on a full-precision base, via Unsloth.
- **Loss Strategy:** Loss is computed on **all assistant turns** of each multi-turn trajectory; user and system prompt tokens are excluded from the loss.

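The loss strategy can be illustrated with a minimal label-masking sketch. It assumes each trajectory is already split into per-turn token ID lists; the `build_labels` helper and its input layout are hypothetical and are not taken from the actual training script. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def build_labels(turns):
    """Build training labels that keep only assistant tokens.

    turns: list of (role, token_ids) pairs in conversation order.
    Returns (input_ids, labels); labels copy the token ids for assistant
    turns and hold IGNORE_INDEX for system and user turns.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels
```

Every assistant turn in the trajectory is kept, not just the last one, which matches the "all assistant turns" wording above.
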
**Hyperparameters:**

- Max sequence length: 8192
- Epochs: 2
- Learning rate: 1e-6
- LoRA Rank (r): 64
- LoRA Alpha: 128
- Target Modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`

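For reference, the hyperparameters listed above can be collected into a plain Python dictionary. The key names are illustrative and not tied to any particular training script. The helper assumes standard (non-rank-stabilized) LoRA, in which the low-rank update is scaled by `alpha / r`.

```python
# Values copied from the hyperparameter list; key names are hypothetical.
TRAINING_CONFIG = {
    "max_seq_length": 8192,
    "num_epochs": 2,
    "learning_rate": 1e-6,
    "lora_r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(config):
    """Return the LoRA update scale, alpha / r, for standard LoRA."""
    return config["lora_alpha"] / config["lora_r"]


print(lora_scaling(TRAINING_CONFIG))  # 2.0
```

With r = 64 and alpha = 128, the adapter update is scaled by a factor of 2 under that assumption.
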
## Usage

Because the routing and rule injection are embedded in the Jinja2 `chat_template`, you **must** load this repository's tokenizer to get the behavior described above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_huggingface_id/your_model_name"  # Change this to your actual repo ID

# 1. Load the customized tokenizer (CRITICAL: it carries the custom chat_template)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load the merged model directly (no separate base model or adapter needed)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 3. Standard inference (the Jinja2 template handles the routing automatically)
messages = [
    {"role": "user", "content": "You are a specialized MySQL database agent..."}
]
# return_dict=True yields input_ids and attention_mask, so **inputs unpacks correctly
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map instead of hard-coding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```