language, license, base_model, datasets, tags, pipeline_tag
| language |
license |
base_model |
datasets |
tags |
pipeline_tag |
| en |
apache-2.0 |
Nanbeige/Nanbeige4.1-3B |
| TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets |
|
| tool-use |
| gmail |
| function-calling |
| sft |
| dpo |
|
text-generation |
Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)
Fine-tuned version of Nanbeige/Nanbeige4.1-3B
for Gmail tool-calling tasks using a two-stage training pipeline.
📧 Nanbeige-4.1-3B Gmail Tool Use Agent
A hyper-aligned 3B parameter agent matching GPT-4o-mini performance inside LangGraph.
Training datasets: TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
Training Pipeline
Stage 1 — Supervised Fine-Tuning (SFT)
- Dataset: 740 multi-turn Gmail agent traces (
sft/traces_chatml_clean.jsonl)
- Format: ChatML with tool_calls (OpenAI function-calling schema)
- Method: LoRA r=16, α=32, 7 target modules
- Result: loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
Stage 2 — Direct Preference Optimization (DPO)
- Dataset: 3223 preference pairs (
dpo/dpo_dataset.jsonl) — 3 rejection strategies:
wrong_tool — incorrect tool selected (~34%)
missing_args — required arguments omitted (~32%)
bad_answer — poor final response (~34%)
- Method: DPO β=0.1, sigmoid loss, LoRA r=16,
ref_model=None (PEFT implicit ref)
- Result: val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
Supported Tools
| Tool |
Description |
search_emails |
Search Gmail inbox with filters |
read_email |
Read full email content by ID |
send_email |
Send a new email |
draft_email |
Create a draft |
modify_email |
Add/remove labels, mark read/unread |
download_attachment |
Download email attachment |
Usage
Training Details
| Parameter |
Value |
| Base model |
Nanbeige/Nanbeige4.1-3B |
| SFT LoRA rank |
16 |
| DPO LoRA rank |
16 |
| DPO β |
0.1 |
| Max length |
2682 tokens |
| GPU |
1× RTX 4090 24GB |
| Framework |
TRL 0.22 · Transformers 4.57 · PEFT 0.18 |