89 lines
2.8 KiB
Markdown
89 lines
2.8 KiB
Markdown
|
|
---
|
|||
|
|
language: en
|
|||
|
|
license: apache-2.0
|
|||
|
|
base_model: Nanbeige/Nanbeige4.1-3B
|
|||
|
|
datasets:
|
|||
|
|
- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
|
|||
|
|
tags:
|
|||
|
|
- tool-use
|
|||
|
|
- gmail
|
|||
|
|
- function-calling
|
|||
|
|
- sft
|
|||
|
|
- dpo
|
|||
|
|
pipeline_tag: text-generation
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)
|
|||
|
|
|
|||
|
|
Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
|
|||
|
|
for Gmail tool-calling tasks using a two-stage training pipeline.
|
|||
|
|
|
|||
|
|
|
|||
|
|
<div align="center">
|
|||
|
|
<img src="https://images.hdqwalls.com/wallpapers/king-glory-anime-boy-4k-ka.jpg" width="800" alt="Nanbeige Gmail Agent Chains" style="border-radius: 12px; box-shadow: 0 4px 12px rgba(0,0,0,0.2);">
|
|||
|
|
<br><br>
|
|||
|
|
<h1>📧 Nanbeige-4.1-3B Gmail Tool Use Agent</h1>
|
|||
|
|
<p><i>A hyper-aligned 3B parameter agent matching GPT-4o-mini performance inside LangGraph.</i></p>
|
|||
|
|
</div>
|
|||
|
|
|
|||
|
|
<br>
|
|||
|
|
|
|||
|
|
|
|||
|
|
**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
|
|||
|
|
|
|||
|
|
## Training Pipeline
|
|||
|
|
|
|||
|
|
### Stage 1 — Supervised Fine-Tuning (SFT)
|
|||
|
|
- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
|
|||
|
|
- **Format:** ChatML with tool_calls (OpenAI function-calling schema)
|
|||
|
|
- **Method:** LoRA r=16, α=32, 7 target modules
|
|||
|
|
- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
|
|||
|
|
|
|||
|
|
### Stage 2 — Direct Preference Optimization (DPO)
|
|||
|
|
- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
|
|||
|
|
- `wrong_tool` — incorrect tool selected (~34%)
|
|||
|
|
- `missing_args` — required arguments omitted (~32%)
|
|||
|
|
- `bad_answer` — poor final response (~34%)
|
|||
|
|
- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
|
|||
|
|
- **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
|
|||
|
|
|
|||
|
|
## Supported Tools
|
|||
|
|
|
|||
|
|
| Tool | Description |
|
|||
|
|
|---|---|
|
|||
|
|
| `search_emails` | Search Gmail inbox with filters |
|
|||
|
|
| `read_email` | Read full email content by ID |
|
|||
|
|
| `send_email` | Send a new email |
|
|||
|
|
| `draft_email` | Create a draft |
|
|||
|
|
| `modify_email` | Add/remove labels, mark read/unread |
|
|||
|
|
| `download_attachment` | Download email attachment |
|
|||
|
|
|
|||
|
|
## Usage
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|||
|
|
import torch
|
|||
|
|
|
|||
|
|
model = AutoModelForCausalLM.from_pretrained(
|
|||
|
|
"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
|
|||
|
|
torch_dtype=torch.bfloat16,
|
|||
|
|
trust_remote_code=True,
|
|||
|
|
)
|
|||
|
|
tokenizer = AutoTokenizer.from_pretrained(
|
|||
|
|
"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
|
|||
|
|
trust_remote_code=True,
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Training Details
|
|||
|
|
|
|||
|
|
| Parameter | Value |
|
|||
|
|
|---|---|
|
|||
|
|
| Base model | Nanbeige/Nanbeige4.1-3B |
|
|||
|
|
| SFT LoRA rank | 16 |
|
|||
|
|
| DPO LoRA rank | 16 |
|
|||
|
|
| DPO β | 0.1 |
|
|||
|
|
| Max length | 2682 tokens |
|
|||
|
|
| GPU | 1× RTX 4090 24GB |
|
|||
|
|
| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |
|