Files
ModelHub XC 64d1875cc3 初始化项目,由ModelHub XC社区提供模型
Model: TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use
Source: Original Platform
2026-05-18 04:00:36 +08:00

89 lines
2.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
language: en
license: apache-2.0
base_model: Nanbeige/Nanbeige4.1-3B
datasets:
- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
tags:
- tool-use
- gmail
- function-calling
- sft
- dpo
pipeline_tag: text-generation
---
# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)
Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
for Gmail tool-calling tasks using a two-stage training pipeline.
<div align="center">
<img src="https://images.hdqwalls.com/wallpapers/king-glory-anime-boy-4k-ka.jpg" width="800" alt="Nanbeige Gmail Agent Chains" style="border-radius: 12px; box-shadow: 0 4px 12px rgba(0,0,0,0.2);">
<br><br>
<h1>📧 Nanbeige-4.1-3B Gmail Tool Use Agent</h1>
<p><i>A hyper-aligned 3B parameter agent matching GPT-4o-mini performance inside LangGraph.</i></p>
</div>
<br>
**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
## Training Pipeline
### Stage 1 — Supervised Fine-Tuning (SFT)
- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
- **Format:** ChatML with tool_calls (OpenAI function-calling schema)
- **Method:** LoRA r=16, α=32, 7 target modules
- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
### Stage 2 — Direct Preference Optimization (DPO)
- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
- `wrong_tool` — incorrect tool selected (~34%)
- `missing_args` — required arguments omitted (~32%)
- `bad_answer` — poor final response (~34%)
- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
- **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
## Supported Tools
| Tool | Description |
|---|---|
| `search_emails` | Search Gmail inbox with filters |
| `read_email` | Read full email content by ID |
| `send_email` | Send a new email |
| `draft_email` | Create a draft |
| `modify_email` | Add/remove labels, mark read/unread |
| `download_attachment` | Download email attachment |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained(
"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
trust_remote_code=True,
)
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | Nanbeige/Nanbeige4.1-3B |
| SFT LoRA rank | 16 |
| DPO LoRA rank | 16 |
| DPO β | 0.1 |
| Max length | 2682 tokens |
| GPU | 1× RTX 4090 24GB |
| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |