Initialize project; model provided by the ModelHub XC community

Model: TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use
Source: Original Platform
2026-05-18 04:00:36 +08:00
commit 64d1875cc3
13 changed files with 618 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,88 @@
+---
+language: en
+license: apache-2.0
+base_model: Nanbeige/Nanbeige4.1-3B
+datasets:
+  - TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
+tags:
+  - tool-use
+  - gmail
+  - function-calling
+  - sft
+  - dpo
+pipeline_tag: text-generation
+---
+
+# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)
+
+A fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
+for Gmail tool-calling tasks, trained with a two-stage pipeline (SFT followed by DPO).
+
+
+<div align="center">
+  <img src="https://images.hdqwalls.com/wallpapers/king-glory-anime-boy-4k-ka.jpg" width="800" alt="Nanbeige Gmail Agent Chains" style="border-radius: 12px; box-shadow: 0 4px 12px rgba(0,0,0,0.2);">
+  <br><br>
+  <h1>📧 Nanbeige-4.1-3B Gmail Tool Use Agent</h1>
+  <p><i>A 3B-parameter agent that the author reports as matching GPT-4o-mini performance inside LangGraph.</i></p>
+</div>
+
+<br>
+
+
+**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
+
+## Training Pipeline
+
+### Stage 1 — Supervised Fine-Tuning (SFT)
+- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
+- **Format:** ChatML with tool_calls (OpenAI function-calling schema)
+- **Method:** LoRA r=16, α=32, 7 target modules
+- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
+
+### Stage 2 — Direct Preference Optimization (DPO)
+- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
+  - `wrong_tool` — incorrect tool selected (~34%)
+  - `missing_args` — required arguments omitted (~32%)
+  - `bad_answer` — poor final response (~34%)
+- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (with PEFT, the policy with its LoRA adapter disabled serves as the implicit reference model)
+- **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
+
+## Supported Tools
+
+| Tool | Description |
+|---|---|
+| `search_emails` | Search Gmail inbox with filters |
+| `read_email` | Read full email content by ID |
+| `send_email` | Send a new email |
+| `draft_email` | Create a draft |
+| `modify_email` | Add/remove labels, mark read/unread |
+| `download_attachment` | Download email attachment |
+
+## Usage
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+model = AutoModelForCausalLM.from_pretrained(
+    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
+    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to reduce memory use
+    trust_remote_code=True,  # allow custom code from the model repository to run
+)
+tokenizer = AutoTokenizer.from_pretrained(  # tokenizer from the same repository
+    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
+    trust_remote_code=True,
+)
+```
+
+## Training Details
+
+| Parameter | Value |
+|---|---|
+| Base model | Nanbeige/Nanbeige4.1-3B |
+| SFT LoRA rank | 16 |
+| DPO LoRA rank | 16 |
+| DPO β | 0.1 |
+| Max length | 2682 tokens |
+| GPU | 1× RTX 4090 24GB |
+| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |
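
The figures in the README's Training Pipeline section can be sanity-checked with a short sketch. It assumes the reported PPL is `exp(loss)` for a mean per-token cross-entropy in nats, and it uses the standard DPO sigmoid loss, `-log(sigmoid(margin))`, where the margin is β times the difference between the chosen and rejected log-probability ratios. The function names and the sample log-probabilities are hypothetical and are not taken from the author's training code.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats."""
    return math.exp(loss)


def dpo_sigmoid_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO sigmoid loss for one preference pair.

    Each argument is the summed log-probability of a full response under the
    policy or the reference model. Returns (loss, reward_margin).
    """
    chosen_reward = beta * (policy_chosen - ref_chosen)        # implicit reward of the chosen response
    rejected_reward = beta * (policy_rejected - ref_rejected)  # implicit reward of the rejected response
    margin = chosen_reward - rejected_reward
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return loss, margin


# SFT losses reported in the README, converted to perplexity.
print(round(perplexity(0.8464), 2))  # 2.33
print(round(perplexity(0.1888), 2))  # 1.21

# Hypothetical log-probabilities for a single preference pair (illustrative values only).
loss, margin = dpo_sigmoid_loss(-10.0, -15.0, -12.0, -12.0, beta=0.1)
print(f"loss={loss:.4f} margin={margin:.2f}")
```

Under these assumptions the SFT loss and PPL values in the README agree with each other. A pair counts toward reward accuracy when its margin is positive, and the loss for a pair equals `log(2)` when the margin is zero.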