初始化项目，由ModelHub XC社区提供模型

Model: beyoru/Qwen3-4B-I-1209 Source: Original Platform
2026-05-23 05:13:17 +08:00
commit 1490e67d91
14 changed files with 152390 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,85 @@
+---
+base_model: beyoru/Qwen3-4B-I-1209
+tags:
+- text-generation-inference
+- transformers
+- qwen3
+- tools
+- agent
+- function calling
+- tool calling
+license: apache-2.0
+language:
+- en
+---
+# Qwen3-4B-I-1209
+[![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/Hert4)
+[![HuggingFace](https://img.shields.io/badge/HuggingFace-FFD21E?style=flat-square&logo=huggingface&logoColor=black)](https://huggingface.co/beyoru)
+[![Buy Me A Coffee](https://img.shields.io/badge/Buy_Me_A_Coffee-A78BFA?style=flat-square&logo=buy-me-a-coffee&logoColor=white)](https://buymeacoffee.com/ductransa0g)
+
+Fine-tuned variant of Qwen3-4B-Instruct-2507, optimized for tool-use and function call generation via reinforcement learning with composite reward signals.
+
+## Overview
+
+| | |
+|---|---|
+| **Base Model** | Qwen/Qwen3-4B-Instruct-2507 |
+| **Training Method** | GRPO (Group Relative Policy Optimization) |
+| **Specialization** | Tool-use, function calling |
+| **License** | Apache 2.0 |
+
+## Training
+
+### Reward Design
+
+The model is trained with three complementary reward functions:
+
+- **Rule-based reward** — Verifies correctness of function names and arguments. Partial credit is awarded for matching argument subsets.
+- **Self-certainty reward** — Encourages confident, well-calibrated predictions.
+- **Tool-call reward** — Validates structural correctness of generated tool calls.
+
+### Configuration
+
+| Parameter | Value |
+|---|---|
+| Optimizer | AdamW |
+| Learning rate | 5e-6 |
+| Scheduler | Cosine with min LR (`min_lr_rate=0.1`) |
+| Generations per prompt | 4 |
+
+## Evaluation
+
+### ACEBench
+
+| Model | Overall Accuracy |
+|---|---|
+| **Qwen3-4B-I-1209 (this model)** | **0.7233** |
+| Qwen3-4B-Instruct-2507 (base) | 0.6350 |
+| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |
+
+> Additional benchmark results will be added as evaluation continues.
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("beyoru/Qwen3-4B-I-1209")
+tokenizer = AutoTokenizer.from_pretrained("beyoru/Qwen3-4B-I-1209")
+```
+
+## Feedback & Contributions
+
+Feedback on model quality, edge cases, and real-world performance is welcome. Open an issue or reach out via the links below.
+
+
+## Citation
+
+```bibtex
+@misc{qwen3-4b-i-1209,
+  title        = {Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
+  author       = {Beyoru},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1209}}
+}
+```