---
base_model: beyoru/Qwen3-4B-I-1209
tags:
- text-generation-inference
- transformers
- qwen3
- tools
- agent
- function calling
- tool calling
license: apache-2.0
language:
- en
---

# Qwen3-4B-I-1209

[![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/Hert4)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-FFD21E?style=flat-square&logo=huggingface&logoColor=black)](https://huggingface.co/beyoru)
[![Buy Me A Coffee](https://img.shields.io/badge/Buy_Me_A_Coffee-A78BFA?style=flat-square&logo=buy-me-a-coffee&logoColor=white)](https://buymeacoffee.com/ductransa0g)

Fine-tuned variant of Qwen3-4B-Instruct-2507, optimized for tool-use and function call generation via reinforcement learning with composite reward signals.

## Overview

| | |
|---|---|
| **Base Model** | Qwen/Qwen3-4B-Instruct-2507 |
| **Training Method** | GRPO (Group Relative Policy Optimization) |
| **Specialization** | Tool-use, function calling |
| **License** | Apache 2.0 |

## Training

### Reward Design

The model is trained with three complementary reward functions:

- **Rule-based reward**: Verifies correctness of function names and arguments. Partial credit is awarded for matching argument subsets.
- **Self-certainty reward**: Encourages confident, well-calibrated predictions.
- **Tool-call reward**: Validates structural correctness of generated tool calls.

### Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 5e-6 |
| Scheduler | Cosine with min LR (`min_lr_rate=0.1`) |
| Generations per prompt | 4 |

## Evaluation

### ACEBench

| Model | Overall Accuracy |
|---|---|
| **Qwen3-4B-I-1209 (this model)** | **0.7233** |
| Qwen3-4B-Instruct-2507 (base) | 0.6350 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

> Additional benchmark results will be added as evaluation continues.
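
## Reward Sketch

The rule-based reward described under Reward Design (correct function name, partial credit for matching arguments) could look roughly like the sketch below. This is not the training code used for this model: the function name `rule_based_reward`, the call format (`name` plus `arguments`), and the exact scoring rule are assumptions chosen for illustration only.

```python
from typing import Any


def rule_based_reward(predicted: dict[str, Any], expected: dict[str, Any]) -> float:
    """Score one predicted tool call against a reference call, in [0, 1].

    Hypothetical rule (an assumption, not the author's implementation):
    a wrong function name scores 0; otherwise the score is the fraction of
    argument keys (union of predicted and expected) whose values match exactly.
    """
    if predicted.get("name") != expected.get("name"):
        return 0.0

    pred_args = predicted.get("arguments") or {}
    exp_args = expected.get("arguments") or {}

    # Using the union of keys means missing and spurious arguments both cost credit.
    all_keys = set(pred_args) | set(exp_args)
    if not all_keys:
        return 1.0  # Correct name and no arguments on either side.

    matched = sum(
        1
        for key in all_keys
        if key in pred_args and key in exp_args and pred_args[key] == exp_args[key]
    )
    return matched / len(all_keys)


reference = {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}}
partial = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
wrong_name = {"name": "get_time", "arguments": {"city": "Hanoi", "unit": "celsius"}}

print(rule_based_reward(reference, reference))   # 1.0
print(rule_based_reward(partial, reference))     # 0.5
print(rule_based_reward(wrong_name, reference))  # 0.0
```

Under this assumed rule, a call that names the right function but supplies only one of two reference arguments earns half credit, which is one way to realize "partial credit for matching argument subsets".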
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights and the matching tokenizer from the Hugging Face Hub.
model_id = "beyoru/Qwen3-4B-I-1209"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

## Feedback & Contributions

Feedback on model quality, edge cases, and real-world performance is welcome. Open an issue or reach out via the links at the top of this page.

## Citation

```bibtex
@misc{qwen3-4b-i-1209,
  title        = {Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author       = {Beyoru},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1209}}
}
```
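
## Parsing Tool Calls

Once the model has generated text, the tool calls still need to be extracted before they can be executed. The sketch below assumes this model keeps the Hermes-style output format of the Qwen3 base model, where each call is a JSON object with `name` and `arguments` wrapped in `<tool_call>...</tool_call>` tags. That format is an assumption about this fine-tune, and `parse_tool_calls` is a hypothetical helper, not part of any library.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text.

    Payloads that are not valid JSON, or that lack a string `name` and a
    dict `arguments`, are skipped rather than raising.
    """
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # Structurally invalid call: ignore it.
        if (
            isinstance(call, dict)
            and isinstance(call.get("name"), str)
            and isinstance(call.get("arguments"), dict)
        ):
            calls.append(call)
    return calls


sample_output = (
    "Let me check the weather.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample_output))
```

Skipping malformed payloads instead of raising keeps an agent loop running when the model emits a broken call; a stricter caller may prefer to surface the error.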