Fine-tuned variant of Qwen3-4B-Instruct-2507, optimized for tool-use and function call generation via reinforcement learning with composite reward signals.
Overview
Base Model: Qwen/Qwen3-4B-Instruct-2507
Training Method: GRPO (Group Relative Policy Optimization)
Specialization: Tool-use, function calling
License: Apache 2.0
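GRPO does not use a learned value baseline. Instead, it samples a group of completions for each prompt and normalizes each completion's reward against the group's mean and standard deviation. The sketch below shows only that normalization step. The function name, the `eps` constant, and the use of the population standard deviation (some implementations use the sample standard deviation) are illustrative assumptions. This card does not state the group size or other training hyperparameters.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Illustrative sketch: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a composite reward.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

Completions that score above the group average receive a positive advantage and are reinforced. If every completion in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.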
Training
Reward Design
The model is trained with three complementary reward functions:
Rule-based reward: Verifies that function names and arguments are correct. Partial credit is awarded when only a subset of the arguments matches.
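Here is a minimal sketch of a rule-based reward with partial credit. It assumes that calls are emitted as JSON objects with `name` and `arguments` fields. The following choices are hypothetical and for illustration only, not the reward used to train this model:

- a 0.5/0.5 split between name credit and argument credit (`NAME_WEIGHT`, `ARGS_WEIGHT`);
- exact-match comparison of argument values;
- dividing by the union of predicted and reference keys, so that hallucinated arguments lower the score.

```python
import json
from typing import Any, Optional

NAME_WEIGHT = 0.5  # hypothetical share of credit for the correct function name
ARGS_WEIGHT = 0.5  # hypothetical share of credit for the arguments


def parse_call(text: str) -> Optional[dict[str, Any]]:
    """Parse a JSON function call; return None if it is malformed."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return None
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return {"name": call["name"], "arguments": args}


def rule_based_reward(completion: str, reference: dict[str, Any]) -> float:
    """Score a completion against a reference call, with partial argument credit."""
    call = parse_call(completion)
    # Malformed output or the wrong function name earns nothing.
    if call is None or call["name"] != reference["name"]:
        return 0.0
    pred_args = call["arguments"]
    ref_args = reference.get("arguments", {})
    keys = set(pred_args) | set(ref_args)
    if not keys:  # zero-argument call with a correct name
        return NAME_WEIGHT + ARGS_WEIGHT
    # Count keys present in both calls with identical values.
    matched = sum(
        1 for k in keys
        if k in pred_args and k in ref_args and pred_args[k] == ref_args[k]
    )
    return NAME_WEIGHT + ARGS_WEIGHT * matched / len(keys)


ref = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
print(rule_based_reward('{"name": "get_weather", "arguments": {"city": "Paris"}}', ref))  # 0.75
```

Dividing by the union of keys, rather than only the reference keys, is one way to keep the policy from padding calls with extra arguments. A reward that ignores extras would be simpler but easier to game.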
Feedback on model quality, edge cases, and real-world performance is welcome. Open an issue or reach out via the links below.
Citation
@misc{qwen3-4b-i-1209,
  title={Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author={Beyoru},
  year={2025},
  howpublished={\url{https://huggingface.co/beyoru/Qwen3-4B-I-1209}}
}