A function calling model based on Qwen3-1.7B, fine-tuned using RLVR (Reinforcement Learning with Verifiable Rewards) to improve tool-use capabilities on the BFCL V3 benchmark.
## 🏆 Performance Highlights

| Model | Size | BFCL Overall | Category Avg |
|---|---|---|---|
| Qwen3-1.7B-FC (Ours) | 1.7B | **54.2%** | **50.8%** |
| Qwen3-1.7B (Base) | 1.7B | 48.8% | 45.8% |
| Qwen3-8B | 8B | 51.9% | 48.6% |
| Qwen3-14B | 14B | 51.6% | 49.0% |
### Response Efficiency

| Model | Avg Response Tokens | Efficiency vs Base |
|---|---|---|
| Qwen3-1.7B (Base) | 35.6 tokens | – |
| Qwen3-1.7B-FC (Ours) | 22.7 tokens | **-36%** |
The fine-tuned model generates 36% fewer tokens while maintaining higher accuracy, thanks to:

- **Direct tool calls**: emits the tool call immediately, without verbose preambles
- **Concise refusals**: a short message ("None of the provided tools can answer this question") when no tool applies
- **Efficiency-oriented rewards**: training rewards direct tool calls without preambles
- **Discouraged `<think>` tags**: a strong penalty (-1.0) for verbose reasoning blocks
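The reward shaping described above can be sketched as a small verifiable-reward function. This is an illustrative assumption, not the published training reward: the helper name, the parsing regex, and every penalty value other than the stated -1.0 `<think>` penalty are hypothetical.

```python
import json
import re


def tool_call_reward(completion: str, expected_call: dict) -> float:
    """Hypothetical sketch of a verifiable reward for function calling.

    Rewards exact, direct tool calls; penalizes <think> blocks and preambles.
    """
    # Strong penalty for verbose reasoning blocks, as described above
    if "<think>" in completion:
        return -1.0

    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", completion, re.DOTALL)
    if match is None:
        return -0.5  # no parsable tool call (assumed penalty value)

    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return -0.5

    if call != expected_call:
        return 0.0  # wrong tool or wrong arguments

    # Small bonus for calling directly, with no preamble before the tag
    preamble = completion[: match.start()].strip()
    return 1.0 if not preamble else 0.8
```

Because the reward is computed from the output itself (a verifiable check, not a learned judge), it can be applied automatically to every sampled completion during RL training.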
## 🚀 Usage

### With Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "contextboxai/Qwen3-1.7B-FC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Define tools
tools = [{
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
    enable_thinking=False,  # Disable thinking for efficiency
)

# Move inputs to the model's device before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
When asked "What is the meaning of life?" with only the `get_weather` tool available, the model responds:

```
None of the provided tools can answer this question.
```
### With vLLM (Recommended for Production)
```python
from vllm import LLM, SamplingParams

llm = LLM(model="contextboxai/Qwen3-1.7B-FC")
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# Generate with the same prompt format as above
outputs = llm.generate([prompt], sampling_params)
```
## 💡 Key Features

### ✅ Strengths

- **Compact Size**: only 1.7B parameters; runs on consumer GPUs
- **High Accuracy**: outperforms larger models (8B, 14B) on function calling
- **Efficient Responses**: direct tool calls without verbose preambles
- **Strong Refusal**: trained on 46K negative samples to avoid hallucinated tool calls
- **Multilingual**: supports English and Vietnamese
- **Chat Compatible**: maintains general chat ability (100% on the chatable benchmark)
### ⚠️ Limitations

- **Irrelevance detection**: slightly more eager to call a tool when none applies (-5% vs. base on this category)
## 📝 Use Cases

### 🎯 Ideal For

This model is optimized for edge deployment and customer service automation where a small, efficient model is needed:
| Use Case | Description |
|---|---|
| Edge Device Deployment | Run locally on devices with limited GPU/RAM |
| Customer Service Chatbot | Automate order lookup, ticket creation, FAQ with tool calls |
| Voice Agent / Call Center | Real-time voice-to-action for phone support systems |
| IoT / Smart Home | Control devices via function calling on edge hardware |
| Mobile AI Assistant | On-device tool execution without cloud dependency |
| Cost-Efficient API Gateway | Route requests to appropriate backend services |
### 💼 Customer Service Examples
```python
# Example: customer asks about their order
tools = [
    {"name": "lookup_order", "parameters": {"order_id": "string"}},
    {"name": "create_ticket", "parameters": {"issue": "string", "priority": "string"}},
    {"name": "get_faq", "parameters": {"topic": "string"}},
]

# User (Vietnamese): "Đơn hàng #12345 của tôi ở đâu rồi?" ("Where is my order #12345?")
# Model output:
# <tool_call>
# {"name": "lookup_order", "arguments": {"order_id": "12345"}}
# </tool_call>

# User (Vietnamese): "Tôi muốn đổi trả sản phẩm" ("I want to return a product")
# Model output:
# <tool_call>
# {"name": "create_ticket", "arguments": {"issue": "product_return", "priority": "normal"}}
# </tool_call>
```
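To act on outputs like the ones above, a caller needs to extract the `<tool_call>` block and route it to the right backend function. A minimal sketch, assuming the XML-style tool format shown in this card; the regex-based parser, the `dispatch` helper, and the stub registry are illustrative, not a published utility:

```python
import json
import re

# Matches the XML-style tool format used by this model
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(completion: str):
    """Return the first {"name": ..., "arguments": ...} dict, or None."""
    match = TOOL_CALL_RE.search(completion)
    if match is None:
        return None  # e.g. a plain-text refusal message
    return json.loads(match.group(1))


def dispatch(call: dict, registry: dict):
    """Look up the named tool in a registry and invoke it with its arguments."""
    return registry[call["name"]](**call["arguments"])


# Usage with a stub implementation of lookup_order
registry = {"lookup_order": lambda order_id: f"Order {order_id}: shipped"}
completion = (
    '<tool_call>\n'
    '{"name": "lookup_order", "arguments": {"order_id": "12345"}}\n'
    '</tool_call>'
)
call = parse_tool_call(completion)
print(dispatch(call, registry))  # Order 12345: shipped
```

Returning `None` for non-matching completions lets the same code path surface the model's concise refusal messages directly to the user.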
### ⚡ Why a Small Model?

| Benefit | Description |
|---|---|
| Low Latency | ~50ms inference on a consumer GPU |
| Low Cost | 8x cheaper to deploy than a 14B model |
| Privacy | Runs entirely on-premise; no data leaves the device |
| Offline Capable | Works without an internet connection |
## 🧠 Reduced Catastrophic Forgetting

This model uses RLVR (Reinforcement Learning with Verifiable Rewards) instead of traditional SFT, which helps reduce capability loss:

- **Less forgetting than SFT**: RLVR fine-tunes through reward signals rather than directly imitating target outputs
- **100% chatable score**: the model maintains normal conversation ability on the BFCL benchmark
- **Multilingual preserved**: English and Vietnamese capabilities remain functional
- **Lower risk**: compared to SFT, RLVR typically causes less regression on non-target tasks
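The citation for this card names GRPO as the RL algorithm. Its core mechanic is simple: sample several completions per prompt, score each with the verifiable reward, and normalize rewards within the group to get advantages. A minimal sketch of that advantage computation (illustrative only; the actual training code is not published):

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's reward
    against the mean and std of its own group (samples for the same prompt)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt: two correct tool calls,
# one wrong call, one <think>-penalized response
advantages = group_relative_advantages([1.0, 1.0, 0.0, -1.0])
# Correct completions receive positive advantage, the penalized one negative
```

Because advantages are relative within each group, no separate value network is needed, which keeps RL fine-tuning cheap enough for a 1.7B model.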
## 🔬 Technical Details

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen3-1.7B |
| Training Method | RLVR (RL fine-tuning) |
| Training Steps | 100 (V3) + 3000 (V4) |
| Peak LR | 1e-6 → 2e-7 |
| Training Data | 117K samples (71K positive + 46K negative) |
| Precision | bfloat16 |
| Max Sequence Length | 32768 tokens |
| Tool Format | XML-style (`<tool_call>...</tool_call>`) |
## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-fc,
  title={Qwen3-1.7B-FC: Efficient Function Calling via GRPO Fine-tuning},
  author={ContextboxAI},
  year={2024},
  howpublished={\url{https://huggingface.co/contextboxai/Qwen3-1.7B-FC}},
}
```