---
license: apache-2.0
language:
- en
- vi
base_model: Qwen/Qwen3-1.7B
tags:
- function-calling
- tool-use
- qwen3
- grpo
- rl-fine-tuned
datasets:
- Salesforce/xlam-function-calling-60k
- Team-ACE/ToolACE
- Agent-Ark/Toucan-1.5M
pipeline_tag: text-generation
library_name: transformers
---
# Qwen3-1.7B-FC: Function Calling Specialist
A function calling model based on Qwen3-1.7B, fine-tuned using **RLVR (Reinforcement Learning with Verifiable Rewards)** to improve tool-use capabilities on the BFCL V3 benchmark.
## 🏆 Performance Highlights
| Model | Size | BFCL Overall | Category Avg |
|-------|------|--------------|--------------|
| **Qwen3-1.7B-FC (Ours)** | **1.7B** | **54.2%** | **50.8%** |
| Qwen3-1.7B (Base) | 1.7B | 48.8% | 45.8% |
| Qwen3-8B | 8B | 51.9% | 48.6% |
| Qwen3-14B | 14B | 51.6% | 49.0% |
### Response Efficiency
| Model | Avg Response Tokens | Efficiency vs Base |
|-------|--------------------|--------------------|
| Base Qwen3-1.7B | 35.6 tokens | - |
| **Qwen3-1.7B-FC (Ours)** | **22.7 tokens** | **-36%** |
The fine-tuned model generates **36% fewer tokens** while maintaining higher accuracy, thanks to:
- Direct tool calls without verbose preambles
- Concise refusal messages ("None of the provided tools can answer this question")
- Reduced `<think>` reasoning blocks
## 📊 Detailed Benchmark Results (BFCL V3)
### Core Function Calling
| Category | Qwen3-1.7B-FC (Ours) | Base 1.7B | Qwen3-8B | Qwen3-14B |
|----------|---------------|-----------|----------|----------|
| simple | **81.0%** | 61.5% | 69.2% | 65.5% |
| multiple | **79.0%** | 55.5% | 66.0% | 57.0% |
| parallel | 78.0% | 68.0% | **78.0%** | 77.0% |
| parallel_multiple | 64.5% | 51.5% | **66.5%** | **66.5%** |
| irrelevance | 81.2% | 86.2% | 85.4% | **90.4%** |
### Executable Python
| Category | Qwen3-1.7B-FC (Ours) | Base 1.7B | 8B | 14B |
|----------|---------------|-----------|-----|-----|
| exec_simple | 84.0% | 82.0% | 84.0% | **87.0%** |
| exec_multiple | 70.0% | 70.0% | **78.0%** | **78.0%** |
| exec_parallel | 80.0% | 76.0% | **86.0%** | **90.0%** |
| exec_parallel_multiple | 60.0% | 60.0% | **67.5%** | 65.0% |
### Live API Categories
| Category | Qwen3-1.7B-FC (Ours) | Base 1.7B | Qwen3-8B | Qwen3-14B |
|----------|---------------|-----------|----------|----------|
| live_simple | **63.6%** | 43.8% | 51.2% | 51.6% |
| live_multiple | **55.0%** | 36.8% | 43.7% | 42.5% |
| live_parallel | **50.0%** | 18.8% | 43.8% | 43.8% |
| live_parallel_multiple | **66.7%** | 37.5% | 54.2% | 50.0% |
| live_irrelevance | 66.1% | **80.3%** | 78.7% | **79.9%** |
## 📚 Training Data
### Data Sources
| Source | Samples | Type | Description |
|--------|---------|------|-------------|
| [**xLAM**](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | ~60,000 | Positive | High-quality function calling examples from Salesforce |
| [**ToolACE**](https://huggingface.co/datasets/Team-ACE/ToolACE) | ~11,000 | Positive | Diverse multi-turn tool usage scenarios |
| [**Toucan-1.5M**](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) | 40,000 | **Negative** | Irrelevant queries (Server Shuffle method) |
| **Synthetic Negatives** | 6,000 | **Negative** | Domain mismatch, partial fulfillment, permission errors |
### Negative Sample Types
The model is trained to **refuse appropriately** using diverse negative samples:
| Type | Description | Example |
|------|-------------|---------|
| **Toucan Irrelevant** | Query has no matching tool in available functions | "What's the weather?" when only `get_stock_price` is available |
| **Domain Mismatch** | Tools from wrong domain | Asking about finance when only cooking tools available |
| **Action Mismatch** | Similar name but wrong action | Asking to "delete" when only "get" function exists |
| **Partial Fulfillment** | Tools can't fully solve query | Need 2 steps but only 1 tool available |
| **Permission/Auth** | Missing required permissions | Admin action without credentials |
| **Format Mismatch** | Wrong data format requirements | Tool expects JSON but query provides CSV |
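Each negative sample pairs a query with tools that cannot satisfy it, so the only rewarded behavior is a refusal. The sketch below shows what one such record might look like; the field names are illustrative, not the actual schema of the datasets above:

```python
# Hypothetical structure of one negative training sample (illustrative
# field names; the real datasets may use a different schema).
negative_sample = {
    "query": "What's the weather in Hanoi?",
    # Only a stock-price tool is available: a domain mismatch.
    "tools": [
        {
            "name": "get_stock_price",
            "description": "Get the latest price for a stock ticker",
            "parameters": {
                "type": "object",
                "properties": {"ticker": {"type": "string"}},
                "required": ["ticker"],
            },
        }
    ],
    "ground_truth": [],  # empty list: the correct behavior is to refuse
}

def is_refusal_expected(sample: dict) -> bool:
    """A sample expects a refusal when its ground truth has no tool calls."""
    return len(sample["ground_truth"]) == 0

print(is_refusal_expected(negative_sample))  # True
```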
## 🔧 Training Methodology
### Two-Stage RLVR Fine-tuning
1. **Stage 1**: Accuracy-focused training (V3)
- Trained from Qwen3-1.7B base
- Dataset: ~40K samples (stage2.parquet)
- Reward: Correctness (1.0) + Format (0.1) + Efficiency (0.3) + Refusal (0.3)
- Config: max_steps=5000, LR=5e-7, temp=1.2
- **Best checkpoint: step 100** (early stopping, highest accuracy)
2. **Stage 2**: Efficiency optimization (V4)
- Loaded from Stage 1 checkpoint-100
- Focus: Reduce verbosity, discourage `<think>` tags
- Reward weights: Efficiency=1.0, Correctness=0.5, Format=0.1, Refusal=0.3
- Config: max_steps=3000, LR=2e-7
- **Selected checkpoint: step 1100**
- **Result**: 36% reduction in response tokens
### Reward Function Design
```python
# Combined reward formula
total_reward = (
    format_weight * format_reward          # valid <tool_call> JSON (0.0-1.0)
    + correct_weight * correctness_reward  # tool name + arguments match (0.0-1.0)
    + refusal_weight * refusal_reward      # +1.0 correct refusal, -1.0 hallucination
    + efficiency_weight * efficiency_reward  # penalty for verbose <think>
)

# Stage 1 weights (accuracy focus)
STAGE1_WEIGHTS = {
    'format': 0.2,
    'correctness': 1.0,  # main focus
    'efficiency': 0.2,
    'refusal': 0.3,
}

# Stage 2 weights (efficiency focus)
STAGE2_WEIGHTS = {
    'format': 0.1,
    'correctness': 0.5,  # reduced - already accurate from Stage 1
    'efficiency': 1.0,   # main focus - penalize <think> tags
    'refusal': 0.3,
}
```
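To make the formula concrete, here is the Stage 2 score for a correct, well-formatted call with no `<think>` block. The component values are purely illustrative:

```python
STAGE2_WEIGHTS = {'format': 0.1, 'correctness': 0.5, 'efficiency': 1.0, 'refusal': 0.3}

# Illustrative component values for a correct, concise tool call:
# perfect format and correctness, small efficiency bonus, refusal unused.
components = {'format': 1.0, 'correctness': 1.0, 'efficiency': 0.1, 'refusal': 0.0}

total = sum(STAGE2_WEIGHTS[k] * components[k] for k in STAGE2_WEIGHTS)
print(round(total, 2))  # 0.7
```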
### Individual Reward Components
| Component | Description | Range |
|-----------|-------------|-------|
| **format_reward** | Valid `<tool_call>JSON</tool_call>` structure | 0.0 - 1.0 |
| **correctness_reward** | Tool name match + argument similarity | 0.0 - 1.0 |
| **refusal_reward** | +1.0 correct refusal, **-1.0 hallucination** | -1.0 to +1.0 |
| **efficiency_reward** | Stage 1: -0.3 for `<think>`, Stage 2: **-1.0** | -1.0 to +0.1 |
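The components in the table can be sketched as simple scoring functions. The snippet below is an illustrative reconstruction, not the actual training code; the helper names and the exact regex are assumptions:

```python
import json
import re

def format_reward(response: str) -> float:
    """1.0 if the response contains a well-formed <tool_call> JSON block."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", response, re.DOTALL)
    if not m:
        return 0.0
    try:
        call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if "name" in call and "arguments" in call else 0.0

def refusal_reward(response: str, ground_truth: list) -> float:
    """+1.0 for refusing when no tool applies, -1.0 for hallucinating a call."""
    made_call = "<tool_call>" in response
    if not ground_truth:  # empty ground truth: nothing is answerable
        return -1.0 if made_call else 1.0
    return 0.0  # answerable queries are scored by correctness_reward instead

def efficiency_reward(response: str, think_penalty: float = -1.0) -> float:
    """Stage 2 penalty for verbose <think> blocks, small bonus otherwise."""
    return think_penalty if "<think>" in response else 0.1
```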
### Key Training Innovations
1. **Strong Refusal Penalty**: -1.0 for calling tools when `ground_truth = []`
2. **Toucan Irrelevant Data**: 40K high-quality "unanswerable" samples
3. **Efficiency Optimization**: Rewarding direct tool calls without preambles
4. **Discourage `<think>` Tags**: Strong penalty (-1.0) for verbose reasoning blocks
## 🚀 Usage
### With Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "contextboxai/Qwen3-1.7B-FC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Define tools
tools = [{
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
    enable_thinking=False  # disable thinking for efficiency
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```
### Expected Output
```xml
<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo"}}
</tool_call>
```
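The output above can be parsed with a few lines of standard-library Python. This is a minimal sketch that assumes at most one `<tool_call>` block per response:

```python
import json
import re

def parse_tool_call(response: str):
    """Extract the first <tool_call> JSON payload, or None if absent."""
    m = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", response, re.DOTALL)
    if m is None:
        return None  # e.g. a refusal message or plain-text answer
    return json.loads(m.group(1))

response = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Tokyo"}}\n</tool_call>'
call = parse_tool_call(response)
print(call["name"], call["arguments"])  # get_weather {'location': 'Tokyo'}
```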
### Refusal Example
When asked "What is the meaning of life?" with only `get_weather` tool available:
```
None of the provided tools can answer this question.
```
### With vLLM (Recommended for Production)
```python
from vllm import LLM, SamplingParams

llm = LLM(model="contextboxai/Qwen3-1.7B-FC")
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# Reuse the chat-template prompt built in the Transformers example above
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
## 💡 Key Features
### ✅ Strengths
- **Compact Size**: Only 1.7B parameters, runs on consumer GPUs
- **High Accuracy**: Outperforms larger models (8B, 14B) on function calling
- **Efficient Responses**: Direct tool calls without verbose preambles
- **Strong Refusal**: Trained on 46K negative samples to avoid hallucination
- **Multilingual**: Supports English and Vietnamese
- **Chat Compatible**: Maintains general chat ability (100% on chatable benchmark)
### ⚠️ Limitations
- **Irrelevance detection**: More eager to call tools on unanswerable queries than the base model (-5 pp on `irrelevance`, -14 pp on `live_irrelevance`)
## 📝 Use Cases
### 🎯 Ideal For
This model is optimized for **edge deployment** and **customer service automation** where a small, efficient model is needed:
| Use Case | Description |
|----------|-------------|
| **Edge Device Deployment** | Run locally on devices with limited GPU/RAM |
| **Customer Service Chatbot** | Automate order lookup, ticket creation, FAQ with tool calls |
| **Voice Agent / Call Center** | Real-time voice-to-action for phone support systems |
| **IoT/Smart Home** | Control devices via function calling on edge hardware |
| **Mobile AI Assistant** | On-device tool execution without cloud dependency |
| **Cost-Efficient API Gateway** | Route requests to appropriate backend services |
### 💼 Customer Service Examples
```python
# Example: customer asks about their order
tools = [
    {"name": "lookup_order", "parameters": {"order_id": "string"}},
    {"name": "create_ticket", "parameters": {"issue": "string", "priority": "string"}},
    {"name": "get_faq", "parameters": {"topic": "string"}}
]

# User (Vietnamese): "Đơn hàng #12345 của tôi ở đâu rồi?"
# ("Where is my order #12345?")
# Model output:
# <tool_call>
# {"name": "lookup_order", "arguments": {"order_id": "12345"}}
# </tool_call>

# User (Vietnamese): "Tôi muốn đổi trả sản phẩm"
# ("I want to return a product")
# Model output:
# <tool_call>
# {"name": "create_ticket", "arguments": {"issue": "product_return", "priority": "normal"}}
# </tool_call>
```
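Parsed tool calls can then be routed to backend services. Below is a minimal dispatcher sketch; the handler implementations are placeholders invented for illustration, not part of the model card:

```python
import json
import re

# Placeholder backends standing in for real order/ticket services.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "in_transit"}

def create_ticket(issue: str, priority: str) -> dict:
    return {"ticket_id": "T-001", "issue": issue, "priority": priority}

HANDLERS = {"lookup_order": lookup_order, "create_ticket": create_ticket}

def dispatch(model_output: str):
    """Parse the model's <tool_call> block and invoke the matching handler."""
    m = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", model_output, re.DOTALL)
    if m is None:
        return None  # refusal or plain-text answer: surface it to the user
    call = json.loads(m.group(1))
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return handler(**call["arguments"])

output = '<tool_call>\n{"name": "lookup_order", "arguments": {"order_id": "12345"}}\n</tool_call>'
print(dispatch(output))  # {'order_id': '12345', 'status': 'in_transit'}
```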
### ⚡ Why Small Model?
| Benefit | Description |
|---------|-------------|
| **Low Latency** | ~50ms inference on consumer GPU |
| **Low Cost** | 8x cheaper than 14B model to deploy |
| **Privacy** | Run entirely on-premise, no data leaves device |
| **Offline Capable** | Works without internet connection |
### 🧠 Reduced Catastrophic Forgetting
This model uses **RLVR (Reinforcement Learning with Verifiable Rewards)** instead of traditional SFT, which helps reduce capability loss:
- **Less forgetting than SFT**: RLVR fine-tunes through reward signals rather than directly overwriting weights
- **100% chatable score**: Model maintains normal conversation ability on BFCL benchmark
- **Multilingual preserved**: English and Vietnamese capabilities remain functional
- **Lower risk**: Compared to SFT, RLVR typically causes less regression on non-target tasks
## 🔬 Technical Details
| Attribute | Value |
|-----------|-------|
| Base Model | Qwen/Qwen3-1.7B |
| Training Method | RLVR (RL fine-tuning) |
| Training Steps | 100 (Stage 1/V3) + 3000 (Stage 2/V4) |
| Peak LR | 5e-7 (Stage 1) → 2e-7 (Stage 2) |
| Training Data | 117K samples (71K positive + 46K negative) |
| Precision | bfloat16 |
| Max Sequence Length | 32768 tokens |
| Tool Format | XML-style (`<tool_call>...</tool_call>`) |
## 📚 Citation
If you use this model, please cite:
```bibtex
@misc{qwen3-fc,
title={Qwen3-1.7B-FC: Efficient Function Calling via GRPO Fine-tuning},
author={ContextboxAI},
year={2024},
howpublished={\url{https://huggingface.co/contextboxai/Qwen3-1.7B-FC}},
}
```
## 🙏 Acknowledgments
- [Qwen Team](https://github.com/QwenLM/Qwen3) for the excellent base model
- [Jan-nano](https://arxiv.org/pdf/2506.22760) for training methodology inspiration
- [Berkeley Function Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html) for the benchmark
- [xLAM (Salesforce)](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for function calling data
- [ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) for multi-turn tool usage data
- [Toucan-1.5M (Agent-Ark)](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) for irrelevant/negative samples
- [TRL](https://github.com/huggingface/trl) for GRPO implementation
## 📄 License
Apache 2.0
---
**Model Card Contact**: ContextboxAI