245 lines
11 KiB
Markdown
245 lines
11 KiB
Markdown
|
|
---
|
||
|
|
license: apache-2.0
|
||
|
|
datasets:
|
||
|
|
- glaiveai/glaive-function-calling-v2
|
||
|
|
language:
|
||
|
|
- en
|
||
|
|
base_model:
|
||
|
|
- Qwen/Qwen2.5-Coder-32B-Instruct
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
library_name: transformers
|
||
|
|
tags:
|
||
|
|
- tools
|
||
|
|
- functions
|
||
|
|
---
|
||
|
|
# Qwen2.5-Coder-32B-Glaive-ToolCall
|
||
|
|

|
||
|
|
## Model Description
|
||
|
|
|
||
|
|
This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) specifically enhanced for tool calling capabilities. The model has been trained using the [Glaive Function Calling v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) dataset (`glaiveai/glaive-function-calling-v2`) to significantly improve its ability to understand, generate, and execute function calls in various programming and automation contexts.
|
||
|
|
|
||
|
|
## Model Details
|
||
|
|
|
||
|
|
- **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
|
||
|
|
- **Model Type**: Large Language Model (LLM) with enhanced tool calling capabilities
|
||
|
|
- **Architecture**: Transformer-based decoder model
|
||
|
|
- **Parameters**: 32 billion parameters
|
||
|
|
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
|
||
|
|
- **Training Dataset**: glaive-function-calling-v2
|
||
|
|
- **Language Support**: Multilingual
|
||
|
|
|
||
|
|
## Training Configuration
|
||
|
|
|
||
|
|
- **Fine-tuning Type**: LoRA with rank 8, alpha 16
|
||
|
|
- **Training Epochs**: 3.0
|
||
|
|
- **Learning Rate**: 5e-5 with cosine scheduler
|
||
|
|
- **Batch Size**: 2 per device with 8 gradient accumulation steps
|
||
|
|
- **Context Length**: 2048 tokens
|
||
|
|
- **Optimizer**: AdamW
|
||
|
|
- **Precision**: BF16
|
||
|
|
- **Max Samples**: 100,000
|
||
|
|
|
||
|
|
## Enhanced Capabilities
|
||
|
|
|
||
|
|
### Tool Calling Improvements
|
||
|
|
|
||
|
|
This model demonstrates significant improvements in:
|
||
|
|
|
||
|
|
1. **Function Schema Understanding**: Enhanced ability to parse and understand complex function signatures and parameter requirements
|
||
|
|
2. **Context-Aware Tool Selection**: Improved decision-making for selecting appropriate tools based on user queries
|
||
|
|
3. **Parameter Extraction**: Better extraction and formatting of function parameters from natural language inputs
|
||
|
|
4. **Multi-step Tool Orchestration**: Enhanced capability to chain multiple tool calls for complex tasks
|
||
|
|
5. **Error Handling**: Improved error detection and recovery in tool calling scenarios
|
||
|
|
|
||
|
|
### Key Features
|
||
|
|
|
||
|
|
- **Robust JSON Generation**: Produces well-formatted JSON for function calls with proper schema adherence
|
||
|
|
- **Natural Language Integration**: Seamlessly integrates tool calls within conversational responses
|
||
|
|
- **Code Generation with Tools**: Enhanced ability to generate code that incorporates external tool usage
|
||
|
|
- **API Integration**: Improved understanding of REST APIs, GraphQL, and other web service interfaces
|
||
|
|
|
||
|
|
## Use Cases
|
||
|
|
|
||
|
|
This model is particularly well-suited for:
|
||
|
|
|
||
|
|
- **AI Assistants**: Building conversational AI that can interact with external systems
|
||
|
|
- **Automation Workflows**: Creating intelligent automation scripts with dynamic tool usage
|
||
|
|
- **Code Generation**: Generating code that integrates with APIs and external services
|
||
|
|
- **Data Processing**: Automating data analysis and processing tasks with appropriate tools
|
||
|
|
- **System Integration**: Building bridges between different software systems and services
|
||
|
|
|
||
|
|
## Usage Example
|
||
|
|
|
||
|
|
```python
|
||
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
||
|
|
import torch
|
||
|
|
|
||
|
|
# Load the model and tokenizer
|
||
|
|
model_name = "RekklesAI/Qwen2.5-Coder-32B-Glaive-ToolCall"
|
||
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
|
||
|
|
model = AutoModelForCausalLM.from_pretrained(
|
||
|
|
model_name,
|
||
|
|
torch_dtype=torch.bfloat16,
|
||
|
|
device_map="auto",
|
||
|
|
trust_remote_code=True
|
||
|
|
)
|
||
|
|
|
||
|
|
# Example prompt for tool calling
|
||
|
|
prompt = """You have access to a weather API. Help me get the current weather for New York City.
|
||
|
|
|
||
|
|
Available tools:
|
||
|
|
- get_weather(location: str, units: str = "metric") -> dict
|
||
|
|
|
||
|
|
User: What's the weather like in New York City?"""
|
||
|
|
|
||
|
|
# Generate response
|
||
|
|
inputs = tokenizer(prompt, return_tensors="pt")
|
||
|
|
with torch.no_grad():
|
||
|
|
outputs = model.generate(
|
||
|
|
inputs.input_ids,
|
||
|
|
max_new_tokens=512,
|
||
|
|
temperature=0.7,
|
||
|
|
do_sample=True,
|
||
|
|
pad_token_id=tokenizer.eos_token_id
|
||
|
|
)
|
||
|
|
|
||
|
|
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
|
||
|
|
print(response)
|
||
|
|
```
|
||
|
|
|
||
|
|
## Performance Metrics
|
||
|
|
|
||
|
|
The model shows significant improvements in tool calling benchmarks:
|
||
|
|
|
||
|
|
- **Function Call Accuracy**: Enhanced precision in generating syntactically correct function calls
|
||
|
|
- **Parameter Extraction**: Improved accuracy in extracting relevant parameters from user queries
|
||
|
|
- **Tool Selection**: Better performance in selecting appropriate tools for given tasks
|
||
|
|
- **JSON Formatting**: Reduced errors in JSON structure and formatting
|
||
|
|
|
||
|
|
### Training Loss
|
||
|
|
|
||
|
|
The following chart shows the training loss progression during the fine-tuning process:
|
||
|
|
|
||
|
|
|
||
|
|

|
||
|
|
|
||
|
|
*Training loss curve demonstrating stable convergence over 3 epochs with the Glaive Function Calling v2 dataset.*
|
||
|
|
|
||
|
|
## Limitations
|
||
|
|
|
||
|
|
- The model's tool calling capabilities are primarily trained on the patterns present in the Glaive Function Calling v2 dataset
|
||
|
|
- Performance may vary for highly specialized or domain-specific tools not represented in the training data
|
||
|
|
- Like all LLMs, the model may occasionally generate plausible-sounding but incorrect tool calls
|
||
|
|
- The model requires careful prompt engineering for optimal tool calling performance
|
||
|
|
|
||
|
|
## Ethical Considerations
|
||
|
|
|
||
|
|
- **Tool Safety**: Users should implement proper validation and sandboxing when allowing the model to execute actual tool calls
|
||
|
|
- **Access Control**: Implement appropriate access controls and permissions for tools accessible to the model
|
||
|
|
- **Data Privacy**: Be mindful of sensitive data that might be passed through tool calls
|
||
|
|
- **Monitoring**: Implement logging and monitoring for tool usage in production environments
|
||
|
|
|
||
|
|
## Training Data
|
||
|
|
|
||
|
|
The model was fine-tuned using the **Glaive Function Calling v2** dataset (`glaiveai/glaive-function-calling-v2`), a comprehensive and high-quality dataset specifically designed for training language models in function calling capabilities.
|
||
|
|
|
||
|
|
### Dataset Overview
|
||
|
|
|
||
|
|
- **Dataset Size**: 113,000 training examples
|
||
|
|
- **Format**: JSON with structured conversations
|
||
|
|
- **Language**: English
|
||
|
|
- **License**: Apache 2.0
|
||
|
|
- **Source**: [Glaive AI](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)
|
||
|
|
|
||
|
|
### Dataset Characteristics
|
||
|
|
|
||
|
|
The Glaive Function Calling v2 dataset is meticulously curated to provide diverse and realistic function calling scenarios:
|
||
|
|
|
||
|
|
#### **Conversation Structure**
|
||
|
|
- **System Messages**: Define the assistant's role and available functions with detailed schemas
|
||
|
|
- **Multi-turn Dialogues**: Natural conversations between users and AI assistants
|
||
|
|
- **Function Calls**: Properly formatted JSON function invocations
|
||
|
|
- **Function Responses**: Realistic API responses and result handling
|
||
|
|
- **Error Scenarios**: Examples of graceful error handling and capability limitations
|
||
|
|
|
||
|
|
#### **Function Diversity**
|
||
|
|
The dataset covers a wide range of function types and use cases:
|
||
|
|
|
||
|
|
- **Utility Functions**: Email sending, calendar management, password generation
|
||
|
|
- **Data Retrieval**: News headlines, stock prices, weather information
|
||
|
|
- **Computational Tasks**: Mathematical calculations, unit conversions, data analysis
|
||
|
|
- **Search Operations**: Movie searches, book lookups, general information retrieval
|
||
|
|
- **Communication Tools**: Contact management, messaging systems
|
||
|
|
- **Financial Services**: Exchange rates, loan calculations, investment data
|
||
|
|
- **Content Creation**: Text generation, formatting, summarization
|
||
|
|
|
||
|
|
#### **Quality Features**
|
||
|
|
|
||
|
|
1. **Realistic Scenarios**: Conversations mirror real-world user interactions with AI assistants
|
||
|
|
2. **Proper Error Handling**: Examples of polite refusals when functions are unavailable
|
||
|
|
3. **Parameter Validation**: Correct handling of required and optional function parameters
|
||
|
|
4. **Context Awareness**: Functions are called appropriately based on conversation context
|
||
|
|
5. **Natural Language Integration**: Seamless integration of function results into conversational responses
|
||
|
|
|
||
|
|
#### **Training Examples Include**:
|
||
|
|
|
||
|
|
- **Single Function Calls**: Simple, direct function invocations
|
||
|
|
- **Multi-step Workflows**: Complex scenarios requiring multiple function calls
|
||
|
|
- **Parameter Extraction**: Converting natural language requests into structured function parameters
|
||
|
|
- **Response Formatting**: Presenting function results in user-friendly formats
|
||
|
|
- **Capability Boundaries**: Clear communication of system limitations
|
||
|
|
|
||
|
|
### Dataset Impact on Model Performance
|
||
|
|
|
||
|
|
This carefully curated dataset enables the model to:
|
||
|
|
|
||
|
|
- **Understand Function Schemas**: Parse and comprehend complex function definitions
|
||
|
|
- **Extract Parameters**: Accurately identify and format required function arguments from user queries
|
||
|
|
- **Generate Valid JSON**: Produce syntactically correct function calls
|
||
|
|
- **Handle Edge Cases**: Manage scenarios where requested functions are unavailable
|
||
|
|
- **Maintain Conversational Flow**: Integrate function calling seamlessly into natural dialogue
|
||
|
|
- **Provide Helpful Responses**: Transform function results into meaningful user communications
|
||
|
|
|
||
|
|
### Technical Implementation
|
||
|
|
|
||
|
|
The dataset follows industry-standard formats for function calling:
|
||
|
|
- OpenAI-compatible function schemas
|
||
|
|
- Structured JSON for function definitions and calls
|
||
|
|
- Clear separation between system instructions, user queries, and function responses
|
||
|
|
- Consistent formatting across all examples
|
||
|
|
|
||
|
|
This comprehensive training data ensures the model can handle real-world function calling scenarios with high accuracy and reliability, making it suitable for production deployment in AI assistant applications, automation workflows, and API integration tasks.
|
||
|
|
|
||
|
|
## Technical Specifications
|
||
|
|
|
||
|
|
- **Framework**: Built using LLaMA-Factory
|
||
|
|
- **Hardware Requirements**: Recommended 80GB+ VRAM for inference
|
||
|
|
- **Quantization**: Compatible with various quantization methods (GPTQ, AWQ, etc.)
|
||
|
|
- **Deployment**: Suitable for both cloud and on-premise deployment
|
||
|
|
|
||
|
|
## Citation
|
||
|
|
|
||
|
|
If you use this model in your research or applications, please cite:
|
||
|
|
|
||
|
|
```bibtex
|
||
|
|
@misc{qwen25-coder-glaive-toolcall,
|
||
|
|
title={Qwen2.5-Coder-32B-Glaive-ToolCall},
|
||
|
|
author={[RekklesAI]},
|
||
|
|
year={2025},
|
||
|
|
note={Fine-tuned version of Qwen2.5-Coder-32B-Instruct with enhanced tool calling capabilities using Glaive dataset}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
## License
|
||
|
|
|
||
|
|
apache-2.0
|
||
|
|
|
||
|
|
## Acknowledgments
|
||
|
|
|
||
|
|
- **Qwen Team**: For the excellent base model Qwen2.5-Coder-32B-Instruct
|
||
|
|
- **Glaive**: For providing the high-quality tool calling dataset
|
||
|
|
- **LLaMA-Factory**: For the efficient fine-tuning framework
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
*This model card follows the guidelines for responsible AI model documentation and transparency.*
|