| base_model | tags | license | language | datasets |
|---|---|---|---|---|
| Qwen/Qwen2.5-0.5B-Instruct | | apache-2.0 | | |
Limbic-Tool-Use MCP Function Call Evaluator
This model is a fine-tuned version of Qwen2.5-0.5B-Instruct specifically designed for evaluating function calls in the context of Model Context Protocol (MCP) tools. It can assess whether a function call is correct, uses the wrong tool, has incorrect parameter names, or has incorrect parameter values.
Model Details
- Base Model: Qwen/Qwen2.5-0.5B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Task: Function Call Evaluation for MCP (Model Context Protocol)
- Training Data: Tool data from public MCP servers, augmented with synthetically generated examples
- Model Size: ~40MB (LoRA adapters only)
- Context Length: 32,768 tokens
Model Usage
Model Prompts
The prompt for the model takes two inputs:
- available_tools: a list of the tool schemas
- message_history: the user request and the model's tool call response, as a list of JSON objects
EVALUATOR_PROMPT = """\
# TOOL CALL EVALUATION RUBRIC
## EVALUATION CRITERIA
### 1. TOOL SELECTION
- [ ] Function name exists in available tools
- [ ] Function purpose matches user intent
### 2. PARAMETER STRUCTURE
- [ ] All required and relevant parameters are present
- [ ] No hallucinated parameter names
- [ ] Parameter names match tool schema exactly
### 3. PARAMETER VALUES
- [ ] Data types match expected types
- [ ] Values align with user request
- [ ] No fabricated or incorrect values
## CLASSIFICATION RULES
- All criteria passed → `correct`
- Failed criteria 1 → `incorrect_tool`
- Failed criteria 2 → `incorrect_parameter_names`
- Failed criteria 3 → `incorrect_parameter_values`
---
### AVAILABLE TOOLS
{available_tools}
---
### MESSAGE HISTORY
{message_history}
---
## OUTPUT REQUIREMENT
{{
"score": < correct | incorrect_tool | incorrect_parameter_names | incorrect_parameter_values >,
"reason": < [if incorrect, provide a brief list of reasons] >
}}
### EVALUATION:
"""
SYSTEM_PROMPT = "You are an expert evaluator of function calls. You will be given a function call and a list of available tools. You will need to evaluate the function call and return a score and a reason for the score."
Example Inputs
available_tools = [
{
"name": "google-play-developer",
"description": "Get apps by a developer on Google Play",
"input_schema": {
"type": "object",
"properties": {
"devId": {"type": "string", "description": "Developer ID"},
"num": {"type": "number", "default": 60, "description": "Number of results"},
"lang": {"type": "string", "default": "en", "description": "Language code"},
"country": {"type": "string", "default": "us", "description": "Country code"}
},
"required": ["devId"]
}
}
]
message_history = [
{"role": "user", "content": "I'm looking to evaluate the performance of all the apps developed by 'Example Developer' on the Google Play Store. Could you provide me with a list of their recent applications, specifically in English and focused on the US market? Please limit the results to 50 apps for a quicker review."},
{"role": "assistant", "content": {"function": {"name": "google-play-developer", "arguments": {"devId": "com.example.developer", "num": 50, "lang": "en", "country": "us"}}}}
]
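The two inputs above still have to be substituted into EVALUATOR_PROMPT before the model sees them. The following is a minimal sketch of that step, assuming the inputs are serialized with `json.dumps`; the serialization used during fine-tuning is not stated here, so treat that choice as an assumption. `build_user_prompt` and the shortened `DEMO_TEMPLATE` are hypothetical names used only to keep the example self-contained.

```python
import json


def build_user_prompt(template, available_tools, message_history):
    """Fill an evaluator template with JSON-serialized inputs.

    Assumption: tools and messages are serialized with json.dumps; the
    exact serialization used during fine-tuning is not documented.
    """
    return template.format(
        available_tools=json.dumps(available_tools, indent=2),
        message_history=json.dumps(message_history, indent=2),
    )


# Shortened stand-in for EVALUATOR_PROMPT (hypothetical, for illustration only).
# Doubled braces become literal braces after str.format, as in the full prompt.
DEMO_TEMPLATE = (
    "### AVAILABLE TOOLS\n{available_tools}\n---\n"
    "### MESSAGE HISTORY\n{message_history}\n---\n"
    '{{\n"score": < ... >\n}}\n'
)

demo_tools = [{"name": "google-play-developer", "input_schema": {"required": ["devId"]}}]
demo_history = [{"role": "user", "content": "List apps by 'Example Developer'."}]

user_prompt = build_user_prompt(DEMO_TEMPLATE, demo_tools, demo_history)
print(user_prompt)
```

In real use, pass EVALUATOR_PROMPT as the template and use the resulting string as the content of the user message.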
Output Format
The model outputs evaluations in JSON format:
{
"score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
"reason": ["reasons for failure if incorrect"]
}
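A small model may wrap the JSON in extra text or emit a label outside the four categories, so it can help to validate the output before using it. The sketch below is one way to do that with the standard library; `parse_evaluation` and `VALID_SCORES` are hypothetical names, and returning `None` on failure is a design choice of this example, not documented behavior of the model.

```python
import json

VALID_SCORES = {
    "correct",
    "incorrect_tool",
    "incorrect_parameter_names",
    "incorrect_parameter_values",
}


def parse_evaluation(text):
    """Return the first valid evaluation object found in `text`, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            score = obj.get("score")
            if isinstance(score, str) and score in VALID_SCORES:
                reason = obj.get("reason")
                if reason is None:
                    reason = []
                elif not isinstance(reason, list):
                    reason = [reason]  # normalize a bare string to a list
                return {"score": score, "reason": reason}
        start = text.find("{", start + 1)
    return None


print(parse_evaluation('{"score": "incorrect_tool", "reason": ["no such tool"]}'))
```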
Score Categories
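- correct: The call uses an available tool with the right parameter names and values for the user's request
- incorrect_tool: The function name doesn't exist in the available tools, or the tool's purpose doesn't match the user's intent
- incorrect_parameter_names: The tool is right, but parameter names are missing, hallucinated, or don't match the schema
- incorrect_parameter_values: The tool and parameter names are right, but values have the wrong type or don't match the user's request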
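To make the categories concrete, here is a deterministic sketch of the parts of the rubric that can be checked mechanically: tool existence, parameter names, and declared JSON types. This is not how the model reaches its decision, and it cannot judge user intent or fabricated values. `structural_check` and `JSON_TYPES` are hypothetical names, and the tool schema layout is assumed to follow the `input_schema` example in this card.

```python
# JSON Schema type name -> accepted Python types.
JSON_TYPES = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def structural_check(function_name, arguments, available_tools):
    """Return the first category whose structural checks fail, else "correct"."""
    tools = {tool["name"]: tool for tool in available_tools}
    if function_name not in tools:
        return "incorrect_tool"

    schema = tools[function_name].get("input_schema", {})
    properties = schema.get("properties", {})
    required = schema.get("required", [])
    unknown = [name for name in arguments if name not in properties]
    missing = [name for name in required if name not in arguments]
    if unknown or missing:
        return "incorrect_parameter_names"

    for name, value in arguments.items():
        expected = JSON_TYPES.get(properties[name].get("type"))
        if expected is None:
            continue  # no type declared, or a type this sketch does not cover
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool = isinstance(value, bool)
        if (is_bool and bool not in expected) or not isinstance(value, expected):
            return "incorrect_parameter_values"
    return "correct"


demo_tools = [{
    "name": "google-play-developer",
    "input_schema": {
        "type": "object",
        "properties": {"devId": {"type": "string"}, "num": {"type": "number"}},
        "required": ["devId"],
    },
}]
print(structural_check("google-play-developer", {"devId": "x", "num": "50"}, demo_tools))
```

The checks run in rubric order, so a call that fails several criteria is reported under the earliest one.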
Load the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
model = AutoModelForCausalLM.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
Generate a Prediction
To make a prediction, wrap the formatted evaluator prompt in a system message and a user message, then apply the tokenizer's chat template.
chat_template = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": "<your-formatted-user-prompt>"}
]
# Apply the chat template
text = tokenizer.apply_chat_template(chat_template, tokenize=False, add_generation_prompt=True)
# Tokenize with truncation and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
# Generate your prediction
result = model.generate(**inputs, max_new_tokens=128, use_cache=True)
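For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the evaluation text has to be sliced out before decoding. A minimal sketch of that step follows; `decode_completion` is a hypothetical helper name, and the commented call assumes the `tokenizer`, `inputs`, and `result` objects from the snippet above.

```python
def decode_completion(tokenizer, prompt_length, output_ids):
    """Decode only the tokens generated after the prompt."""
    new_tokens = output_ids[prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# With the objects from the snippet above, the call would look like:
#   prompt_length = inputs["input_ids"].shape[1]
#   evaluation_text = decode_completion(tokenizer, prompt_length, result[0])
```

The decoded string should contain the JSON evaluation described under Output Format.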
Citation
@misc{limbic-tool-use-0.5B-32K,
title={Limbic Tool Use Evaluator},
author={QuotientAI},
year={2025},
url={https://huggingface.co/quotientai/limbic-tool-use-0.5B-32K}
}