---
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
datasets:
- quotientai/limbic-eval-tool-use-mcp
---

# Limbic-Tool-Use MCP Function Call Evaluator

This model is a fine-tuned version of Qwen2.5-0.5B-Instruct that evaluates function calls made against Model Context Protocol (MCP) tools. It classifies each call as correct, using the wrong tool, using incorrect parameter names, or using incorrect parameter values.

## Model Details

- **Base Model**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Task**: Function Call Evaluation for MCP (Model Context Protocol)
- **Training Data**: MCP Server Tools data from public MCP servers, with augmentation / synthetic data generation
- **Model Size**: ~40MB (LoRA adapters only)
- **Context Length**: 32,768 tokens

# Model Usage

## Model Prompts

The prompt template takes two inputs:
- `available_tools` - a list of tool schemas
- `message_history` - the user request and the model's tool-call response, as a list of JSON objects

```python
EVALUATOR_PROMPT = """\
# TOOL CALL EVALUATION RUBRIC

## EVALUATION CRITERIA

### 1. TOOL SELECTION
- [ ] Function name exists in available tools
- [ ] Function purpose matches user intent

### 2. PARAMETER STRUCTURE
- [ ] All required and relevant parameters are present
- [ ] No hallucinated parameter names
- [ ] Parameter names match tool schema exactly

### 3. PARAMETER VALUES
- [ ] Data types match expected types
- [ ] Values align with user request
- [ ] No fabricated or incorrect values

## CLASSIFICATION RULES
- All criteria passed → `correct`
- Failed criteria 1 → `incorrect_tool`
- Failed criteria 2 → `incorrect_parameter_names`
- Failed criteria 3 → `incorrect_parameter_values`

---
### AVAILABLE TOOLS
{available_tools}

---
### MESSAGE HISTORY
{message_history}

---
## OUTPUT REQUIREMENT
{{
    "score": < correct | incorrect_tool | incorrect_parameter_names | incorrect_parameter_values >,
    "reason": < [if incorrect, provide a brief list of reasons] >
}}

### EVALUATION:
"""
```

The system prompt is:

```python
SYSTEM_PROMPT = "You are an expert evaluator of function calls. You will be given a function call and a list of available tools. You will need to evaluate the function call and return a score and a reason for the score."
```

### Example Inputs
```python
available_tools = [
    {
        "name": "google-play-developer",
        "description": "Get apps by a developer on Google Play",
        "input_schema": {
            "type": "object",
            "properties": {
                "devId": {"type": "string", "description": "Developer ID"},
                "num": {"type": "number", "default": 60, "description": "Number of results"},
                "lang": {"type": "string", "default": "en", "description": "Language code"},
                "country": {"type": "string", "default": "us", "description": "Country code"}
            },
            "required": ["devId"]
        }
    }
]

message_history = [
    {"role": "user", "content": "I'm looking to evaluate the performance of all the apps developed by 'Example Developer' on the Google Play Store. Could you provide me with a list of their recent applications, specifically in English and focused on the US market? Please limit the results to 50 apps for a quicker review."},
    {"role": "assistant", "content": {"function": {"name": "google-play-developer", "arguments": {"devId": "com.example.developer", "num": 50, "lang": "en", "country": "us"}}}}
]
```
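
The two inputs then need to be substituted into `EVALUATOR_PROMPT`. This card does not state how they are serialized, so the sketch below assumes JSON via `json.dumps` and Python's `str.format` (the doubled `{{ }}` braces in the template suggest `str.format`, but that is an inference). `build_user_prompt` and the shortened `demo_template` are hypothetical names used only for illustration.

```python
import json


def build_user_prompt(template: str, available_tools: list, message_history: list) -> str:
    """Fill the evaluator template with JSON-serialized inputs.

    Assumption: the serialization format is JSON; the card does not specify it.
    """
    return template.format(
        available_tools=json.dumps(available_tools, indent=2),
        message_history=json.dumps(message_history, indent=2),
    )


# Shortened stand-in for EVALUATOR_PROMPT so this sketch runs on its own.
demo_template = (
    "### AVAILABLE TOOLS\n{available_tools}\n\n"
    "### MESSAGE HISTORY\n{message_history}\n\n"
    "{{\"score\": ...}}\n"
)
demo_tools = [{"name": "google-play-developer"}]
demo_history = [{"role": "user", "content": "List apps by 'Example Developer'."}]

user_prompt = build_user_prompt(demo_template, demo_tools, demo_history)
print(user_prompt)
```

In practice, pass `EVALUATOR_PROMPT`, `available_tools`, and `message_history` from above in place of the demo values.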

## Output Format
The model outputs evaluations in JSON format:

```json
{
    "score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
    "reason": ["reasons for failure if incorrect"]
}
```
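
Because the output is generated text, it may help to parse and validate it before using the score. The following is a minimal sketch, assuming the model emits a single JSON object that may be followed by extra text; `parse_evaluation` and `VALID_SCORES` are hypothetical names, not part of the model's API.

```python
import json

VALID_SCORES = {
    "correct",
    "incorrect_tool",
    "incorrect_parameter_names",
    "incorrect_parameter_values",
}


def parse_evaluation(raw: str) -> dict:
    """Extract the first JSON object from `raw` and validate its score."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(raw[start:])
    score = obj.get("score")
    if score not in VALID_SCORES:
        raise ValueError(f"unexpected score: {score!r}")
    # "reason" may be absent or empty when the call is correct.
    return {"score": score, "reason": obj.get("reason", [])}


raw_output = '{"score": "incorrect_parameter_names", "reason": ["unknown parameter: developerId"]}'
evaluation = parse_evaluation(raw_output)
print(evaluation["score"], evaluation["reason"])
```

Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so callers can catch a single exception type.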

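### Score Categories

- **correct**: The call passes all three rubric criteria: tool selection, parameter structure, and parameter values
- **incorrect_tool**: The function name doesn't exist in the available tools, or the tool's purpose doesn't match the user's intent
- **incorrect_parameter_names**: The tool is right, but required parameters are missing or parameter names don't match the tool schema
- **incorrect_parameter_values**: The tool and parameter names are right, but values have the wrong type or don't match the user request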
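
To make the difference between the first two failure categories concrete, here is a rule-based illustration of the purely structural parts of the rubric (name lookup and parameter-name matching). It is not how the model works internally, and it cannot judge user intent or parameter values; `structural_check` is a hypothetical helper, and the `name`/`arguments` call shape is assumed from the example inputs above.

```python
from typing import Optional


def structural_check(call: dict, available_tools: list) -> Optional[str]:
    """Return a failure label for structural problems, or None if none are found."""
    tools = {tool["name"]: tool for tool in available_tools}
    tool = tools.get(call["name"])
    if tool is None:
        return "incorrect_tool"

    schema = tool.get("input_schema", {})
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    given = set(call.get("arguments", {}))

    # Unknown parameter names or missing required parameters both fail criterion 2.
    if not given <= allowed or not required <= given:
        return "incorrect_parameter_names"
    return None  # Values and user intent still need a real evaluation.


demo_tools = [
    {
        "name": "google-play-developer",
        "input_schema": {
            "properties": {"devId": {}, "num": {}, "lang": {}, "country": {}},
            "required": ["devId"],
        },
    }
]
good_call = {"name": "google-play-developer", "arguments": {"devId": "com.example.developer", "num": 50}}
bad_names = {"name": "google-play-developer", "arguments": {"developerId": "com.example.developer"}}
print(structural_check(good_call, demo_tools), structural_check(bad_names, demo_tools))
```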


## Load the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
model = AutoModelForCausalLM.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
```

## Generate a Prediction
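To make a prediction, wrap the system prompt and the formatted evaluator prompt in chat messages, then apply the tokenizer's chat template.
```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "<your-formatted-user-prompt>"},
]

# Apply the chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize with truncation and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)

# Generate your prediction
result = model.generate(**inputs, max_new_tokens=128, use_cache=True)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
prediction = tokenizer.decode(result[0][prompt_length:], skip_special_tokens=True)
print(prediction)
```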

## Citation
```bibtex
@model{limbic-tool-use-0.5B-32K,
  title={Limbic Tool Use Evaluator},
  author={QuotientAI},
  year={2025},
  url={https://huggingface.co/quotientai/limbic-tool-use-0.5B-32K}
}
```