Initialize project; model provided by the ModelHub XC community
Model: quotientai/limbic-tool-use-0.5B-32K Source: Original Platform
160
README.md
Normal file
@@ -0,0 +1,160 @@
---
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
datasets:
- quotientai/limbic-eval-tool-use-mcp
---

# Limbic-Tool-Use MCP Function Call Evaluator

This model is a fine-tuned version of Qwen2.5-0.5B-Instruct specifically designed for evaluating function calls in the context of Model Context Protocol (MCP) tools. It can assess whether a function call is correct, uses the wrong tool, has incorrect parameter names, or has incorrect parameter values.

## Model Details

- **Base Model**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Task**: Function Call Evaluation for MCP (Model Context Protocol)
- **Training Data**: MCP Server Tools data from public MCP servers, with augmentation / synthetic data generation
- **Model Size**: ~40MB (LoRA adapters only)
- **Context Length**: 32,768 tokens

# Model Usage

## Model Prompts

The prompt for the model takes two inputs:
- `available_tools` - a list of the tool schemas
- `message_history` - the user request and the model's tool-call response, as a list of JSON objects

```python
EVALUATOR_PROMPT = """\
# TOOL CALL EVALUATION RUBRIC

## EVALUATION CRITERIA

### 1. TOOL SELECTION
- [ ] Function name exists in available tools
- [ ] Function purpose matches user intent

### 2. PARAMETER STRUCTURE
- [ ] All required and relevant parameters are present
- [ ] No hallucinated parameter names
- [ ] Parameter names match tool schema exactly

### 3. PARAMETER VALUES
- [ ] Data types match expected types
- [ ] Values align with user request
- [ ] No fabricated or incorrect values

## CLASSIFICATION RULES
- All criteria passed → `correct`
- Failed criteria 1 → `incorrect_tool`
- Failed criteria 2 → `incorrect_parameter_names`
- Failed criteria 3 → `incorrect_parameter_values`

---
### AVAILABLE TOOLS
{available_tools}

---
### MESSAGE HISTORY
{message_history}

---
## OUTPUT REQUIREMENT
{{
"score": < correct | incorrect_tool | incorrect_parameter_names | incorrect_parameter_values >,
"reason": < [if incorrect, provide a brief list of reasons] >
}}

### EVALUATION:
"""
```

```python
SYSTEM_PROMPT = "You are an expert evaluator of function calls. You will be given a function call and a list of available tools. You will need to evaluate the function call and return a score and a reason for the score."
```
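
The evaluator prompt escapes its literal braces as `{{` and `}}` and leaves `{available_tools}` and `{message_history}` as placeholders, which suggests it is meant to be filled with `str.format`. The following is a minimal sketch of that step, not a confirmed recipe: it assumes both inputs are serialized with `json.dumps` (the README does not state which serialization was used during training), it uses a shortened stand-in template so the snippet runs without other code, and `build_user_prompt` is a hypothetical helper name.

```python
import json

# Shortened stand-in for EVALUATOR_PROMPT; only the two placeholders and the
# escaped output braces matter for this illustration.
PROMPT_TEMPLATE = """\
### AVAILABLE TOOLS
{available_tools}

### MESSAGE HISTORY
{message_history}

## OUTPUT REQUIREMENT
{{
"score": < correct | incorrect_tool | incorrect_parameter_names | incorrect_parameter_values >
}}
"""


def build_user_prompt(template, available_tools, message_history):
    """Fill the template's two placeholders (hypothetical helper).

    Assumption: both inputs are passed to the model as JSON text.
    """
    return template.format(
        available_tools=json.dumps(available_tools, indent=2),
        message_history=json.dumps(message_history, indent=2),
    )


# Small illustrative inputs in the same shape as the README's example
tools = [{"name": "google-play-developer", "input_schema": {"required": ["devId"]}}]
history = [
    {"role": "user", "content": "List apps by 'Example Developer'."},
    {
        "role": "assistant",
        "content": {
            "function": {
                "name": "google-play-developer",
                "arguments": {"devId": "com.example.developer"},
            }
        },
    },
]

user_prompt = build_user_prompt(PROMPT_TEMPLATE, tools, history)
print(user_prompt)
```

With the full `EVALUATOR_PROMPT` passed as `template`, the returned string would serve as the content of the user message.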

### Example Inputs
```python
available_tools = [
    {
        "name": "google-play-developer",
        "description": "Get apps by a developer on Google Play",
        "input_schema": {
            "type": "object",
            "properties": {
                "devId": {"type": "string", "description": "Developer ID"},
                "num": {"type": "number", "default": 60, "description": "Number of results"},
                "lang": {"type": "string", "default": "en", "description": "Language code"},
                "country": {"type": "string", "default": "us", "description": "Country code"}
            },
            "required": ["devId"]
        }
    }
]

message_history = [
    {"role": "user", "content": "I'm looking to evaluate the performance of all the apps developed by 'Example Developer' on the Google Play Store. Could you provide me with a list of their recent applications, specifically in English and focused on the US market? Please limit the results to 50 apps for a quicker review."},
    {"role": "assistant", "content": {"function": {"name": "google-play-developer", "arguments": {"devId": "com.example.developer", "num": 50, "lang": "en", "country": "us"}}}}
]
```
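
For intuition about what rubric criteria 1 and 2 cover, their purely structural part can be reproduced with a plain schema lookup. The sketch below is illustrative only and is not how the model works: `check_call_structure` is a hypothetical helper, it assumes tool schemas shaped like the example above (`name`, `input_schema.properties`, `input_schema.required`), and it cannot judge whether a tool's purpose matches the user intent or whether parameter values are right (criterion 3), which is where the model is needed.

```python
def check_call_structure(call, available_tools):
    """Rule-based check of the structural part of rubric criteria 1 and 2.

    Hypothetical helper. Returns (label, reasons); label is None when no
    structural problem is found, which does NOT imply the call is correct.
    """
    tools_by_name = {tool["name"]: tool for tool in available_tools}
    tool = tools_by_name.get(call["name"])
    if tool is None:
        return "incorrect_tool", [f"unknown tool: {call['name']}"]

    schema = tool.get("input_schema", {})
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    given = set(call.get("arguments", {}))

    reasons = [f"unknown parameter: {name}" for name in sorted(given - allowed)]
    reasons += [f"missing required parameter: {name}" for name in sorted(required - given)]
    if reasons:
        return "incorrect_parameter_names", reasons
    return None, []


tools = [
    {
        "name": "google-play-developer",
        "input_schema": {
            "type": "object",
            "properties": {"devId": {"type": "string"}, "num": {"type": "number"}},
            "required": ["devId"],
        },
    }
]

bad_call = {"name": "google-play-developer", "arguments": {"developer": "x"}}
label, reasons = check_call_structure(bad_call, tools)
print(label, reasons)
```

A check like this could act as a cheap pre-filter, leaving intent matching and value checks to the evaluator model.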

## Output Format
The model outputs evaluations in JSON format:

```json
{
  "score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
  "reason": ["reasons for failure if incorrect"]
}
```

#### Score Categories

- **correct**: Function call matches available tools and parameters exactly
- **incorrect_tool**: Function name doesn't exist in available tools, or the function's purpose doesn't match the user intent
- **incorrect_parameter_names**: Function exists but parameter names are wrong, missing, or hallucinated
- **incorrect_parameter_values**: Function and parameters exist but values are inappropriate
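
Before a generated evaluation is used downstream, it can be validated against these four categories. A minimal sketch, assuming the decoded model output contains a single JSON object that may be surrounded by extra text; `parse_evaluation` is a hypothetical helper and not part of the model's tooling.

```python
import json

VALID_SCORES = {
    "correct",
    "incorrect_tool",
    "incorrect_parameter_names",
    "incorrect_parameter_values",
}


def parse_evaluation(text):
    """Extract and validate the evaluation JSON from generated text.

    Hypothetical helper: assumes the first '{' and the last '}' delimit the
    JSON object. Raises ValueError when no valid evaluation is found.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON in model output: {exc}") from exc

    score = data.get("score")
    if not isinstance(score, str) or score not in VALID_SCORES:
        raise ValueError(f"unexpected score: {score!r}")

    # Normalize "reason" to a list; it may be absent when the call is correct
    reason = data.get("reason") or []
    if not isinstance(reason, list):
        reason = [reason]
    return {"score": score, "reason": reason}


sample = '{"score": "incorrect_parameter_values", "reason": ["num should be 50"]}'
evaluation = parse_evaluation(sample)
print(evaluation["score"], evaluation["reason"])
```

Raising on unknown scores, rather than silently mapping them to a default, keeps malformed generations from being counted as valid evaluations.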

## Load the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
model = AutoModelForCausalLM.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
```

## Generate a Prediction
To make a prediction, wrap the formatted evaluator prompt in a list of chat messages and apply the tokenizer's chat template.
```python
chat_template = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # The user message is EVALUATOR_PROMPT with both placeholders filled in
    {"role": "user", "content": "<your-formatted-user-prompt>"}
]
# Apply the chat template
text = tokenizer.apply_chat_template(chat_template, tokenize=False, add_generation_prompt=True)

# Tokenize with truncation and move the tensors to the model's device
# (call model.to("cuda") beforehand to run on a GPU)
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)

# Generate your prediction
result = model.generate(**inputs, max_new_tokens=128, use_cache=True)

# Decode only the newly generated tokens (everything after the prompt)
prediction = tokenizer.decode(result[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

## Citation
```bibtex
@model{limbic-tool-use-0.5B-32K,
  title={Limbic Tool Use Evaluator},
  author={QuotientAI},
  year={2025},
  url={https://huggingface.co/quotientai/limbic-tool-use-0.5B-32K}
}
```