Initialize project; model provided by the ModelHub XC community
Model: katanemo/Plano-Orchestrator-4B Source: Original Platform
125
README.md
Normal file
@@ -0,0 +1,125 @@
---
license: other
license_name: katanemo-research
license_link: >-
  https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE
base_model:
- Qwen/Qwen3-4B-Instruct-2507
language:
- en
pipeline_tag: text-generation
---

# katanemo/Plano-Orchestrator-4B

## Overview

**Plano-Orchestrator** is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for multi-agent orchestration systems, Plano-Orchestrator excels at analyzing user intent and conversation context to make precise routing and orchestration decisions. Designed for real-world deployments, it delivers strong performance across general conversations, coding tasks, and long-context multi-turn conversations, while remaining efficient enough for low-latency production environments.

#### Key capabilities

- **Multi-turn Context Understanding**: Makes routing decisions based on full conversation history, maintaining contextual awareness across extended dialogues with evolving user needs.
- **Multi-intent Detection**: Identifies when a single user message requires multiple agents simultaneously, enabling parallel/sequential routing to fulfill complex requests.
- **Context-dependent Routing**: Correctly interprets ambiguous or referential messages by leveraging prior conversation context for accurate routing decisions.
- **Conversational Flow Handling**: Understands diverse interaction patterns including follow-ups, clarifications, confirmations, and corrections within ongoing conversations.
- **Negative Case Detection**: Recognizes when no specialized routing is needed, avoiding unnecessary LLM or agent calls for casual conversation.
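
As an illustration of how a caller might act on these decisions, here is a minimal sketch of a dispatcher. It assumes the model's reply has already been parsed into a list of route names, following the `{"route": [...]}` format used in the Example section. The handler functions, the `HANDLERS` registry, the `dispatch` function, and the policy of skipping unknown names are all hypothetical and not part of Plano-Orchestrator.

```python
from typing import Callable, Dict, List


# Hypothetical handlers standing in for real agents.
def weather_agent(message: str) -> str:
    return f"[WeatherAgent] handling: {message}"


def code_agent(message: str) -> str:
    return f"[CodeAgent] handling: {message}"


# Hypothetical registry mapping route names to handlers.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "WeatherAgent": weather_agent,
    "CodeAgent": code_agent,
}


def dispatch(routes: List[str], message: str) -> List[str]:
    """Call each selected agent in the order the router listed them."""
    if not routes:
        # Negative case: the router selected no specialized agent.
        return [f"[default] handling: {message}"]
    results = []
    for name in routes:
        handler = HANDLERS.get(name)
        if handler is None:
            # Assumed policy: skip unknown route names instead of raising.
            continue
        results.append(handler(message))
    return results
```

An empty route list falls through to a default handler, a single route calls one agent, and several routes are handled sequentially in the listed order. Running them in parallel instead would be an equally valid design choice for independent agents.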

## Benchmark

We evaluate on **1,958 user messages** across **605 multi-turn conversations** with more than **130 different agents**, covering three scenarios:

- **General** (1,438 messages): Everyday conversational queries spanning diverse topics and agent types
- **Coding** (285 messages): Development-focused conversations including debugging, code generation, and technical assistance
- **Long-context** (235 messages): Extended conversations requiring understanding of extensive prior context
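
For a quick sense of how the benchmark is weighted, the short sketch below recomputes the split from the counts listed above. The variable names are illustrative only; the numbers are taken directly from this section.

```python
# Message counts per scenario, as reported in the list above.
scenario_counts = {"General": 1438, "Coding": 285, "Long-context": 235}
total_messages = sum(scenario_counts.values())
total_conversations = 605

# Share of the benchmark contributed by each scenario.
shares = {name: count / total_messages for name, count in scenario_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

print(f"Average messages per conversation: {total_messages / total_conversations:.2f}")
```

The three scenario counts add up to the reported 1,958 messages, with General conversations making up roughly 73% of them.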

Each message is annotated with routing-relevant attributes, including but not limited to intent multiplicity, context dependency, and continuation type. The evaluation results are shown below.

<div align="center">
<img width="100%" height="auto" src="./assets/Plano-Orchestrator.png">
</div>

> [!NOTE]
> All models were evaluated with minimal reasoning so that routing remains efficient.

## Example

```python
import json
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM


ORCHESTRATION_PROMPT = (
    "You are a helpful assistant that selects the most suitable routes based on user intent.\n"
    "You are provided with a list of available routes enclosed within <routes></routes> XML tags:\n"
    "<routes>\n{routes}\n</routes>\n\n"
    "You are also given the conversation context enclosed within <conversation></conversation> XML tags:\n"
    "<conversation>\n{conversation}\n</conversation>\n\n"
    "## Instructions\n"
    "1. Analyze the latest user intent from the conversation.\n"
    "2. Compare it against the available routes to find which routes can help fulfill the request.\n"
    "3. Respond only with the exact route names from <routes>.\n"
    "4. If no routes can help or the intent is already fulfilled, return an empty list.\n\n"
    "## Response Format\n"
    "Return your answer strictly in JSON as follows:\n"
    '{{"route": ["route_name_1", "route_name_2", "..."]}}\n'
    "If no routes are needed, return an empty list for `route`."
)


def convert_agents_to_routes(agents):
    # Keep only the fields the router needs and emit one JSON object per line.
    tools = [
        {
            "name": agent["name"],
            "description": agent["description"],
        }
        for agent in agents
    ]
    return "\n".join([json.dumps(tool, ensure_ascii=False) for tool in tools])


def build_messages(available_agents, conversation):
    # Render the routes and the conversation into a single user prompt.
    routes = convert_agents_to_routes(available_agents)
    conversation_str = json.dumps(conversation, indent=4, ensure_ascii=False)
    prompt = ORCHESTRATION_PROMPT.format(routes=routes, conversation=conversation_str)
    return [{"role": "user", "content": prompt}]


# Load model
model_name = "katanemo/Plano-Orchestrator-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define available agents
available_agents = [
    {"name": "WeatherAgent", "description": "Provides weather forecasts and current conditions for any location"},
    {"name": "CodeAgent", "description": "Generates, debugs, explains, and reviews code in multiple programming languages"}
]

# Conversation history
conversation = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "I can help you with that. Could you tell me your location?"},
    {"role": "user", "content": "San Francisco"},
]

# Build messages and generate
messages = build_messages(available_agents, conversation)
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# max_new_tokens is only an upper bound; generation stops at the end-of-sequence token.
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Output: {"route": ["WeatherAgent"]}
```
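
Because the reply is plain text, callers will usually want to parse and validate it before acting on it. The following is a minimal sketch of one way to do that, not an official utility. The `parse_routes` helper and its fallback behavior (returning an empty list on malformed output, dropping names that are not in the known set) are assumptions made for illustration.

```python
import json
from typing import Iterable, List


def parse_routes(response: str, known_routes: Iterable[str]) -> List[str]:
    """Extract validated route names from the model's JSON reply.

    Returns an empty list when the reply is not valid JSON, lacks a
    list-valued "route" key, or names no known route.
    """
    try:
        payload = json.loads(response.strip())
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    routes = payload.get("route", [])
    if not isinstance(routes, list):
        return []
    allowed = set(known_routes)
    # Keep the router's ordering; drop anything that is not a known route name.
    return [name for name in routes if isinstance(name, str) and name in allowed]


agent_names = ["WeatherAgent", "CodeAgent"]
print(parse_routes('{"route": ["WeatherAgent"]}', agent_names))
print(parse_routes('{"route": []}', agent_names))
```

Treating malformed output the same way as an empty route list keeps the caller's control flow simple. Stricter deployments may prefer to raise an error or retry instead.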

## License

The Plano-Orchestrator collection is distributed under the [Katanemo license](https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE).