---
license: other
license_name: katanemo-research
license_link: >-
  https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE
base_model:
  - Qwen/Qwen3-4B-Instruct-2507
language:
  - en
pipeline_tag: text-generation
---
# katanemo/Plano-Orchestrator-4B

## Overview

**Plano-Orchestrator** is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for multi-agent orchestration systems, Plano-Orchestrator excels at analyzing user intent and conversation context to make precise routing and orchestration decisions. Designed for real-world deployments, it delivers strong performance across general conversations, coding tasks, and long-context multi-turn conversations, while remaining efficient enough for low-latency production environments.

#### Key capabilities

- **Multi-turn Context Understanding**: Makes routing decisions based on full conversation history, maintaining contextual awareness across extended dialogues with evolving user needs.
- **Multi-intent Detection**: Identifies when a single user message requires multiple agents simultaneously, enabling parallel/sequential routing to fulfill complex requests.
- **Context-dependent Routing**: Correctly interprets ambiguous or referential messages by leveraging prior conversation context for accurate routing decisions.
- **Conversational Flow Handling**: Understands diverse interaction patterns including follow-ups, clarifications, confirmations, and corrections within ongoing conversations.
- **Negative Case Detection**: Recognizes when no specialized routing is needed, avoiding unnecessary LLM or agent calls for casual conversation.
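
As a rough illustration of how a caller might act on the multi-intent and negative cases above, the sketch below takes an already-parsed list of route names, runs the matching handlers in order, and falls back to a default handler when the list is empty. The `dispatch` function, the handler registry, the agent names, and the fallback are all hypothetical and not part of Plano-Orchestrator. Sequential execution and skipping unknown names are assumptions made for brevity.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


def dispatch(
    routes: List[str],
    handlers: Dict[str, Handler],
    fallback: Handler,
    user_message: str,
) -> List[str]:
    """Run the handler for each selected route in order.

    An empty route list (the negative case) goes to the fallback handler.
    """
    if not routes:
        return [fallback(user_message)]
    results = []
    for name in routes:
        handler = handlers.get(name)
        if handler is None:
            # Assumption: unknown route names are skipped instead of raising.
            continue
        results.append(handler(user_message))
    return results


# Hypothetical handlers standing in for real agents.
handlers = {
    "WeatherAgent": lambda msg: f"[weather] {msg}",
    "CodeAgent": lambda msg: f"[code] {msg}",
}
fallback = lambda msg: f"[chat] {msg}"

# Multi-intent: two routes selected for one message.
multi = dispatch(["WeatherAgent", "CodeAgent"], handlers, fallback, "Forecast for SF, and a script to plot it")
# Negative case: no route selected.
none = dispatch([], handlers, fallback, "Thanks!")
```

A production system could run independent routes concurrently instead. The sequential loop is only the simplest option.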
## Benchmark

We evaluate on **1,958 user messages** across **605 multi-turn conversations** with more than **130 different agents**, covering three scenarios:

- **General** (1,438 messages): Everyday conversational queries spanning diverse topics and agent types
- **Coding** (285 messages): Development-focused conversations including debugging, code generation, and technical assistance
- **Long-context** (235 messages): Extended conversations requiring understanding of extensive prior context

Each message is annotated with routing-relevant attributes, including but not limited to intent multiplicity, context dependency, and continuation type. The evaluation results are shown below.
<div align="center">
<img width="100%" height="auto" src="./assets/Plano-Orchestrator.png">
</div>

> [!NOTE]
> All models were evaluated with minimal reasoning so that routing remains efficient.
## Example

```python
import json
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM


ORCHESTRATION_PROMPT = (
    "You are a helpful assistant that selects the most suitable routes based on user intent.\n"
    "You are provided with a list of available routes enclosed within <routes></routes> XML tags:\n"
    "<routes>\n{routes}\n</routes>\n\n"
    "You are also given the conversation context enclosed within <conversation></conversation> XML tags:\n"
    "<conversation>\n{conversation}\n</conversation>\n\n"
    "## Instructions\n"
    "1. Analyze the latest user intent from the conversation.\n"
    "2. Compare it against the available routes to find which routes can help fulfill the request.\n"
    "3. Respond only with the exact route names from <routes>.\n"
    "4. If no routes can help or the intent is already fulfilled, return an empty list.\n\n"
    "## Response Format\n"
    "Return your answer strictly in JSON as follows:\n"
    '{{"route": ["route_name_1", "route_name_2", "..."]}}\n'
    "If no routes are needed, return an empty list for `route`."
)


def convert_agents_to_routes(agents):
    # Serialize each agent as one JSON object per line (name and description only).
    tools = [
        {
            "name": agent["name"],
            "description": agent["description"],
        }
        for agent in agents
    ]
    return "\n".join([json.dumps(tool, ensure_ascii=False) for tool in tools])


def build_messages(available_agents, conversation):
    # Fill the prompt template and wrap it in a single user turn.
    routes = convert_agents_to_routes(available_agents)
    conversation_str = json.dumps(conversation, indent=4, ensure_ascii=False)
    prompt = ORCHESTRATION_PROMPT.format(routes=routes, conversation=conversation_str)
    return [{"role": "user", "content": prompt}]


# Load model
model_name = "katanemo/Plano-Orchestrator-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define available agents
available_agents = [
    {"name": "WeatherAgent", "description": "Provides weather forecasts and current conditions for any location"},
    {"name": "CodeAgent", "description": "Generates, debugs, explains, and reviews code in multiple programming languages"}
]

# Conversation history
conversation = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "I can help you with that. Could you tell me your location?"},
    {"role": "user", "content": "San Francisco"},
]

# Build messages and generate
messages = build_messages(available_agents, conversation)
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
# Keep only the newly generated tokens by dropping the prompt prefix.
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Output: {"route": ["WeatherAgent"]}
```
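
The prompt asks the model to reply with a JSON object of the form `{"route": [...]}`, so downstream code will usually parse that string before acting on it. The helper below is a minimal sketch of one way to do that. `parse_routes` is a hypothetical name, and treating malformed output or unknown route names as "no route" is an assumption, not documented behavior of the model or of any Katanemo tooling.

```python
import json
from typing import Iterable, List


def parse_routes(response: str, known_routes: Iterable[str]) -> List[str]:
    """Extract route names from the model's JSON reply, keeping only known names."""
    known = set(known_routes)
    try:
        payload = json.loads(response.strip())
    except json.JSONDecodeError:
        # Assumption: output that is not valid JSON is treated as "no route".
        return []
    if not isinstance(payload, dict):
        return []
    routes = payload.get("route", [])
    if not isinstance(routes, list):
        return []
    # Drop anything that is not an exact, known route name.
    return [name for name in routes if isinstance(name, str) and name in known]


known = ["WeatherAgent", "CodeAgent"]
print(parse_routes('{"route": ["WeatherAgent"]}', known))  # ['WeatherAgent']
print(parse_routes('{"route": []}', known))                # []
print(parse_routes("not json", known))                     # []
```

Filtering against the known route names guards against a reply that names an agent the caller never offered. Stricter deployments might prefer to log or raise on such replies.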
## License

The Plano-Orchestrator collection is distributed under the [Katanemo license](https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE).