---
license: other
license_name: katanemo-research
license_link: >-
  https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE
base_model:
- Qwen/Qwen3-4B-Instruct-2507
language:
- en
pipeline_tag: text-generation
---
# katanemo/Plano-Orchestrator-4B

## Overview

**Plano-Orchestrator** is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for multi-agent orchestration systems, Plano-Orchestrator excels at analyzing user intent and conversation context to make precise routing and orchestration decisions. Designed for real-world deployments, it delivers strong performance across general conversations, coding tasks, and long-context multi-turn conversations, while remaining efficient enough for low-latency production environments. 

#### Key capabilities
- **Multi-turn Context Understanding**: Makes routing decisions based on full conversation history, maintaining contextual awareness across extended dialogues with evolving user needs.
- **Multi-intent Detection**: Identifies when a single user message requires multiple agents simultaneously, enabling parallel/sequential routing to fulfill complex requests.
- **Context-dependent Routing**: Correctly interprets ambiguous or referential messages by leveraging prior conversation context for accurate routing decisions.
- **Conversational Flow Handling**: Understands diverse interaction patterns including follow-ups, clarifications, confirmations, and corrections within ongoing conversations.
- **Negative Case Detection**: Recognizes when no specialized routing is needed, avoiding unnecessary LLM or agent calls for casual conversation.
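
As an illustration of how a caller might act on these capabilities, the sketch below labels a routing decision by the number of routes it names. It assumes the `{"route": [...]}` JSON shape used in the Example section. The `plan_dispatch` helper and its labels are hypothetical and not part of the model or any Plano API.

```python
import json


def plan_dispatch(decision_json):
    """Label a routing decision as 'none', 'single', or 'multi' (hypothetical helper)."""
    routes = json.loads(decision_json).get("route", [])
    if not routes:
        # Negative case: nothing to route, so no agent call is needed.
        return "none", []
    if len(routes) == 1:
        return "single", routes
    # Multi-intent: the caller decides whether to run these in parallel or in order.
    return "multi", routes


print(plan_dispatch('{"route": []}'))
print(plan_dispatch('{"route": ["WeatherAgent", "CodeAgent"]}'))
```

Whether multiple routes run in parallel or sequentially is left to the calling system in this sketch; the list order is simply passed through.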

## Benchmark

We evaluate on **1,958 user messages** across **605 multi-turn conversations** with more than **130 different agents**, covering three scenarios:

- **General** (1,438 messages): Everyday conversational queries spanning diverse topics and agent types
- **Coding** (285 messages): Development-focused conversations including debugging, code generation, and technical assistance
- **Long-context** (235 messages): Extended conversations requiring understanding of extensive prior context
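
As a quick sanity check on the split above, the following sketch recomputes the total and each scenario's share from the quoted counts. It uses only the numbers stated in this section; the variable names are illustrative.

```python
# Message counts per scenario, as quoted above.
scenario_counts = {"General": 1438, "Coding": 285, "Long-context": 235}
conversations = 605

total_messages = sum(scenario_counts.values())
assert total_messages == 1958  # matches the stated total

# Share of the benchmark contributed by each scenario.
for name, count in scenario_counts.items():
    print(f"{name}: {count / total_messages:.1%}")

print(f"Average messages per conversation: {total_messages / conversations:.2f}")
```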

Each message is annotated with routing-relevant attributes, including but not limited to intent multiplicity, context dependency, and continuation type. The evaluation results are shown below.

<div align="center">
  <img width="100%" height="auto" src="./assets/Plano-Orchestrator.png">
</div>

> [!NOTE]
> All models were evaluated with minimal reasoning so that routing remains efficient.

## Example

```python
import json
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM


ORCHESTRATION_PROMPT = (
    "You are a helpful assistant that selects the most suitable routes based on user intent.\n"
    "You are provided with a list of available routes enclosed within <routes></routes> XML tags:\n"
    "<routes>\n{routes}\n</routes>\n\n"
    "You are also given the conversation context enclosed within <conversation></conversation> XML tags:\n"
    "<conversation>\n{conversation}\n</conversation>\n\n"
    "## Instructions\n"
    "1. Analyze the latest user intent from the conversation.\n"
    "2. Compare it against the available routes to find which routes can help fulfill the request.\n"
    "3. Respond only with the exact route names from <routes>.\n"
    "4. If no routes can help or the intent is already fulfilled, return an empty list.\n\n"
    "## Response Format\n"
    "Return your answer strictly in JSON as follows:\n"
    '{{"route": ["route_name_1", "route_name_2", "..."]}}\n'
    "If no routes are needed, return an empty list for `route`."
)

def convert_agents_to_routes(agents):
    tools = [
        {
            "name": agent["name"],
            "description": agent["description"],
        }
        for agent in agents
    ]
    return "\n".join([json.dumps(tool, ensure_ascii=False) for tool in tools])

def build_messages(available_agents, conversation):
    routes = convert_agents_to_routes(available_agents)
    conversation_str = json.dumps(conversation, indent=4, ensure_ascii=False)
    prompt = ORCHESTRATION_PROMPT.format(routes=routes, conversation=conversation_str)
    return [{"role": "user", "content": prompt}]

# Load model
model_name = "katanemo/Plano-Orchestrator-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define available agents
available_agents = [
    {"name": "WeatherAgent", "description": "Provides weather forecasts and current conditions for any location"},
    {"name": "CodeAgent", "description": "Generates, debugs, explains, and reviews code in multiple programming languages"}
]

# Conversation history
conversation = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "I can help you with that. Could you tell me your location?"},
    {"role": "user", "content": "San Francisco"},
]

# Build messages and generate
messages = build_messages(available_agents, conversation)
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Output: {"route": ["WeatherAgent"]}
```
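
The model returns its decision as a JSON string, so a caller will usually want to parse it before dispatching. The following is a minimal sketch of one way to do that, assuming the `{"route": [...]}` response format from the prompt above. The `parse_routes` helper is hypothetical, and treating malformed output as "no route" is a design assumption rather than documented behavior.

```python
import json


def parse_routes(response, available_agents):
    """Return the route names in `response` that match a known agent (hypothetical helper)."""
    known = {agent["name"] for agent in available_agents}
    try:
        payload = json.loads(response.strip())
    except json.JSONDecodeError:
        # Assumption: treat unparseable output as "no route" rather than raising.
        return []
    if not isinstance(payload, dict):
        return []
    routes = payload.get("route", [])
    if not isinstance(routes, list):
        return []
    # Keep only exact string matches, preserving the order the model returned.
    return [name for name in routes if isinstance(name, str) and name in known]


agents = [{"name": "WeatherAgent"}, {"name": "CodeAgent"}]
print(parse_routes('{"route": ["WeatherAgent"]}', agents))
```

Filtering against the known agent names guards against a route name that is not in `<routes>`; an empty list can then be handled as the no-routing case.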

## License

The Plano-Orchestrator collection is distributed under the [Katanemo license](https://huggingface.co/katanemo/Plano-Orchestrator-4B/blob/main/LICENSE).