---
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
language:
- en
library_name: transformers
license: other
license_name: katanemo-research
license_link: https://huggingface.co/katanemo/Arch-Router-1.5B/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- routing
- preference
- arxiv:2506.16655
- llm
paper: https://arxiv.org/abs/2506.16655
---

# katanemo/Arch-Router-1.5B

## Overview

With the rapid proliferation of large language models (LLMs) -- each optimized for different strengths, styles, or latency/cost profiles -- routing has become an essential technique for operationalizing the use of different models. However, existing LLM routing approaches are limited in two key ways: they evaluate performance using benchmarks that often fail to capture human preferences driven by subjective evaluation criteria, and they typically select from a limited pool of models.

We introduce a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing) -- offering a practical mechanism to encode preferences in routing decisions. Specifically, we introduce Arch-Router, a compact 1.5B model that learns to map queries to domain-action preferences for model routing decisions. Experiments on conversational datasets demonstrate that our approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models.

This model is described in the paper https://arxiv.org/abs/2506.16655 and powers [Arch](https://github.com/katanemo/arch), the models-native proxy server for agents.

### How It Works

To support effective routing, Arch-Router introduces two key concepts:
- **Domain** – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
- **Action** – the specific type of operation the user wants performed (e.g., summarization, code generation, appointment booking, translation).

Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.

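The last step, applying user-defined preferences to the inferred route, can be sketched as a simple lookup. This is a minimal illustration and not Arch's actual implementation: the `ROUTE_TO_MODEL` table, the placeholder model names, and the `select_model` helper are all hypothetical.

```python
from typing import Dict

# Hypothetical preference table: route name -> preferred model.
# The model names are placeholders, not real deployments.
ROUTE_TO_MODEL: Dict[str, str] = {
    "code_generation": "model-a",
    "bug_fixing": "model-b",
}

# Hypothetical fallback used when the router returns an unmapped route.
DEFAULT_MODEL = "model-default"


def select_model(
    route: str,
    preferences: Dict[str, str] = ROUTE_TO_MODEL,
    default: str = DEFAULT_MODEL,
) -> str:
    """Return the model preferred for `route`, or `default` if none is configured."""
    return preferences.get(route, default)
```

Because the mapping lives outside the router, changing which model serves a route only requires editing the table, which is consistent with the claim that new preferences do not require retraining.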
### Key Features

- **Structured Preference Routing**: Aligns prompt requests with model strengths using explicit domain–action mappings.
- **Transparent and Controllable**: Makes routing decisions transparent and configurable, empowering users to customize system behavior.
- **Flexible and Adaptive**: Supports evolving user needs, model updates, and new domains/actions without retraining the router.
- **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.

# Requirements
The code for Arch-Router-1.5B is included in the Hugging Face `transformers` library, and we advise you to install the latest version. Quote the requirement so the shell does not treat `>=` as an output redirect:
```bash
pip install "transformers>=4.37.0"
```

# How to use
The following example illustrates how to use our model to perform routing tasks. Note that the model works best with the prompt format provided here.
### Quickstart
````python
import json
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "katanemo/Arch-Router-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Please use our provided prompt for best performance
TASK_INSTRUCTION = """
You are a helpful assistant designed to find the best suited route.
You are provided with route description within <routes></routes> XML tags:
<routes>

{routes}

</routes>

<conversation>

{conversation}

</conversation>
"""

FORMAT_PROMPT = """
Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags. Follow the instruction:
1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
2. You must analyze the route descriptions and find the best match route for user latest intent.
3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
{"route": "route_name"}
"""

# Define route config
route_config = [
    {
        "name": "code_generation",
        "description": "Generating new code snippets, functions, or boilerplate based on user prompts or requirements",
    },
    {
        "name": "bug_fixing",
        "description": "Identifying and fixing errors or bugs in the provided code across different programming languages",
    },
    {
        "name": "performance_optimization",
        "description": "Suggesting improvements to make code more efficient, readable, or scalable",
    },
    {
        "name": "api_help",
        "description": "Assisting with understanding or integrating external APIs and libraries",
    },
    {
        "name": "programming",
        "description": "Answering general programming questions, theory, or best practices",
    },
]


# Helper function to create the system prompt for our model
def format_prompt(
    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
):
    # Only TASK_INSTRUCTION is passed through str.format; FORMAT_PROMPT
    # contains literal braces and is appended unchanged.
    return (
        TASK_INSTRUCTION.format(
            routes=json.dumps(route_config), conversation=json.dumps(conversation)
        )
        + FORMAT_PROMPT
    )


# Define conversations
conversation = [
    {
        "role": "user",
        "content": "fix this module 'torch.utils._pytree' has no attribute 'register_pytree_node'. did you mean: '_register_pytree_node'?",
    }
]

route_prompt = format_prompt(route_config, conversation)

messages = [
    {"role": "user", "content": route_prompt},
]

# 1. Tokenize the routing prompt with the chat template
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 2. Generate
generated_ids = model.generate(
    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
    max_new_tokens=32768,
)

# 3. Strip the prompt from each sequence
prompt_lengths = input_ids.shape[1]  # same length for every row here
generated_only = [
    output_ids[prompt_lengths:]  # slice off the prompt tokens
    for output_ids in generated_ids
]

# 4. Decode if you want text
response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
print(response)
````

You should then see the following output string in JSON format:
````json
{"route": "bug_fixing"}
````

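Since the router returns its decision as a JSON string, downstream code usually needs to parse it defensively. The sketch below is one possible way to do that; the `parse_route` helper and its fallback behavior are assumptions made for illustration and are not part of the model or the Arch proxy.

```python
import json
from typing import Iterable


def parse_route(response: str, valid_routes: Iterable[str], fallback: str = "other") -> str:
    """Extract the route name from the router's JSON reply.

    Hypothetical helper: returns `fallback` when the reply is not valid JSON,
    is not an object, or names a route that is not in `valid_routes`.
    """
    allowed = set(valid_routes)
    try:
        payload = json.loads(response.strip())
    except json.JSONDecodeError:
        return fallback
    route = payload.get("route") if isinstance(payload, dict) else None
    if isinstance(route, str) and route in allowed:
        return route
    return fallback
```

For example, `parse_route('{"route": "bug_fixing"}', ["bug_fixing", "api_help"])` yields `"bug_fixing"`, while malformed or unknown replies fall back to `"other"`, the route name the prompt reserves for irrelevant or already fulfilled requests.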
For guidance on writing route descriptions, please take a look at our [Katanemo API](https://docs.archgw.com/guides/llm_router.html).

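Before sending a route config to the model, it can help to run a few basic sanity checks on it. The following is a minimal sketch under stated assumptions: `validate_route_config` is a hypothetical helper, and the rules it enforces (non-empty names and descriptions, unique names, and no user-defined route called `other`, which the prompt reserves as the fallback) are our reading of the prompt format rather than documented requirements.

```python
from typing import Any, Dict, List


def validate_route_config(route_config: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    seen = set()
    for index, route in enumerate(route_config):
        name = route.get("name")
        description = route.get("description")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"route {index}: missing name")
            continue
        if name == "other":
            # Assumption: "other" is the fallback route named in the prompt.
            problems.append(f"route {index}: 'other' is reserved for the fallback")
        if name in seen:
            problems.append(f"route {index}: duplicate name '{name}'")
        seen.add(name)
        if not isinstance(description, str) or not description.strip():
            problems.append(f"route {index}: '{name}' has no description")
    return problems
```

An empty result does not guarantee good routing quality; it only catches structural mistakes that would make the model's answer ambiguous.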
# License
The Katanemo Arch-Router model is distributed under the [Katanemo license](https://huggingface.co/katanemo/Arch-Router-1.5B/blob/main/LICENSE).

GitHub: https://github.com/katanemo/arch