---
language:
- en
license: apache-2.0
tags:
- llm
- tool-calling
- lightweight
- agentic-tasks
- react
- mlx
model-index:
- name: NanoAgent
  results: []
datasets:
- microsoft/orca-agentinstruct-1M-v1
- microsoft/orca-math-word-problems-200k
- allenai/tulu-3-sft-personas-instruction-following
- xingyaoww/code-act
- m-a-p/Code-Feedback
- weijie210/gsm8k_decomposed
- Locutusque/function-calling-chatml
- HuggingFaceTB/smoltalk
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---

# POC

# FORKED FROM

# 🧠 NanoAgent: 135M Parameter Agentic LLM

NanoAgent is a compact language model with 135M parameters and an 8k-token context length, trained to **perform tool calls** and **generate responses based on tool outputs**.
Despite its small size (~135 MB in 8-bit precision), it is optimized for agentic use cases and runs easily on personal devices.

**GitHub:** [NanoAgent](https://github.com/QuwsarOhi/NanoAgent)

**Inference resource:** [inference notebook](https://github.com/QuwsarOhi/NanoAgent/blob/main/notebooks/inference.ipynb)

---

## ✨ Features

- 🧰 **Tool Calling**: emits structured tool calls and responds based on the tool outputs.
- 🧭 **Instruction Following**: fine-tuned to follow instructions.
- 🧠 **Basic Reasoning**: handles lightweight reasoning and ReAct-style interactions.
- ⚡ **Lightweight**: runs on local hardware with minimal resources.

---

## 🧪 Training Overview

- **Base model:** [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)
- **Fine-tuning method:** [Dynamic Fine-Tuning (DFT)](https://github.com/yongliang-wu/DFT/tree/master)
- **Hardware:** Apple Mac M1 (16 GB unified memory), using MLX.

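For intuition about the fine-tuning method: as the DFT paper describes it, DFT rescales each token's standard cross-entropy loss by the model's own probability for that token, with that scaling factor treated as a constant during backpropagation. The sketch below is an illustration only and is not NanoAgent's training code. The function names are hypothetical, and it operates on plain probabilities rather than model outputs.

```python
import math


def sft_token_loss(p_target):
    """Standard cross-entropy for one target token: -log p."""
    return -math.log(p_target)


def dft_token_loss(p_target):
    """DFT-style loss: cross-entropy scaled by the token probability.

    In real training the scaling factor p_target is detached from the
    gradient (stop-gradient); this scalar sketch only shows the value.
    """
    return -p_target * math.log(p_target)


# Compare both losses for a confident, an uncertain, and an unlikely token.
for p in (0.9, 0.5, 0.01):
    print(f"p={p}: sft={sft_token_loss(p):.3f} dft={dft_token_loss(p):.3f}")
```

A target token the model considers unlikely (p = 0.01) contributes far less under the DFT weighting than under plain cross-entropy, which is the rescaling effect the method relies on.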
### 📚 Datasets Used

- `microsoft/orca-agentinstruct-1M-v1`: agentic tasks, RAG answers, classification
- `microsoft/orca-math-word-problems-200k`: lightweight reasoning
- `allenai/tulu-3-sft-personas-instruction-following`: instruction following
- `xingyaoww/code-act`: ReAct-style reasoning and action
- `m-a-p/Code-Feedback`: alignment via feedback
- `HuggingFaceTB/smoltalk` + `/apigen`: tool calling stabilization
- `weijie210/gsm8k_decomposed`: question decomposition
- `Locutusque/function-calling-chatml`: tool call response structure

---

## ⚠️ Disclaimer

This is a **beta model**.

- It may produce **incorrect** or **incomplete** outputs.
- Tool calling is **basic** and can fail in some cases.
- Intended for **research and experimentation** only, not production use.

---

## 🧭 Roadmap

- ✅ Initial release with DFT fine-tuning
- 🧪 Benchmarking on agentic tasks
- ~~🔬 Experimenting with GRPO for tool calling (failed)~~
- 🧠 Weight merging experiments for improved performance
- Add more tool-calling datasets

---

## 📥 Model Size

- 135M parameters
- ~135 MB in 8-bit precision
- 8k context length

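The ~135 MB figure follows from storing one byte per parameter. A quick back-of-the-envelope check, assuming a nominal 135,000,000 parameters and counting weights only (no quantization metadata, activations, or KV cache); `weight_footprint_mb` is a hypothetical helper, not part of any library:

```python
def weight_footprint_mb(num_params, bits_per_weight):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


NUM_PARAMS = 135_000_000  # nominal parameter count (assumption)

# Print the estimated weight size at a few common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_footprint_mb(NUM_PARAMS, bits):.1f} MB")
```

Actual memory use at inference time will be higher, since the context cache and intermediate activations come on top of the weights.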
---

## ⚡ Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def inference(messages, max_new_tokens=256, temperature=0.3, min_p=0.15, **kwargs):
    # Render the chat messages into the model's prompt format.
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        min_p=min_p,  # pass the caller's value instead of a hardcoded 0.15
        temperature=temperature,
        **kwargs,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[1] :], skip_special_tokens=True)

messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
```

Use the following template for tool calling:
```python
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""
```

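Since the template asks the model to wrap a JSON list in `<tool_call></tool_call>` tags, the reply has to be parsed before any tool can run. Here is a minimal sketch of such a parser. The helper `extract_tool_calls` is hypothetical, and the `"name"` and `"arguments"` keys are an assumption about the output shape, which the template does not spell out. Malformed blocks are skipped, because a model of this size can emit invalid JSON.

```python
import json
import re

# Non-greedy match so that several <tool_call> blocks are handled separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of tool-call dicts found in the model output."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(parsed, dict):
            parsed = [parsed]  # tolerate a single object instead of a list
        if not isinstance(parsed, list):
            continue
        # Keep only entries that look like calls (assumed "name" key).
        calls.extend(c for c in parsed if isinstance(c, dict) and "name" in c)
    return calls


reply = 'Let me check.<tool_call>[{"name": "web_search", "arguments": {"query": "MLX"}}]</tool_call>'
print(extract_tool_calls(reply))
```

Returning an empty list for unparseable output lets the caller decide whether to retry generation or fall back to a plain-text answer.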
Sample tool definition:
```json
{
  "name": "web_search",
  "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to perform."
      }
    },
    "required": ["query"]
  }
}
```
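To connect the template and the tool definition, the sketch below fills the `{tools}` placeholder and builds a chat message list. It rests on two assumptions that this card does not confirm: the tools are serialized as a JSON list, and the filled template is sent as the system message. `build_messages` is a hypothetical helper, and the tool description is shortened for brevity.

```python
import json

# Template copied from the tool-calling section of this card.
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""

# Same tool as the sample definition, with a shortened description.
web_search_tool = {
    "name": "web_search",
    "description": "Performs a web search for a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query to perform."}
        },
        "required": ["query"],
    },
}


def build_messages(user_question, tools):
    """Return chat messages with the tool list embedded in the system prompt."""
    # Assumption: tools go in as a JSON list, in the system role.
    system_prompt = TOOL_TEMPLATE.format(tools=json.dumps(tools))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]


messages = build_messages("What is MLX?", [web_search_tool])
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `inference` in the usage example, so it can be used there directly.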