NanoAgent is a 135M parameter, 8k context length, open-source language model designed for agentic tasks such as tool calling, instruction following, and lightweight reasoning.
It’s small enough (~135 MB in 8-bit) to run on edge devices like personal laptops, low-memory CPUs, and even wearables — yet smart enough to make tool calls, parse web information, and give structured answers.
This model was trained using a combination of datasets under different open licenses.
Each dataset retains its original license, and use of those datasets is subject to their respective terms.
- ✂️ Dataset deduplication significantly improved performance by removing noisy or duplicate Q/A pairs.
- ✂️ Shortening responses (casual responses) and using shorter Python code in training improved performance and reduced repeated token generation.
- 🧮 Word-level reasoning from orca-math enhanced the model's ability to handle stepwise logic.
- 🧰 Designing tool calling prompts from six open-source tool calling datasets resulted in stronger structured output generation.
- 🌐 Tool calling integration enabled the model to extract answers from parsed web data, supporting up-to-date queries.
## ⚡ Benchmark

### Model Comparison

| Benchmark | SmolLM2-135M-Instruct | NanoAgent |
|---|---|---|
| Commonsense QA (acc) | 20.88% | 20.23% |
| IFEval (prompt strict) | 21.63% | 29.94% |
| IFEval (inst strict) | 35.01% | 42.33% |
| IFEval (prompt loose) | 23.84% | 32.16% |
| IFEval (inst loose) | 37.65% | 45.32% |
| tinyArc (acc_norm) | 33.76% | 36.47% |
| tinyGSM8k (exact_match) | 0.55% | 2.31% |
| tinyHellaswag (acc_norm) | 42.20% | 43.45% |
| tinyMMLU (acc_norm) | 26.79% | 27.62% |
| tinyTruthfulQA (acc) | 38.65% | 40.45% |
| tinyWinogrande (acc_norm) | 46.48% | 42.86% |
### BFCL Benchmark (Tool Calling)

| Category | Accuracy | Correct/Total |
|---|---|---|
| Overall | 28.99% | 725/2501 |
| parallel | 56.50% | 113/200 |
| parallel_multiple | 54.50% | 109/200 |
| simple_python | 41.50% | 166/400 |
| simple_javascript | 40.00% | 20/50 |
| live_parallel | 37.50% | 6/16 |
| multiple | 31.50% | 63/200 |
| live_simple | 28.29% | 73/258 |
| simple_java | 27.00% | 27/100 |
| live_parallel_multiple | 25.00% | 6/24 |
| live_multiple | 13.49% | 142/1053 |
\*All evaluations were conducted with greedy decoding (sampling disabled during Hugging Face inference).
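The Overall row is simply the pooled per-category counts. A quick sanity check, using only the numbers from the table above:

```python
# Per-category (correct, total) counts copied from the BFCL table.
counts = {
    "parallel": (113, 200),
    "parallel_multiple": (109, 200),
    "simple_python": (166, 400),
    "simple_javascript": (20, 50),
    "live_parallel": (6, 16),
    "multiple": (63, 200),
    "live_simple": (73, 258),
    "simple_java": (27, 100),
    "live_parallel_multiple": (6, 24),
    "live_multiple": (142, 1053),
}

correct = sum(c for c, _ in counts.values())
total = sum(t for _, t in counts.values())
print(correct, total, f"{100 * correct / total:.2f}%")  # 725 2501 28.99%
```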
### Key Findings

- NanoAgent significantly outperforms the base SmolLM2-135M-Instruct on instruction following (IFEval), with gains of 8-10 percentage points across all metrics.
- NanoAgent also improves on tinyMMLU, tinyTruthfulQA, and tinyHellaswag over the base model.
- 🧰 Tool Calling: Only NanoAgent supports tool calling; SmolLM2-135M-Instruct does not.
## ⚡ Example Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def inference(messages, max_new_tokens=256, temperature=0.3, **kwargs):
    # Accept either a chat message list or an already-templated string.
    if isinstance(messages, list):
        input_text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    else:
        input_text = messages
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        **kwargs,
    )
    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
```
### Tool Calling

NanoAgent uses a JSON-based tool calling format:

````python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Performs a web search and returns formatted results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."}
                },
                "required": ["query"],
            },
        },
    }
]

TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible tools that you can execute to retrieve information or to perform specific actions. You can execute zero or more tools to answer user question.
Here are the list of tools that you have access to:
```json
{tools}
```
Only execute tools from above. Follow the below JSON signature to execute tools:
```json
[{{"name": "tool_name", "arguments": {{"arg1": "val1", ...}}}}, ...]
```
"""

messages = [
    {"role": "system", "content": TOOL_TEMPLATE.format(tools=json.dumps(tools, indent=2))},
    {"role": "user", "content": "What's the latest AI news?"},
]
response = inference(messages, max_new_tokens=512)
print(response)
# Output:
# ```json
# [{"name": "web_search", "arguments": {"query": "latest AI news 2026"}}]
# ```
````
It is suggested to add the `` ```json\n `` tokens as a prefill during inference. This improves performance, since the model then knows it has to execute a tool.
````python
messages = [
    {"role": "system", "content": TOOL_TEMPLATE.format(tools=json.dumps(tools, indent=2))},
    {"role": "user", "content": "What's the latest AI news?"},
    {"role": "assistant", "content": "```json\n"},
]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False, continue_final_message=True
)
response = inference(input_text, max_new_tokens=512)
print(response)
# Output:
# [{"name": "web_search", "arguments": {"query": "latest AI news 2026"}}]
# ```
````
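Once the model has emitted a tool call, the JSON block still has to be extracted and routed to a real function. A minimal sketch of that step, assuming the output formats shown above; the `web_search` handler here is a local stand-in, not an actual search implementation:

````python
import json
import re

def parse_tool_calls(response: str):
    """Extract the JSON tool-call list from a model response.

    Handles a bare JSON array, an array wrapped in ```json fences, and the
    prefilled case where only the closing fence remains. Returns [] when no
    valid tool call is found.
    """
    match = re.search(r"```json\s*(.*?)\s*```", response, re.DOTALL)
    payload = match.group(1) if match else response.strip().rstrip("`")
    try:
        calls = json.loads(payload)
    except json.JSONDecodeError:
        return []
    return calls if isinstance(calls, list) else []

# Hypothetical handler registry; this web_search just echoes its query.
def web_search(query: str) -> str:
    return f"results for: {query}"

HANDLERS = {"web_search": web_search}

def dispatch(calls):
    return [HANDLERS[c["name"]](**c["arguments"]) for c in calls if c.get("name") in HANDLERS]

response = '```json\n[{"name": "web_search", "arguments": {"query": "latest AI news"}}]\n```'
print(dispatch(parse_tool_calls(response)))  # ['results for: latest AI news']
````

The handler results can then be appended as a new message and fed back through `inference` for a final answer.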
## 🧭 Roadmap

- 📊 Benchmark more agentic tasks
- 🧠 Explore GRPO for tool calling improvement
- 🔀 Experiment with weight merging
- 🧪 Evaluate multi-turn tool chaining
- 🧹 Further refine datasets for stability
## 📄 License
This project (code, model weights, and training recipes) is licensed under the Apache License 2.0.