---
language:
- en
license: apache-2.0
tags:
- llm
- tool-calling
- lightweight
- agentic-tasks
- react
- mlx
model-index:
- name: NanoAgent
  results: []
datasets:
- microsoft/orca-agentinstruct-1M-v1
- microsoft/orca-math-word-problems-200k
- allenai/tulu-3-sft-personas-instruction-following
- xingyaoww/code-act
- m-a-p/Code-Feedback
- weijie210/gsm8k_decomposed
- Locutusque/function-calling-chatml
- HuggingFaceTB/smoltalk
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---
# POC

# FORKED FROM
# 🧠 NanoAgent — 135M Parameter Agentic LLM

NanoAgent is a compact 135M-parameter language model with an 8k context length, trained to **perform tool calls** and **generate responses based on tool outputs**.  
Despite its small size (~135 MB in 8-bit precision), it’s optimized for agentic use cases and runs easily on personal devices.

**GitHub:** [NanoAgent](https://github.com/QuwsarOhi/NanoAgent)

**Inference notebook:** [inference.ipynb](https://github.com/QuwsarOhi/NanoAgent/blob/main/notebooks/inference.ipynb)

---

## ✨ Features

- 🧰 **Tool Calling** — emits structured tool calls and answers using the tool outputs.  
- 🧭 **Instruction Following** — strong instruction following abilities.  
- 🧠 **Basic Reasoning** — handles lightweight reasoning and ReAct-style interactions.  
- ⚡ **Lightweight** — runs on local hardware with minimal resources.

---

## 🧪 Training Overview

**Base model:** [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)  
**Fine-tuning method:** [Dynamic Fine-Tuning (DFT)](https://github.com/yongliang-wu/DFT/tree/master)  
**Hardware:** Apple Mac M1 (16 GB Unified Memory) using MLX.
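
The linked DFT work describes rescaling each token's training loss by the probability the model assigns to that token. The snippet below is only a rough illustration of that idea in plain Python, not NanoAgent's MLX training code; the function names are hypothetical, and a real implementation would treat the scaling factor as a constant (stop-gradient) during backpropagation.

```python
import math

def sft_token_loss(p_target: float) -> float:
    """Standard SFT loss for one token: negative log-likelihood."""
    return -math.log(p_target)

def dft_token_loss(p_target: float) -> float:
    """DFT-style loss sketch: the same NLL scaled by the token probability.

    In an actual training loop the scaling factor would be detached from
    the graph, so it only reweights the gradient of the NLL term.
    """
    return p_target * sft_token_loss(p_target)

# Compare the two losses for a low-, mid-, and high-probability target token.
for p in (0.01, 0.5, 0.99):
    print(f"p={p:.2f}  sft={sft_token_loss(p):.3f}  dft={dft_token_loss(p):.3f}")
```

Under this weighting, tokens the model already finds very unlikely contribute much less to the loss than they would under plain SFT.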

### 📚 Datasets Used
- `microsoft/orca-agentinstruct-1M-v1` — agentic tasks, RAG answers, classification  
- `microsoft/orca-math-word-problems-200k` — lightweight reasoning  
- `allenai/tulu-3-sft-personas-instruction-following` — instruction following  
- `xingyaoww/code-act` — ReAct style reasoning and action  
- `m-a-p/Code-Feedback` — alignment via feedback  
- `HuggingFaceTB/smoltalk` + `/apigen` — tool calling stabilization  
- `weijie210/gsm8k_decomposed` — question decomposition  
- `Locutusque/function-calling-chatml` — tool call response structure

---

## ⚠️ Disclaimer

This is a **beta model**.  
- It may produce **incorrect** or **incomplete** outputs.  
- Tool call execution is **basic** and can fail in some cases.  
- Intended for **research and experimentation** only — not production use.

---

## 🧭 Roadmap

- ✅ Initial release with DFT fine-tuning  
- 🧪 Benchmarking on agentic tasks  
- ~~🔬 Experimenting with GRPO for tool calling (failed)~~
- 🧠 Weight merging experiments for improved performance
- Add more tool-calling datasets

---

## 📥 Model Size

- 135M parameters  
- ~135 MB in 8-bit precision  
- 8k context length
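
The ~135 MB figure follows from storing one byte per parameter at 8-bit precision. A small sketch of that arithmetic, assuming a nominal 135,000,000 parameters and counting only raw weights (no activations, KV cache, or file-format overhead); `weight_size_mb` is a hypothetical helper name:

```python
def weight_size_mb(num_params: int, bits_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6

NUM_PARAMS = 135_000_000  # nominal parameter count, assumed for illustration

# Print the estimated weight size at a few common precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_size_mb(NUM_PARAMS, bits):.0f} MB")
```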

---

## ⚡ Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def inference(messages, max_new_tokens=256, temperature=0.3, min_p=0.15, **kwargs):
    # Render the conversation with the model's chat template.
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        min_p=min_p,  # min-p sampling threshold
        temperature=temperature,
        **kwargs
    )
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[1] :], skip_special_tokens=True)

messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
```

Use the following template for tool calling:
```python
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags. 
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""
```
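
Because the template asks the model to wrap its calls in `<tool_call></tool_call>` tags, the generated text has to be parsed before any tool can run. The following is a minimal parsing sketch; the `name` and `arguments` keys in the sample output are an assumption about the call format, and `parse_tool_calls` is a hypothetical helper rather than part of NanoAgent.

```python
import json
import re

# Matches everything between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Return every tool call found in `text` as a dict; skip malformed JSON."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(block.strip())
        except json.JSONDecodeError:
            continue  # a small model can emit invalid JSON; ignore that block
        # The template asks for a JSON list, but tolerate a single object too.
        items = payload if isinstance(payload, list) else [payload]
        calls.extend(item for item in items if isinstance(item, dict))
    return calls

# Hypothetical model output, written by hand for illustration.
sample = '<tool_call>[{"name": "web_search", "arguments": {"query": "MLX"}}]</tool_call>'
print(parse_tool_calls(sample))
```

Skipping blocks that fail to parse, instead of raising, lets the caller retry or fall back to a plain-text answer.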

Sample tool call definition:
```json
{
  "name": "web_search",
  "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to perform."
      }
    },
    "required": ["query"]
  }
}
```
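
One way to combine the template with a tool definition is sketched below. It assumes the filled template is sent as the system message and that the tools are serialized as a JSON list; neither detail is confirmed here, so check the inference notebook for the exact format. `build_tool_messages` is a hypothetical helper, and the template is repeated so the block runs on its own.

```python
import json

# Repeated from the template above so this example is self-contained.
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags. 
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""

web_search_tool = {
    "name": "web_search",
    "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query to perform."}
        },
        "required": ["query"],
    },
}

def build_tool_messages(tools: list[dict], question: str) -> list[dict]:
    """Fill TOOL_TEMPLATE with the serialized tools and pair it with the user turn."""
    system_prompt = TOOL_TEMPLATE.format(tools=json.dumps(tools))
    return [
        {"role": "system", "content": system_prompt},  # assumed placement
        {"role": "user", "content": question},
    ]

messages = build_tool_messages([web_search_tool], "What is MLX?")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `inference` in the usage example.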