---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
tags:
- genesis-agi
- manthan
- qwen2
- tool-calling
- agent
- reasoning
- grpo
- qlora
- chatml
- smolagents
datasets:
- Shahansha/manthan-tool-reasoning-v1
- glaiveai/glaive-function-calling-v2
- NousResearch/hermes-function-calling-v1
metrics:
- accuracy
- pass@1
model-index:
-
    name: Manthan-1.5B
    results:
    -
        task:
            type: text-generation
            name: Tool-Augmented Generation
        dataset:
            name: GSM8K
            type: gsm8k
        metrics:
        -
            name: Tool-Augmented Accuracy
            type: accuracy
            value: 65.0
    -
        task:
            type: text-generation
            name: Code Generation
        dataset:
            name: MBPP
            type: mbpp
        metrics:
        -
            name: pass@1
            type: pass@1
            value: 50.0
---

# Genesis Manthan - 1.5B

Genesis Manthan is a small language model fine-tuned to reason through tool interaction instead of verbal chain-of-thought. It is built on top of Qwen2.5-1.5B-Instruct and tuned for tool-first responses, agent workflows, and smolagents-style execution loops.

## Model Summary

- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Published model: `Shahansha/Manthan-1.5B`
- Training recipe: QLoRA SFT -> GRPO with tool-execution rewards -> budget forcing at inference time
- Primary behavior: emit structured tool calls before final answers
- Intended ecosystem: Hugging Face Transformers, Gradio Spaces, smolagents, local agent runners

## Why this model exists

Most small open models still answer by generating verbose text, even when the task would be better solved through an external tool. Manthan is designed around a different behavior: call a tool, observe the result, and then answer. The target is not hidden verbal reasoning. The target is reliable action traces that small models can actually execute.

A demo is available in the Hugging Face Space `Shahansha/Manthan-Demo`.

## Benchmark Snapshot

| Benchmark | Metric | Reported Result (%) |
|---|---:|---:|
| GSM8K | Tool-augmented accuracy | 65.0 |
| MBPP | pass@1 | 50.0 |

Note: reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made.

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Shahansha/Manthan-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
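model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Recent transformers releases accept `dtype`; older ones expect `torch_dtype`.
    dtype=torch.float16,
    device_map="auto",
)
# Clear the default max_length so that max_new_tokens alone bounds generation.
model.generation_config.max_length = None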

messages = [
    {
        "role": "system",
        "content": (
            "You are Genesis Manthan, an AI agent that solves problems by calling tools. "
            "Never reason verbally - always reason through tool execution."
        ),
    },
    {"role": "user", "content": "What is 144 + 256?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
)

# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

Expected behavior: the completion should include a `<tool_call>` block before the final answer.
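
To act on that output, the `<tool_call>` blocks have to be pulled out of the decoded text. The sketch below is one way to do it and rests on an assumption this card does not confirm: that each block wraps a JSON object with `name` and `arguments` keys, as in Hermes-style function calling. The `calculator` tool name and the `extract_tool_calls` helper are hypothetical.

```python
import json
import re

# Assumption: each call is a JSON object wrapped in <tool_call>...</tool_call>
# with "name" and "arguments" keys. Adjust if the model's actual format differs.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(completion: str) -> list[dict]:
    """Return every well-formed tool call found in the completion text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks whose body is not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


# Hypothetical completion text; "calculator" is an illustrative tool name.
sample = (
    "<tool_call>\n"
    '{"name": "calculator", "arguments": {"expression": "144 + 256"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

Malformed blocks are skipped rather than raised, so the caller can decide whether an empty result should trigger a retry.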

## Prompting Guidance

This model performs best when the system prompt explicitly instructs it to solve problems by calling tools. If you omit that instruction, it may drift back toward plain-text assistant behavior.

Recommended system message:

```text
You are Genesis Manthan, an AI agent that solves problems by calling tools. Never reason verbally - always reason through tool execution.
```

## Training Details

- Base checkpoint: `Qwen/Qwen2.5-1.5B-Instruct`
- Fine-tuning method: QLoRA SFT
- Reinforcement learning: GRPO with composable rewards for tool execution, answer correctness, and format compliance
- Data format: ChatML with custom tool roles and structured `<tool_call>` blocks
- Primary training data: `Shahansha/manthan-tool-reasoning-v1` plus function-calling traces derived from Glaive and Hermes datasets
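
As a rough illustration of what "composable rewards" can mean, the sketch below combines a format-compliance check and an answer-correctness check into one weighted score. It is not the project's training code: the function names, the weights, and the assumed JSON payload inside `<tool_call>` are all hypothetical, and the tool-execution reward is left out because it would need a sandboxed executor.

```python
import json
import re
from typing import Callable

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

# A reward maps (completion, reference_answer) to a score.
RewardFn = Callable[[str, str], float]


def format_reward(completion: str, reference: str) -> float:
    """1.0 if at least one <tool_call> block holds valid JSON, else 0.0."""
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            json.loads(raw)
            return 1.0
        except json.JSONDecodeError:
            continue
    return 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """1.0 if the reference appears after the last tool call
    (or anywhere in the completion when there is no tool call)."""
    tail = completion.rsplit("</tool_call>", 1)[-1]
    return 1.0 if reference in tail else 0.0


def combine(weighted: list[tuple[RewardFn, float]]) -> RewardFn:
    """Compose several rewards into a single weighted sum."""
    def total(completion: str, reference: str) -> float:
        return sum(w * fn(completion, reference) for fn, w in weighted)
    return total


# Hypothetical weights, chosen only for illustration.
reward = combine([(format_reward, 0.25), (correctness_reward, 0.75)])

good = (
    '<tool_call>{"name": "calculator", "arguments": {"expression": "144 + 256"}}'
    "</tool_call>\nThe answer is 400."
)
print(reward(good, "400"))
```

Keeping each reward as a separate function with a shared signature makes it straightforward to add, remove, or reweight terms without touching the others.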

## Intended Use

- Agentic math and reasoning tasks where external execution is available
- Tool-augmented code and debugging workflows
- Research experiments around small-model tool use
- Gradio demos and Hugging Face Spaces showcasing action-first reasoning

## Limitations

- This is a research model, not a general factual authority
- Reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made
- The model relies heavily on the surrounding prompt and tool scaffolding
- Small models can still emit malformed tool calls or conclude too early without budget forcing or downstream validation
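
This card does not describe how budget forcing is implemented. One possible reading, sketched below under that caveat, is a wrapper that refuses to accept a completion with too few tool calls: it appends a short continuation cue and asks the model to keep going. The `generate_with_budget` helper, its parameters, and the cue text are hypothetical; `generate` stands in for any function that maps a prompt string to a completion string.

```python
from typing import Callable


def generate_with_budget(
    generate: Callable[[str], str],
    prompt: str,
    min_tool_calls: int = 1,
    max_retries: int = 2,
    nudge: str = "\nWait, I should call a tool first.\n",
) -> str:
    """Extend the completion until it has enough tool calls or retries run out."""
    completion = generate(prompt)
    for _ in range(max_retries):
        if completion.count("<tool_call>") >= min_tool_calls:
            break
        # Keep the partial output, append the cue, and let the model continue.
        completion += nudge
        completion += generate(prompt + completion)
    return completion


# Stub generator for demonstration: answers too early, then emits a tool call.
_replies = iter([
    "The answer is 400.",
    '<tool_call>{"name": "calculator", "arguments": {"expression": "144 + 256"}}</tool_call>',
])
seen_prompts = []


def fake_generate(text: str) -> str:
    seen_prompts.append(text)
    return next(_replies)


result = generate_with_budget(fake_generate, "What is 144 + 256?")
print(result)
```

The retry cap keeps the loop bounded, so a model that never emits a tool call still returns after a fixed number of attempts.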

## Safety and Responsible Use

- Do not treat tool-call output as inherently safe to execute without sandboxing
- Validate JSON arguments and restrict available tools in production
- Review outputs carefully in coding, shell, or data-execution environments
- This model was not trained for high-stakes legal, medical, or safety-critical decisions
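
The validation and tool-restriction advice above can be sketched as an allowlist check that runs before anything is executed. The allowlist contents, the `validate_tool_call` helper, and the `name`/`arguments` payload shape are assumptions made for illustration, not part of the released model or its code.

```python
import json

# Hypothetical allowlist: tool name -> required argument names and their types.
ALLOWED_TOOLS = {
    "calculator": {"expression": str},
}


def validate_tool_call(raw_json: str) -> dict:
    """Parse a tool-call payload and reject anything outside the allowlist."""
    try:
        call = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    schema = ALLOWED_TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"unexpected or missing arguments for {name!r}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")
    return call


checked = validate_tool_call(
    '{"name": "calculator", "arguments": {"expression": "144 + 256"}}'
)
print(checked["name"])
```

Validation only confirms that a call is well-formed and permitted; the tool itself should still run in a sandbox.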

## Project Links

- Model: https://huggingface.co/Shahansha/Manthan-1.5B
- Dataset: https://huggingface.co/datasets/Shahansha/manthan-tool-reasoning-v1
- Code: https://github.com/shaik-shahansha/manthan
- Deployment guide: https://github.com/shaik-shahansha/manthan/blob/main/docs/HUGGINGFACE_DEPLOY.md
- Author: https://shahansha.com
- Org: https://genesisagi.in

## Citation

```bibtex
@misc{shaik2026manthan,
    title={Genesis Manthan-1.5B: Tool-Mediated Reasoning for Small Language Models},
    author={Shahansha Shaik},
    year={2026},
    url={https://huggingface.co/Shahansha/Manthan-1.5B}
}
```

---