---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
tags:
- genesis-agi
- manthan
- qwen2
- tool-calling
- agent
- reasoning
- grpo
- qlora
- chatml
- smolagents
datasets:
- Shahansha/manthan-tool-reasoning-v1
- glaiveai/glaive-function-calling-v2
- NousResearch/hermes-function-calling-v1
metrics:
- accuracy
- pass@1
model-index:
-
    name: Manthan-1.5B
    results:
    -
        task:
            type: text-generation
            name: Tool-Augmented Generation
        dataset:
            name: GSM8K
            type: gsm8k
        metrics:
        -
            name: Tool-Augmented Accuracy
            type: accuracy
            value: 65.0
    -
        task:
            type: text-generation
            name: Code Generation
        dataset:
            name: MBPP
            type: mbpp
        metrics:
        -
            name: pass@1
            type: pass@1
            value: 50.0
---

# Genesis Manthan - 1.5B

Genesis Manthan is a small language model fine-tuned to reason through tool interaction instead of verbal chain-of-thought. It is built on top of Qwen2.5-1.5B-Instruct and tuned for tool-first responses, agent workflows, and smolagents-style execution loops.

## Model Summary

- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Published model: `Shahansha/Manthan-1.5B`
- Training recipe: QLoRA SFT -> GRPO with tool-execution rewards -> budget forcing at inference time
- Primary behavior: emit structured tool calls before final answers
- Intended ecosystem: Hugging Face Transformers, Gradio Spaces, smolagents, local agent runners

## Why this model exists

Most small open models still answer by generating verbose text, even when the task would be better solved through an external tool. Manthan is designed around a different behavior: call a tool, observe the result, and then answer. The target is not hidden verbal reasoning. The target is reliable action traces that small models can actually execute.

Demo Space: `Shahansha/Manthan-Demo`

## Benchmark Snapshot

| Benchmark | Metric | Reported Result |
|---|---:|---:|
| GSM8K | Tool-augmented accuracy | 65.0 |
| MBPP | pass@1 | 50.0 |

Note: reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made.

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Shahansha/Manthan-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="auto",
)
# Clear the default max_length so max_new_tokens below is the only length limit.
model.generation_config.max_length = None

messages = [
    {
        "role": "system",
        "content": (
            "You are Genesis Manthan, an AI agent that solves problems by calling tools. "
            "Never reason verbally - always reason through tool execution."
        ),
    },
    {"role": "user", "content": "What is 144 + 256?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

Expected behavior: the completion should include a `<tool_call>` block before the final answer.
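
To check that behavior programmatically, the completion can be scanned for tool-call blocks. The sketch below assumes each block wraps a JSON object with `name` and `arguments` keys, as in Hermes-style function calling; this card does not specify the payload schema, so treat that shape, the `calculator` tool name, and the helper `extract_tool_calls` as illustrative assumptions.

```python
import json
import re

# Assumed payload format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(completion: str) -> list[dict]:
    """Return every well-formed JSON tool call found in the completion."""
    calls = []
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


# Hypothetical completion text, not real model output.
sample = (
    "<tool_call>\n"
    '{"name": "calculator", "arguments": {"expression": "144 + 256"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

An empty list signals that the model answered in plain text or emitted a block that could not be parsed.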

## Prompting Guidance

This model performs best when the system prompt explicitly instructs it to solve problems by calling tools. If you omit that instruction, it may drift back toward plain-text assistant behavior.

Recommended system message:

```text
You are Genesis Manthan, an AI agent that solves problems by calling tools. Never reason verbally - always reason through tool execution.
```

## Training Details

- Base checkpoint: `Qwen/Qwen2.5-1.5B-Instruct`
- Fine-tuning method: QLoRA SFT
- Reinforcement learning: GRPO with composable rewards for tool execution, answer correctness, and format compliance
- Data format: ChatML with custom tool roles and structured `<tool_call>` blocks
- Primary training data: `Shahansha/manthan-tool-reasoning-v1` plus function-calling traces derived from Glaive and Hermes datasets
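
The idea of composable rewards can be illustrated with a small sketch that combines independent scoring functions into one weighted score. This is not the project's training code: the function names, the weights, and the substring-based correctness check are assumptions chosen for illustration, and the tool-execution reward is left out because it would need a sandboxed executor.

```python
import json
import re
from typing import Callable

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
RewardFn = Callable[[str, str], float]


def format_reward(completion: str, reference: str) -> float:
    """1.0 if the completion contains at least one tool call with valid JSON."""
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            json.loads(raw)
            return 1.0
        except json.JSONDecodeError:
            pass
    return 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """1.0 if the reference appears after the last tool call.

    If there is no tool call, the whole completion is searched.
    """
    tail = completion.rsplit("</tool_call>", 1)[-1]
    return 1.0 if reference in tail else 0.0


def combine(weighted: list[tuple[float, RewardFn]]) -> RewardFn:
    """Compose reward functions into one weighted sum."""
    def total(completion: str, reference: str) -> float:
        return sum(w * fn(completion, reference) for w, fn in weighted)
    return total


# Hypothetical weights; the actual values used in training are not stated here.
reward = combine([(0.3, format_reward), (0.7, correctness_reward)])

good = (
    '<tool_call>{"name": "calculator", "arguments": {"expression": "144 + 256"}}'
    "</tool_call>\nThe answer is 400."
)
print(reward(good, "400"), reward("The answer is 400.", "400"))
```

Keeping each component as a separate function makes it easy to reweight or swap one signal without touching the others.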

## Intended Use

- Agentic math and reasoning tasks where external execution is available
- Tool-augmented code and debugging workflows
- Research experiments around small-model tool use
- Gradio demos and Hugging Face Spaces showcasing action-first reasoning

## Limitations

- This is a research model, not a general factual authority
- Reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made
- The model relies heavily on the surrounding prompt and tool scaffolding
- Small models can still emit malformed tool calls or conclude too early without budget forcing or downstream validation
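
One simple form of downstream validation is to retry when a completion ends without a closed tool-call block. The sketch below is model-agnostic and is not the project's budget-forcing implementation: `generate_with_tool_check`, the nudge text, and the `fake_generate` stand-in are all hypothetical, and a real setup would pass a function that wraps `model.generate`.

```python
from typing import Callable

NUDGE = "\nCall a tool before answering.\n"  # hypothetical nudge text


def generate_with_tool_check(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> str:
    """Retry until the completion contains a closed <tool_call> block.

    Returns the last completion even if no attempt succeeded, so the
    caller still has to handle the failure case.
    """
    completion = ""
    for _ in range(max_attempts):
        completion = generate(prompt)
        if "<tool_call>" in completion and "</tool_call>" in completion:
            return completion
        prompt = prompt + NUDGE  # push the next attempt toward tool use
    return completion


attempts = []


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call: answers in plain text on the first try."""
    attempts.append(prompt)
    if len(attempts) == 1:
        return "The answer is 400."
    return '<tool_call>{"name": "calculator", "arguments": {"expression": "144 + 256"}}</tool_call>'


result = generate_with_tool_check(fake_generate, "What is 144 + 256?")
print(len(attempts), "<tool_call>" in result)
```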

## Safety and Responsible Use

- Do not treat tool-call output as inherently safe to execute without sandboxing
- Validate JSON arguments and restrict available tools in production
- Review outputs carefully in coding, shell, or data-execution environments
- This model was not trained for high-stakes legal, medical, or safety-critical decisions
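
The advice to validate arguments and restrict tools can be sketched as an allowlist check that runs before any tool is executed. The registry below, its single `calculator` entry, and the `validate_tool_call` helper are hypothetical; a production system would likely use a full JSON Schema validator and still run tools inside a sandbox.

```python
# Hypothetical registry: tool name -> required argument names and their types.
ALLOWED_TOOLS = {
    "calculator": {"expression": str},
}


def validate_tool_call(call: dict) -> tuple[bool, str]:
    """Check a parsed tool call against the allowlist before executing it."""
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        return False, f"tool not allowed: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"

    schema = ALLOWED_TOOLS[name]
    unexpected = sorted(set(args) - set(schema))
    if unexpected:
        return False, f"unexpected arguments: {unexpected}"

    for key, expected_type in schema.items():
        if key not in args:
            return False, f"missing argument: {key}"
        if not isinstance(args[key], expected_type):
            return False, f"wrong type for argument: {key}"

    return True, "ok"


print(validate_tool_call({"name": "calculator", "arguments": {"expression": "144 + 256"}}))
print(validate_tool_call({"name": "shell", "arguments": {"cmd": "rm -rf /"}}))
```

Rejecting unknown tools and unexpected arguments by default keeps the model's output from widening the set of actions the runtime will take.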

## Project Links

- Model: https://huggingface.co/Shahansha/Manthan-1.5B
- Dataset: https://huggingface.co/datasets/Shahansha/manthan-tool-reasoning-v1
- Code: https://github.com/shaik-shahansha/manthan
- Deployment guide: https://github.com/shaik-shahansha/manthan/blob/main/docs/HUGGINGFACE_DEPLOY.md
- Author: https://shahansha.com
- Org: https://genesisagi.in

## Citation

```bibtex
@misc{shaik2026manthan,
    title={Genesis Manthan-1.5B: Tool-Mediated Reasoning for Small Language Models},
    author={Shahansha Shaik},
    year={2026},
    url={https://huggingface.co/Shahansha/Manthan-1.5B}
}
```
