---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
tags:
- genesis-agi
- manthan
- qwen2
- tool-calling
- agent
- reasoning
- grpo
- qlora
- chatml
- smolagents
datasets:
- Shahansha/manthan-tool-reasoning-v1
- glaiveai/glaive-function-calling-v2
- NousResearch/hermes-function-calling-v1
metrics:
- accuracy
- pass@1
model-index:
- name: Manthan-1.5B
  results:
  - task:
      type: text-generation
      name: Tool-Augmented Generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Tool-Augmented Accuracy
      type: accuracy
      value: 65.0
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: MBPP
      type: mbpp
    metrics:
    - name: pass@1
      type: pass@1
      value: 50.0
---

# Genesis Manthan - 1.5B

Genesis Manthan is a small language model fine-tuned to reason through tool interaction instead of verbal chain-of-thought. It is built on top of Qwen2.5-1.5B-Instruct and tuned for tool-first responses, agent workflows, and smolagents-style execution loops.

## Model Summary

- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Published model: `Shahansha/Manthan-1.5B`
- Training recipe: QLoRA SFT -> GRPO with tool-execution rewards -> budget forcing at inference time
- Primary behavior: emit structured tool calls before final answers
- Intended ecosystem: Hugging Face Transformers, Gradio Spaces, smolagents, local agent runners

## Why this model exists

Most small open models still answer by generating verbose text, even when the task would be better solved through an external tool. Manthan is designed around a different behavior: call a tool, observe the result, and then answer.

The target is not hidden verbal reasoning. The target is reliable action traces that small models can actually execute.
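The call, observe, answer behavior described above can be sketched as a small loop. This is only an illustration under stated assumptions, not the project's implementation: the JSON tool-call shape (`{"name": ..., "arguments": {...}}`), the `fake_model` stub, the `TOOLS` registry, and the `add` tool are all hypothetical names introduced here. The format Manthan actually emits is determined by its chat template and training data.

```python
import json

# Hypothetical tool registry (an assumption for this sketch).
TOOLS = {
    "add": lambda a, b: a + b,
}


def fake_model(messages):
    """Stand-in for the language model, used only for illustration.

    Emits a JSON tool call first, then a final answer once a tool
    observation is present in the conversation.
    """
    observations = [m for m in messages if m["role"] == "tool"]
    if not observations:
        return json.dumps({"name": "add", "arguments": {"a": 144, "b": 256}})
    return f"The answer is {observations[-1]['content']}."


def run_agent(question, model=fake_model, max_steps=4):
    """Call a tool, observe the result, then answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # Not JSON: treat the reply as the final answer.
        if not isinstance(call, dict) or "name" not in call:
            return reply  # JSON, but not shaped like a tool call.
        tool = TOOLS.get(call["name"])
        if tool is None:
            result = f"error: unknown tool {call['name']!r}"
        else:
            result = tool(**call.get("arguments", {}))
        # Feed the observation back so the next turn can use it.
        messages.append({"role": "tool", "content": str(result)})
    return "Stopped: step limit reached without a final answer."


print(run_agent("What is 144 + 256?"))
```

The step limit is a simple guard against a model that keeps calling tools without ever producing an answer; a real runner would also sandbox tool execution.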
Demo Space: `Shahansha/Manthan-Demo`

## Benchmark Snapshot

| Benchmark | Metric | Reported Result (%) |
|---|---:|---:|
| GSM8K | Tool-augmented accuracy | 65.0 |
| MBPP | pass@1 | 50.0 |

*Reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made.*

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Shahansha/Manthan-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Older transformers releases may expect `torch_dtype` instead of `dtype`.
    dtype=torch.float16,
    device_map="auto",
)
# Clear the default max_length so that only max_new_tokens bounds generation.
model.generation_config.max_length = None

messages = [
    {
        "role": "system",
        "content": (
            "You are Genesis Manthan, an AI agent that solves problems by calling tools. "
            "Never reason verbally - always reason through tool execution."
        ),
    },
    {"role": "user", "content": "What is 144 + 256?"},
]

# Render the conversation with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
)

# Decode only the newly generated tokens, keeping special tokens visible.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

Expected behavior: the completion should include a structured tool-call block before the final answer.

## Prompting Guidance

This model performs best when the system prompt explicitly instructs it to solve problems by calling tools. If you omit that instruction, it may drift back toward plain-text assistant behavior.

Recommended system message:

```text
You are Genesis Manthan, an AI agent that solves problems by calling tools. Never reason verbally - always reason through tool execution.
```

## Training Details

- Base checkpoint: `Qwen/Qwen2.5-1.5B-Instruct`
- Fine-tuning method: QLoRA SFT
- Reinforcement learning: GRPO with composable rewards for tool execution, answer correctness, and format compliance
- Data format: ChatML with custom tool roles and structured tool-call blocks
- Primary training data: `Shahansha/manthan-tool-reasoning-v1` plus function-calling traces derived from Glaive and Hermes datasets

## Intended Use

- Agentic math and reasoning tasks where external execution is available
- Tool-augmented code and debugging workflows
- Research experiments around small-model tool use
- Gradio demos and Hugging Face Spaces showcasing action-first reasoning

## Limitations

- This is a research model, not a general factual authority
- Reported benchmark numbers are early project metrics and should be independently reproduced before strong claims are made
- The model relies heavily on the surrounding prompt and tool scaffolding
- Small models can still emit malformed tool calls or conclude too early without budget forcing or downstream validation

## Safety and Responsible Use

- Do not treat tool-call output as inherently safe to execute without sandboxing
- Validate JSON arguments and restrict available tools in production
- Review outputs carefully in coding, shell, or data-execution environments
- This model was not trained for high-stakes legal, medical, or safety-critical decisions

## Project Links

- Model: https://huggingface.co/Shahansha/Manthan-1.5B
- Dataset: https://huggingface.co/datasets/Shahansha/manthan-tool-reasoning-v1
- Code: https://github.com/shaik-shahansha/manthan
- Deployment guide: https://github.com/shaik-shahansha/manthan/blob/main/docs/HUGGINGFACE_DEPLOY.md
- Author: https://shahansha.com
- Org: https://genesisagi.in

## Citation

```bibtex
@misc{shaik2026manthan,
  title={Genesis Manthan-1.5B: Tool-Mediated Reasoning for Small Language Models},
  author={Shahansha Shaik},
  year={2026},
  url={https://huggingface.co/Shahansha/Manthan-1.5B}
}
```

---
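As a companion to the Safety and Responsible Use guidance on validating JSON arguments and restricting available tools, the following sketch shows one way such a check might look before any tool is executed. Everything named here is hypothetical: the `ALLOWED_TOOLS` allowlist, the `calculator` tool, the `validate_tool_call` helper, and the assumed `{"name": ..., "arguments": {...}}` call shape are illustrations, not part of the published model or its code.

```python
import json

# Hypothetical allowlist: tool name -> required argument names and types.
ALLOWED_TOOLS = {
    "calculator": {"expression": str},
}


def validate_tool_call(raw):
    """Return (name, arguments) for an allowed, well-formed call.

    Raises ValueError for malformed JSON, unknown tools, or arguments
    that do not match the allowlist entry.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    schema = ALLOWED_TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")
    return name, args


name, args = validate_tool_call(
    '{"name": "calculator", "arguments": {"expression": "144 + 256"}}'
)
print(name, args)
```

Rejecting anything outside the allowlist by default keeps a malformed or unexpected call from ever reaching an executor; validation of this kind complements sandboxing rather than replacing it.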