---
license: mit
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- unsloth
- llama-3.2
- mathematics
- reasoning
- arithmetic
- fine-tuned
- rimon-dutta
- logic
- chain-of-thought
- open-r1
- conversational
- text-generation-inference
language:
- en
pipeline_tag: text-generation
library_name: transformers
datasets:
- open-r1/OpenR1-Math-220k
model_creator: Rimon Dutta
model_name: Rimon-Math-3B-V1
---

# Rimon-Math-3B-V1

**Rimon-Math-3B-V1** is a specialized 3-billion-parameter causal language model, fine-tuned for high-accuracy mathematical reasoning and logical problem-solving. Built on the **Llama-3.2-3B-Instruct** architecture and optimized using the **Unsloth** framework, this model excels at generating structured, step-by-step solutions (Chain-of-Thought).

## Highlights

- **Reasoning focused:** Trained specifically to break complex problems down into logical steps.
- **Lightweight & efficient:** Optimized for consumer-grade GPUs (T4, RTX 3060+) and edge deployment.
- **High compatibility:** Works seamlessly with `transformers` and `vLLM`, and supports `GGUF` conversion for local use.

---

## Model Capabilities

The model is fine-tuned to handle various mathematical domains:

- **Algebra:** Solving equations, inequalities, and systems of equations.
- **Calculus:** Derivatives, integrals, and limit problems.
- **Geometry & Trigonometry:** Properties of shapes and trigonometric identities.
- **Logic & Arithmetic:** Multi-step word problems and sequence analysis.

---

### Training Metrics (Approximate)

| Epoch | Step | Training Loss | Validation Loss | Learning Rate |
|-------|------|---------------|-----------------|---------------|
| 1.0   | 1000 | 0.7104        | 0.6952          | 1.5e-4        |
| 2.0   | 2000 | 0.5911        | 0.5843          | 5.0e-5        |
| 3.0   | 3000 | 0.5244        | 0.5102          | 1.0e-5        |

---

## Usage Guide

### Installation & Dependencies

To run Rimon-Math-3B-V1 efficiently, ensure you have the latest versions of the following libraries installed. Run this command in your terminal or a notebook cell:

```bash
pip install -U transformers torch accelerate bitsandbytes sentencepiece
```

| Component | Minimum (4-bit)                   | Recommended (16-bit)           |
|-----------|-----------------------------------|--------------------------------|
| GPU       | NVIDIA T4 / RTX 3050 (4 GB VRAM)  | RTX 3060 / A100 (12 GB+ VRAM)  |
| RAM       | 8 GB system RAM                   | 16 GB system RAM               |
| CUDA      | 11.8 or higher                    | 12.1 or higher                 |

### How to Use the Model

You can load the model in two different modes, depending on your hardware resources.

#### Option 1: 4-bit Quantization (Low VRAM Mode)

Best for users on Google Colab (free T4) or laptops with limited GPU memory. This uses only ~3.5 GB of VRAM (a quick way to verify the footprint is shown after the code).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

# 4-bit configuration for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
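As a quick sanity check after loading in 4-bit, you can print the model's in-memory size. This is a minimal sketch; it reports the weights and buffers only, not total VRAM usage (activations and cache add to this figure):

```python
# Rough check of the quantized model's size in memory (parameters and buffers only)
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```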
#### Option 2: 16-bit Full Precision (High Accuracy Mode)

Best for users with 8 GB+ VRAM (e.g., RTX 3060 12 GB or higher). This provides the most precise mathematical reasoning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

### Running Inference (Example)

Once the model is loaded, you can solve math problems using the standard Llama 3.2 chat template.

```python
# Define your math problem
messages = [
    {"role": "system", "content": "You are a specialized math tutor. Explain step-by-step."},
    {"role": "user", "content": "If x + 1/x = 3, find the value of x^5 + 1/x^5."},
]

# Apply the chat template; return_dict=True gives a dict of tensors that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate the response
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.1,  # Low temperature is crucial for math accuracy
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Troubleshooting Guide

1. **GPU memory error (OOM):** If you get an "Out of Memory" error, restart your runtime and use Option 1 (4-bit).
2. **BitsAndBytes issues:** If `load_in_4bit` fails, ensure you are running on a Linux-based environment (or WSL2 on Windows) and that your `bitsandbytes` is up to date:

   ```bash
   pip install -U bitsandbytes
   ```

3. **CUDA mismatch:** If you encounter a runtime error regarding CUDA versions, reinstall PyTorch with the correct index URL:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

## Prompt Engineering Tips

- Use a system prompt to control the reasoning style.
- Keep the temperature between 0.1 and 0.3 for math tasks.
- Always request a step-by-step explanation.
- Avoid ambiguous wording in problems.

## Author

**Rimon Dutta**
DevOps Engineer | AI & ML Learner
Kotwali, Bangladesh