---
license: mit
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- unsloth
- llama-3.2
- mathematics
- reasoning
- arithmetic
- fine-tuned
- rimon-dutta
- logic
- chain-of-thought
- open-r1
- conversational
- text-generation-inference
language:
- en
pipeline_tag: text-generation
library_name: transformers
datasets:
- open-r1/OpenR1-Math-220k
model_creator: Rimon Dutta
model_name: Rimon-Math-3B-V1
---
# Rimon-Math-3B-V1
**Rimon-Math-3B-V1** is a specialized 3-billion-parameter causal language model, fine-tuned for high-accuracy mathematical reasoning and logical problem-solving. Built on the **Llama-3.2-3B-Instruct** architecture and optimized using the **Unsloth** framework, this model excels at generating structured, step-by-step solutions (Chain-of-Thought).
## Highlights
- **Reasoning Focused:** Trained specifically to break down complex problems into logical steps.
- **Lightweight & Efficient:** Optimized for consumer-grade GPUs (T4, RTX 3060+) and edge deployment.
- **High Compatibility:** Works seamlessly with `transformers` and `vLLM` (see the sketch below), and supports `GGUF` conversion for local use.
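
As an illustration of the vLLM compatibility, here is a minimal serving sketch. It is an assumption-laden example: it presumes vLLM can load this checkpoint as a standard Llama-3.2 architecture, with the model id taken from this card:

```python
# Minimal vLLM sketch; assumes vLLM recognizes this checkpoint as a
# standard Llama-3.2 architecture.
from vllm import LLM, SamplingParams

llm = LLM(model="rimon-dutta/Rimon-Math-3B-V1", dtype="bfloat16")
params = SamplingParams(temperature=0.1, max_tokens=512)

outputs = llm.generate(["Solve step by step: 2x + 5 = 17"], params)
print(outputs[0].outputs[0].text)
```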
---
## Model Capabilities
The model is fine-tuned to handle various mathematical domains:
- **Algebra:** Solving equations, inequalities, and systems of equations.
- **Calculus:** Derivatives, integrals, and limit problems.
- **Geometry & Trigonometry:** Properties of shapes and trigonometric identities.
- **Logic & Arithmetic:** Multi-step word problems and sequence analysis.
---
## Training Metrics (Approximate)
| Epoch | Step | Training Loss | Validation Loss | Learning Rate |
|-------|------|---------------|-----------------|---------------|
| 1.0 | 1000 | 0.7104 | 0.6952 | 1.5e-4 |
| 2.0 | 2000 | 0.5911 | 0.5843 | 5.0e-5 |
| 3.0 | 3000 | 0.5244 | 0.5102 | 1.0e-5 |
---
## Usage Guide
### Installation & Dependencies
To run Rimon-Math-3B-V1 efficiently, ensure you have the latest versions of the following libraries installed. Run this command in your terminal or a notebook cell:
```bash
pip install -U transformers torch accelerate bitsandbytes sentencepiece
```
| Component | Minimum (4-bit) | Recommended (16-bit) |
|-----------|-----------------|----------------------|
| GPU | NVIDIA T4 / RTX 3050 (4 GB VRAM) | RTX 3060 / A100 (12 GB+ VRAM) |
| RAM | 8 GB system RAM | 16 GB system RAM |
| CUDA | 11.8 or higher | 12.1 or higher |
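
After installing, you can quickly verify that PyTorch sees your GPU and which CUDA build it ships with (plain PyTorch calls, nothing model-specific):

```python
import torch

# Environment sanity check: library version, CUDA build, GPU visibility.
print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
```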
### How to Use the Model
You can load the model in two different modes depending on your hardware resources.
#### Option 1: 4-bit Quantization (Low VRAM Mode)
Best for users on Google Colab (Free T4) or laptops with limited GPU memory. This uses only ~3.5 GB of VRAM.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

# 4-bit configuration for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
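
As a quick sanity check after loading, you can print the model's parameter memory footprint. `get_memory_footprint` is a standard `transformers` model method; it reports parameter memory only, so the figure will sit somewhat below total VRAM usage:

```python
# For a 4-bit 3B model this should land in the low single-digit GB range.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```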
#### Option 2: 16-bit Full Precision (High Accuracy Mode)
Best for users with 8GB+ VRAM (e.g., RTX 3060 12GB or higher). This provides the most precise mathematical reasoning.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
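
Alternatively, the loaded model and tokenizer can be wrapped in the high-level `pipeline` API. A sketch (chat-formatted input to `pipeline` requires a reasonably recent `transformers` release):

```python
from transformers import pipeline

# Reuse the already-loaded model and tokenizer in a text-generation pipeline.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = pipe(
    [{"role": "user", "content": "What is 15% of 240? Show your steps."}],
    max_new_tokens=256,
)
# With chat input, generated_text holds the whole conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```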
### Running Inference (Example)
Once the model is loaded, you can solve math problems using the standard Llama 3.2 chat template.
```python
# Define your math problem
messages = [
    {"role": "system", "content": "You are a specialized math tutor. Explain step-by-step."},
    {"role": "user", "content": "If x + 1/x = 3, find the value of x^5 + 1/x^5."},
]

# Apply the chat template; return_dict=True yields input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate the response
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.1,  # Low temperature is crucial for math accuracy
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
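
For long derivations you can stream tokens as they are generated instead of waiting for the full answer. This sketch uses `transformers.TextStreamer` (a standard utility) and reuses the `inputs` from above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
```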
## Troubleshooting Guide
1. **GPU Memory Error (OOM):**
   If you get an "Out of Memory" error, restart your runtime and use Option 1 (4-bit).
2. **BitsAndBytes Issues:**
   If `load_in_4bit` fails, ensure you are running in a Linux-based environment (or WSL2 on Windows) and that your `bitsandbytes` is up to date:
```bash
pip install -U bitsandbytes
```
3. **CUDA Mismatch:**
   If you encounter a runtime error regarding CUDA versions, reinstall PyTorch with the correct index URL:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
## Prompt Engineering Tips
- Use a system prompt to control the reasoning style.
- Keep `temperature` between 0.1 and 0.3 for math tasks (see the configuration sketch below).
- Always request a step-by-step explanation.
- Avoid ambiguous wording in problems.
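
These tips can be captured in a reusable generation configuration. A minimal sketch using `transformers.GenerationConfig`, where the parameter values simply follow the guidance above and the `model`, `tokenizer`, and `inputs` objects are reused from the inference example:

```python
from transformers import GenerationConfig

# Settings reflecting the tips above: low temperature for accuracy,
# sampling enabled, generous token budget for step-by-step derivations.
math_config = GenerationConfig(
    temperature=0.2,
    top_p=0.9,
    do_sample=True,
    max_new_tokens=1024,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=math_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```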
## Author
<span style="color:#90ee90">
Rimon Dutta<br>
DevOps Engineer | AI & ML Learner<br>
Kotwali, Bangladesh
</span>