---
license: apache-2.0
base_model: allenai/OLMo-3-7B-RLZero-Math
language:
- en
library_name: transformers
tags:
- gguf
- mlx
- ollama
- math
- reasoning
- olmo
- quantized
pipeline_tag: text-generation
---

# OLMo-3-7B-RLZero-Math GGUF

> **GGUF & MLX quantizations** of Allen Institute for AI's mathematical reasoning model, optimized for local inference with llama.cpp, Ollama, and Apple Silicon.

## Highlights

| | |
|---|---|
| **Math Specialist** | Fine-tuned with RL-Zero for step-by-step mathematical reasoning |
| **65K Context** | 65,536 token context window with YaRN scaling |
| **Apple Silicon Ready** | MLX-optimized 4-bit quantization included |
| **Runs Anywhere** | From 4GB RAM to full precision |

## Model Specifications

| Property | Value |
|----------|-------|
| **Parameters** | 7 billion |
| **Architecture** | OLMo2 |
| **Context Length** | 65,536 tokens |
| **Training** | RL-Zero mathematical reasoning |
| **License** | Apache 2.0 |

## Available Versions

### GGUF Quantizations

| Quantization | Size | Quality | Use Case |
|--------------|------|---------|----------|
| `F16` | 14 GB | Near-perfect | Maximum quality, research |
| `Q8_0` | 7.2 GB | Excellent | Near-lossless, high-end hardware |
| `Q5_K_M` | 4.9 GB | Very Good | Excellent quality/size balance |
| `Q4_K_M` | 4.2 GB | Good | **Recommended** for most users |
| `IQ4_XS` | 3.8 GB | Good | Compact 4-bit |
| `IQ3_M` | 3.2 GB | Acceptable | Ultra-compact, constrained devices |

### MLX (Apple Silicon)

4-bit quantized version in `MLX-4bit/` folder - optimized for M1/M2/M3/M4 Macs.

## Quick Start

### Ollama (Easiest)

```bash
ollama run richardyoung/olmo-3-7b-rlzero-math
```

### llama.cpp

```bash
# Download Q4_K_M (recommended)
wget https://huggingface.co/richardyoung/OLMo-3-7B-RLZero-Math-GGUF/resolve/main/Olmo-3-7B-RLZero-Math-Q4_K_M.gguf

# Run inference
./llama-cli -m Olmo-3-7B-RLZero-Math-Q4_K_M.gguf \
  -p "Solve step by step: What is 15% of 240?" \
  -n 512
```

### MLX (Apple Silicon)

```bash
pip install mlx-lm

mlx_lm.generate \
  --model richardyoung/OLMo-3-7B-RLZero-Math-GGUF \
  --prompt "Solve: Find the derivative of x^3 + 2x" \
  --trust-remote-code
```

### Python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Olmo-3-7B-RLZero-Math-Q4_K_M.gguf",
    n_ctx=4096
)

output = llm(
    "Solve step by step: What is the sum of the first 10 prime numbers?",
    max_tokens=512
)

print(output["choices"][0]["text"])
```

## System Requirements

| Quantization | Min RAM | Recommended | Apple Silicon |
|--------------|---------|-------------|---------------|
| IQ3_M | 4 GB | 8 GB | M1 8GB |
| IQ4_XS / Q4_K_M | 6 GB | 12 GB | M1 8GB |
| Q5_K_M / Q8_0 | 8 GB | 16 GB | M1 16GB |
| F16 | 16 GB | 24 GB | M2 Pro+ |

## Prompt Format

```
Solve the following math problem step by step:
{your problem here}
```

**Example:**

```
Solve the following math problem step by step:
A train travels 120 miles in 2 hours. If it continues at the same speed, how long will it take to travel 300 miles?
```

## Links

| Resource | Link |
|----------|------|
| **Original Model** | [allenai/OLMo-3-7B-RLZero-Math](https://huggingface.co/allenai/OLMo-3-7B-RLZero-Math) |
| **Ollama** | [richardyoung/olmo-3-7b-rlzero-math](https://ollama.com/richardyoung/olmo-3-7b-rlzero-math) |
| **Allen AI** | [allenai.org](https://allenai.org/) |

---

**Quantization by** [Richard Young](https://huggingface.co/richardyoung) | **Original model by** [Allen Institute for AI](https://allenai.org/)
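
The System Requirements table and the prompt format above can be combined into a small helper for scripts that need to pick a file and build a prompt automatically. The following is a minimal sketch, not part of this repository or of any library: the names `pick_quantization` and `build_prompt` are hypothetical, the RAM figures are copied from the table, and the line break between the instruction and the problem is an assumption about how the template is meant to be laid out.

```python
# Hypothetical helpers for illustration; not part of llama.cpp, Ollama, or MLX.

# (quantization, minimum RAM in GB, recommended RAM in GB), copied from the
# System Requirements table and ordered from highest to lowest quality.
RAM_REQUIREMENTS_GB = [
    ("F16", 16, 24),
    ("Q8_0", 8, 16),
    ("Q5_K_M", 8, 16),
    ("Q4_K_M", 6, 12),
    ("IQ4_XS", 6, 12),
    ("IQ3_M", 4, 8),
]

# Assumption: the instruction and the problem are separated by a newline.
PROMPT_TEMPLATE = "Solve the following math problem step by step:\n{problem}"


def pick_quantization(ram_gb, conservative=True):
    """Return the highest-quality quantization that fits in ram_gb, or None.

    With conservative=True the "Recommended" column is used as the threshold;
    otherwise the "Min RAM" column is used.
    """
    for name, min_ram, recommended_ram in RAM_REQUIREMENTS_GB:
        threshold = recommended_ram if conservative else min_ram
        if ram_gb >= threshold:
            return name
    return None


def build_prompt(problem):
    """Wrap a problem statement in the prompt format from this card."""
    return PROMPT_TEMPLATE.format(problem=problem.strip())


if __name__ == "__main__":
    print(pick_quantization(16))
    print(build_prompt("What is 15% of 240?"))
```

The string returned by `build_prompt` can be passed as the prompt to any of the Quick Start commands. Note that `pick_quantization` ranks purely by the table order, so it may return a larger file than the `Q4_K_M` default recommended for most users.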