---
license: cc-by-nc-4.0
language:
- en
base_model: nvidia/Nemotron-Research-GooseReason-4B-Instruct
pipeline_tag: text-generation
library_name: mlx
tags:
- mlx
- qwen3
- reasoning
- rlvr
- math
- code
- stem
- nvidia
---

# GooseReason-4B-Instruct — MLX 16-bit (Full Precision)

This is the **full-precision MLX** version of [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct), converted for inference with [MLX](https://github.com/ml-explore/mlx).

## Model Overview

| Attribute | Value |
|---|---|
| **Original Model** | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) |
| **Architecture** | Qwen3 (4.4B parameters) |
| **Precision** | 16-bit (BFloat16, no quantization) |
| **Base Model** | Qwen3-4B-Instruct-2507 |
| **Training Method** | RLVR (Reinforcement Learning with Verifiable Rewards) |
| **Max Sequence Length** | 32,768 tokens |
| **License** | CC-BY-NC-4.0 |

## About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model, built on Qwen3-4B-Instruct-2507 and trained with RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.
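For reference, a conversion like this one can be reproduced with the `mlx_lm` converter. This is a hypothetical sketch, not the exact command used for this repo; the output directory is an arbitrary local path:

```shell
# Install the MLX LM toolkit, then convert the Hugging Face weights
# to MLX format at full bfloat16 precision (no -q/--quantize flag).
pip install mlx-lm
mlx_lm.convert \
    --hf-path nvidia/Nemotron-Research-GooseReason-4B-Instruct \
    --mlx-path ./GooseReason-4B-MLX-16bit \
    --dtype bfloat16
```

Adding `-q` at this step would instead produce one of the quantized variants listed below.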
### Key Capabilities

- **Math Reasoning**: Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation**: Competitive on LiveCodeBench and HumanEval
- **STEM**: Broad science and technical reasoning capabilities
- **Thinking Mode**: Uses extended thinking (`<think>` tags) for complex reasoning tasks

### Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```

### Enabling Extended Thinking

For complex reasoning tasks, the model emits `<think>` tags automatically. You can also prompt it explicitly:

```python
messages = [
    {
        "role": "system",
        "content": "Think step by step before answering."
    },
    {
        "role": "user",
        "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."
    }
]
```

## All Available Formats

| Variant | Link | Size |
|---|---|---|
| MLX 16-bit | **This repo** | ~8.8 GB |
| MLX 8-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit) | ~4.6 GB |
| MLX 6-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit) | ~3.5 GB |
| MLX 4-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit) | ~2.5 GB |
| Full Weights | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) | ~8.8 GB |

## Acknowledgments

- [NVIDIA](https://huggingface.co/nvidia) for the GooseReason-4B model and RLVR research
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [Apple MLX Team](https://github.com/ml-explore/mlx) for the MLX framework
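## Tip: Stripping the Thinking Trace

Because responses can begin with a `<think>…</think>` reasoning trace, downstream code often wants only the final answer. A minimal sketch of separating the two (the helper name is illustrative, not part of `mlx_lm`):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Return (thinking, answer) from a model response.

    If the response starts with a <think>...</think> block, the block's
    contents become `thinking`; otherwise `thinking` is empty.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*", response, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), response[match.end():].strip()
    return "", response.strip()

thinking, answer = split_thinking(
    "<think>Primes below 20: 2+3+5+7+11+13+17+19 = 77</think>The sum is 77."
)
```

Pass the raw output of `generate(...)` through this helper when you only need the user-facing answer.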