---
license: cc-by-nc-4.0
language:
- en
base_model: nvidia/Nemotron-Research-GooseReason-4B-Instruct
pipeline_tag: text-generation
library_name: mlx
tags:
- mlx
- qwen3
- reasoning
- rlvr
- math
- code
- stem
- nvidia
---
# GooseReason-4B-Instruct — MLX 16-bit (Full Precision)
This is the **full-precision MLX** version of [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct), converted for inference using [MLX](https://github.com/ml-explore/mlx).
## Model Overview
| Attribute | Value |
|---|---|
| **Original Model** | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) |
| **Architecture** | Qwen3 (4.4B parameters) |
| **Precision** | 16-bit (BFloat16, no quantization) |
| **Base Model** | Qwen3-4B-Instruct-2507 |
| **Training Method** | RLVR (Reinforcement Learning with Verifiable Rewards) |
| **Max Sequence Length** | 32,768 tokens |
| **License** | CC-BY-NC-4.0 |
## About GooseReason-4B
Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.
### Key Capabilities
- **Math Reasoning**: Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation**: Competitive on LiveCodeBench and HumanEval
- **STEM**: Broad science and technical reasoning capabilities
- **Thinking Mode**: Uses extended thinking (`<think>` tags) for complex reasoning tasks
### Benchmark Highlights
| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
## Usage with MLX
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate
model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")
messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
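In recent mlx-lm releases, `generate` decodes greedily by default; sampled decoding is configured through a sampler object. The sketch below assumes an mlx-lm version that exposes `mlx_lm.sample_utils.make_sampler` and a `sampler=` argument on `generate`; the `temp`/`top_p` values are illustrative, not official recommendations:
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler  # available in recent mlx-lm releases

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Illustrative sampling settings; tune temperature / top_p for your workload.
sampler = make_sampler(temp=0.6, top_p=0.95)
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096, sampler=sampler)
print(response)
```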
### Enabling Extended Thinking
For complex reasoning tasks, the model uses `<think>` tags automatically. You can also prompt it explicitly:
```python
messages = [
    {
        "role": "system",
        "content": "Think step by step before answering."
    },
    {
        "role": "user",
        "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."
    }
]
```
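Once a response is generated, the reasoning trace can be separated from the final answer. A minimal sketch, continuing from the `messages` above and assuming the trace is wrapped in literal `<think>...</think>` tags (the exact delimiter depends on the chat template):
```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

# `messages` as defined above
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096)

# Split the reasoning trace from the final answer; assumes <think>...</think> markers.
if "</think>" in response:
    thinking, answer = response.split("</think>", 1)
    thinking = thinking.replace("<think>", "").strip()
else:
    thinking, answer = "", response

print("Reasoning trace:\n", thinking)
print("\nFinal answer:\n", answer.strip())
```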
## All Available Formats
| Variant | Link | Size |
|---|---|---|
| MLX 16-bit | **This repo** | ~8.8 GB |
| MLX 8-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit) | ~4.6 GB |
| MLX 6-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit) | ~3.5 GB |
| MLX 4-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit) | ~2.5 GB |
| Full Weights | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) | ~8.8 GB |
## Acknowledgments
- [NVIDIA](https://huggingface.co/nvidia) for the GooseReason-4B model and RLVR research
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [Apple MLX Team](https://github.com/ml-explore/mlx) for the MLX framework