---
license: cc-by-nc-4.0
language:
- en
base_model: nvidia/Nemotron-Research-GooseReason-4B-Instruct
pipeline_tag: text-generation
library_name: mlx
tags:
- mlx
- qwen3
- reasoning
- rlvr
- math
- code
- stem
- nvidia
---

# GooseReason-4B-Instruct — MLX 16-bit (Full Precision)

This is the **full-precision MLX** version of [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct), converted for inference using [MLX](https://github.com/ml-explore/mlx).

## Model Overview

| Attribute | Value |
|---|---|
| **Original Model** | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) |
| **Architecture** | Qwen3 (4.4B parameters) |
| **Precision** | 16-bit (BFloat16, no quantization) |
| **Base Model** | Qwen3-4B-Instruct-2507 |
| **Training Method** | RLVR (Reinforcement Learning with Verifiable Rewards) |
| **Max Sequence Length** | 32,768 tokens |
| **License** | CC-BY-NC-4.0 |

## About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model, built on Qwen3-4B-Instruct-2507 and trained with RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at roughly 4B parameters.

### Key Capabilities

- **Math Reasoning**: Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation**: Competitive on LiveCodeBench and HumanEval
- **STEM**: Broad science and technical reasoning capabilities
- **Thinking Mode**: Uses extended thinking (`<think>` tags) for complex reasoning tasks

### Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
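
This card does not define `avg@k`. A common reading, assumed here, is the accuracy averaged over `k` sampled generations per problem and then over all problems. The sketch below illustrates that reading on hypothetical grading data; `avg_at_k` and `results` are illustrative names, not part of any NVIDIA evaluation harness.

```python
from statistics import mean


def avg_at_k(correct_by_problem):
    """Return the percentage score: per-problem fraction of correct samples, averaged over problems."""
    per_problem = [
        mean(1.0 if is_correct else 0.0 for is_correct in samples)
        for samples in correct_by_problem
    ]
    return 100.0 * mean(per_problem)


# Hypothetical grading results: 3 problems, k = 4 samples each.
results = [
    [True, True, False, True],      # 3 of 4 samples correct
    [False, False, False, False],   # 0 of 4 samples correct
    [True, True, True, True],       # 4 of 4 samples correct
]

score = avg_at_k(results)
print(f"avg@4 = {score:.1f}")
```

Under this reading, a reported avg@64 of 55.0 would mean that a single sampled answer is correct about 55% of the time on average.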

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download (on first use) and load the weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]

# Render the conversation with the model's chat template, then generate a reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
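
To sanity-check the model's reply to the example prompt, the expected answer can be computed directly. This is a minimal sketch with a hypothetical helper, `primes_below`, that is independent of MLX.

```python
def primes_below(limit):
    """Return all primes strictly less than `limit` using a simple sieve."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p * p, as composite.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


expected = sum(primes_below(20))
print(expected)  # 2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 = 77
```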

### Enabling Extended Thinking

For complex reasoning tasks, the model wraps its reasoning in `<think>` tags automatically. You can also ask for step-by-step reasoning explicitly with a system message:

```python
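# Reuses `model`, `tokenizer`, and `generate` from the example above.
messages = [
    {
        "role": "system",
        "content": "Think step by step before answering."
    },
    {
        "role": "user",
        "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)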
```
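
If the generated text contains a `<think>` block, you may want to separate the reasoning from the final answer before displaying it. The helper below is a minimal sketch: `split_thinking` is a hypothetical name, the `sample` string is hand-written rather than real model output, and the code assumes at most one `</think>` tag, with the opening tag possibly missing if the chat template already placed it in the prompt.

```python
def split_thinking(text):
    """Return (reasoning, answer); reasoning is '' when no </think> tag is present."""
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        # No thinking block at all: the whole text is the answer.
        return "", text.strip()
    # Drop the opening tag if the model emitted it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hand-written stand-in for a model response.
sample = "<think>Check each residue of n modulo 7.</think>\nNo positive integer n works."
reasoning, answer = split_thinking(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Splitting on the closing tag rather than matching both tags keeps the helper usable whether or not the opening `<think>` appears in the generated text.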

## All Available Formats

| Variant | Link | Size |
|---|---|---|
| MLX 16-bit | **This repo** | ~8.8 GB |
| MLX 8-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit) | ~4.6 GB |
| MLX 6-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit) | ~3.5 GB |
| MLX 4-bit | [DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit](https://huggingface.co/DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit) | ~2.5 GB |
| Full Weights | [nvidia/Nemotron-Research-GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct) | ~8.8 GB |
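
The sizes above follow roughly from parameter count times bits per weight. The sketch below reproduces them to within about 0.1 GB, using the 4.4B parameter count from the Model Overview table. The quantization details are assumptions, since this card does not state them: a group size of 64 with one 16-bit scale and one 16-bit bias per group, applied to every weight. `estimated_size_gb` is a hypothetical helper, not an MLX function.

```python
def estimated_size_gb(num_params, bits_per_weight, group_size=64, overhead_bits=32):
    """Approximate weight size in GB (10^9 bytes).

    overhead_bits is the assumed per-group metadata (16-bit scale + 16-bit bias).
    Unquantized 16-bit weights carry no such overhead.
    """
    effective_bits = bits_per_weight
    if bits_per_weight < 16:
        effective_bits += overhead_bits / group_size
    return num_params * effective_bits / 8 / 1e9


PARAMS = 4.4e9  # parameter count taken from the Model Overview table

sizes = {bits: estimated_size_gb(PARAMS, bits) for bits in (16, 8, 6, 4)}
for bits, gb in sizes.items():
    print(f"{bits}-bit: ~{gb:.2f} GB")
```

Real checkpoints can differ slightly from these estimates, for example when some layers are kept at higher precision.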

## Acknowledgments

- [NVIDIA](https://huggingface.co/nvidia) for the GooseReason-4B model and RLVR research
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [Apple MLX Team](https://github.com/ml-explore/mlx) for the MLX framework