ModelHub XC ced89bb2e1 Initial commit; model provided by the ModelHub XC community
Model: DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit
Source: Original Platform
2026-04-25 02:19:05 +08:00

---
license: cc-by-nc-4.0
language:
- en
base_model: nvidia/Nemotron-Research-GooseReason-4B-Instruct
pipeline_tag: text-generation
library_name: mlx
tags:
- mlx
- qwen3
- reasoning
- rlvr
- math
- code
- stem
- nvidia
---

# GooseReason-4B-Instruct — MLX 16-bit (Full Precision)

This is the full-precision MLX version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, converted for inference using MLX.

## Model Overview

| Attribute | Value |
|---|---|
| Original Model | nvidia/Nemotron-Research-GooseReason-4B-Instruct |
| Architecture | Qwen3 (4.4B parameters) |
| Precision | 16-bit (BFloat16, no quantization) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Training Method | RLVR (Reinforcement Learning with Verifiable Rewards) |
| Max Sequence Length | 32,768 tokens |
| License | CC-BY-NC-4.0 |

## About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.

### Key Capabilities

- **Math Reasoning**: Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation**: Competitive on LiveCodeBench and HumanEval
- **STEM**: Broad science and technical reasoning capabilities
- **Thinking Mode**: Uses extended thinking (`<think>` tags) for complex reasoning tasks

### Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
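As a quick offline sanity check of the example prompt above, the expected answer can be computed directly in plain Python, no model required:

```python
# Primes below 20, found by trial division, and their sum --
# the answer the model should reach for the example prompt.
primes = [n for n in range(2, 20) if all(n % d for d in range(2, int(n**0.5) + 1))]
print(primes)       # [2, 3, 5, 7, 11, 13, 17, 19]
print(sum(primes))  # 77
```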

### Enabling Extended Thinking

For complex reasoning tasks, the model emits `<think>` tags automatically. You can also prompt it explicitly:

```python
messages = [
    {
        "role": "system",
        "content": "Think step by step before answering."
    },
    {
        "role": "user",
        "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096)
print(response)
```
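When thinking mode is active, the raw output interleaves the reasoning trace with the final answer. A minimal way to separate the two, assuming the trace is wrapped in a single `<think>...</think>` pair (the Qwen3-style convention; the helper name is illustrative, not part of mlx-lm), is a regex split:

```python
import re

def split_thinking(text: str):
    """Split a model response into (reasoning, answer).

    Assumes at most one <think>...</think> block; if none is present,
    the reasoning part comes back as an empty string.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Illustrative raw output, not actual model output.
raw = "<think>List the primes below 20 and add them.</think>\nThe sum is 77."
thought, answer = split_thinking(raw)
print(answer)  # The sum is 77.
```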

## All Available Formats

| Variant | Link | Size |
|---|---|---|
| MLX 16-bit | This repo | ~8.8 GB |
| MLX 8-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit | ~4.6 GB |
| MLX 6-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit | ~3.5 GB |
| MLX 4-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit | ~2.5 GB |
| Full Weights | nvidia/Nemotron-Research-GooseReason-4B-Instruct | ~8.8 GB |
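The sizes above follow from the parameter count: at 4.4B parameters, 16-bit weights take roughly 4.4e9 × 2 bytes ≈ 8.8 GB, and lower-bit variants scale proportionally. The quantized repos run slightly larger than this back-of-the-envelope figure because quantization scales and some tensors are kept at higher precision. A rough estimate:

```python
PARAMS = 4.4e9  # parameter count from the model overview table

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring
    quantization scales and tensors kept at full precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (16, 8, 6, 4):
    print(f"{bits}-bit: ~{approx_size_gb(bits):.1f} GB")
# 16-bit: ~8.8 GB, 8-bit: ~4.4 GB, 6-bit: ~3.3 GB, 4-bit: ~2.2 GB
```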

## Acknowledgments

Original model by NVIDIA (nvidia/Nemotron-Research-GooseReason-4B-Instruct), built on Qwen3-4B-Instruct-2507; converted to MLX format for inference on Apple Silicon.