RLVR (Reinforcement Learning with Verifiable Rewards)
Max Sequence Length
32,768 tokens
License
CC-BY-NC-4.0
About GooseReason-4B
Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.
Key Capabilities
Math Reasoning: Strong performance on AIME 2025 and AMC benchmarks
Code Generation: Competitive on LiveCodeBench and HumanEval
STEM: Broad science and technical reasoning capabilities
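from mlx_lm import load, generate

# Load the MLX-converted weights and the matching tokenizer.
model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate up to 2048 new tokens and print the completion.
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)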
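Because the example prompt has a single checkable answer, the model's output can be compared against a reference computed directly. The following is a minimal sketch of such a check; the helper name `primes_below` is hypothetical and is not part of `mlx_lm` or the model card.

```python
def primes_below(limit: int) -> list[int]:
    """Return all primes strictly less than `limit` (Sieve of Eratosthenes)."""
    if limit < 3:
        # There are no primes below 2, and the sieve needs at least indices 0 and 1.
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


primes = primes_below(20)
print(primes)       # [2, 3, 5, 7, 11, 13, 17, 19]
print(sum(primes))  # 77
```

A correct response to the example prompt should therefore end in 77.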
Enabling Extended Thinking
For complex reasoning tasks, the model uses <think> tags automatically. You can also prompt it explicitly:
messages = [
    {"role": "system", "content": "Think step by step before answering."},
    {"role": "user", "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."},
]
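When the reasoning trace is not needed downstream, it can be separated from the final answer after generation. The sketch below assumes the generated text contains a literal `<think>...</think>` pair, as described above; the helper name `split_thinking` is hypothetical, and the example string is a placeholder rather than real model output.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured;
# DOTALL lets the reasoning span multiple lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    If no <think> block is found, reasoning is an empty string and the
    whole response is returned as the answer.
    """
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the matched block as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


# Placeholder text standing in for a generated response.
example = "<think>\nwork through the cases\n</think>\nFinal answer goes here."
reasoning, answer = split_thinking(example)
print(reasoning)  # work through the cases
print(answer)     # Final answer goes here.
```

If the chat template places the opening tag in the prompt rather than in the completion, the pattern would need adjusting to match only the closing tag.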