ModelHub XC 73959dd3af Initialize project; model provided by the ModelHub XC community
Model: daydreamwarrior/Nemotron-Research-GooseReason-4B-Instruct-heretic-v2
Source: Original Platform
2026-05-17 23:58:16 +08:00

license: cc-by-nc-4.0
language: en
base_model: nvidia/Nemotron-Research-GooseReason-4B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags: reasoning, rlvr, math, code, stem, nvidia, heretic, uncensored, decensored, abliterated

This is a decensored version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, made using Heretic v1.2.0

Abliteration parameters

| Parameter | Value |
|---|---|
| direction_index | 19.04 |
| attn.o_proj.max_weight | 1.25 |
| attn.o_proj.max_weight_position | 23.36 |
| attn.o_proj.min_weight | 1.15 |
| attn.o_proj.min_weight_distance | 18.21 |
| mlp.down_proj.max_weight | 1.05 |
| mlp.down_proj.max_weight_position | 27.34 |
| mlp.down_proj.min_weight | 1.01 |
| mlp.down_proj.min_weight_distance | 15.35 |
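
One way to read these parameters is as a per-component kernel that sets how strongly the refusal direction is ablated at each layer. The sketch below is an assumption about that shape, not Heretic's actual code: the weight peaks at `max_weight` at layer position `max_weight_position`, falls off linearly to `min_weight` at `min_weight_distance` layers away, and layers beyond that window are left untouched. The function name `ablation_weight` and the layer count are hypothetical and used only for illustration.

```python
def ablation_weight(layer, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Hypothetical piecewise-linear kernel; returns 0.0 outside the window."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumed: layers outside the window are not modified
    # Linear interpolation from max_weight (distance 0) to min_weight (window edge).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# attn.o_proj values copied from the parameter table
O_PROJ = dict(max_weight=1.25, max_weight_position=23.36,
              min_weight=1.15, min_weight_distance=18.21)

NUM_LAYERS = 36  # assumed layer count, for illustration only
weights = [ablation_weight(i, **O_PROJ) for i in range(NUM_LAYERS)]
for i in (0, 6, 23, 35):
    print(i, round(weights[i], 3))
```

Under this reading, the fractional `max_weight_position` simply places the peak between two layers, and the early layers fall outside the window entirely.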

Performance

| Metric | This model | Original model (nvidia/Nemotron-Research-GooseReason-4B-Instruct) |
|---|---|---|
| KL divergence | 0.0426 | 0 (by definition) |
| Refusals | 5/100 | 99/100 |
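
The KL divergence row measures how far the modified model's output distribution drifts from the original's. As a minimal sketch of the quantity itself, the plain-Python function below computes KL(p || q) for two discrete distributions. The probabilities are hypothetical toy values, and the card does not state which token positions or prompts were used for the reported figure.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


original = [0.70, 0.20, 0.10]  # hypothetical next-token probabilities
modified = [0.60, 0.25, 0.15]  # hypothetical probabilities after modification

print(kl_divergence(original, original))  # 0.0: a model never diverges from itself
print(kl_divergence(original, modified))  # small positive value
```

This is why the original model's entry is 0 by definition: comparing a distribution with itself makes every log term vanish.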

GooseReason-4B-Instruct

Trained with Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper License: CC BY-NC 4.0

GooseReason-4B-Instruct is a state-of-the-art 4B reasoning model trained via Reinforcement Learning with Verifiable Rewards (RLVR) on GooseReason-0.7M, a large-scale dataset synthesized by the Golden Goose pipeline. Starting from Qwen3-4B-Instruct and applying the ProRLv2 RL recipe augmented with GooseReason-0.7M data, GooseReason-4B-Instruct achieves new state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks, spanning mathematics, programming, STEM reasoning, instruction following, and logical puzzles.

This model is for research and development only.

Golden Goose

Scaling up RLVR is bottlenecked by the scarcity of verifiable training data, where improvements increasingly saturate after prolonged training on existing datasets. Golden Goose is a simple, scalable pipeline that synthesizes unlimited RLVR tasks from reasoning-rich but unverifiable internet text—corpora such as science textbooks, Olympiad math forums, and cybersecurity web scrapes that were previously excluded from RLVR data construction due to the difficulty of automatic verification.

The key idea: given a source text S, we prompt an LLM to identify a contiguous span t of crucial reasoning steps and replace it with a [MASK] token, constructing a masked context S_mask. Treating t as the ground-truth answer, the LLM then generates a set of diverse, plausible distractors D = {d₁, ..., dₖ} that are similar in style and length to the removed span yet incorrect in context, forming a multiple-choice question: Q = (S_mask, {t} ∪ D).
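
The construction of Q can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: in Golden Goose an LLM chooses the span and writes the distractors, whereas here both are hard-coded, and the names `MultipleChoiceTask` and `build_task` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class MultipleChoiceTask:
    masked_context: str   # S_mask
    options: list         # shuffled union of {t} and D
    answer_index: int     # position of the ground-truth span t in options


def build_task(source, span, distractors, seed=0):
    """Mask `span` in `source` and mix it with the distractors."""
    if span not in source:
        raise ValueError("span must be a contiguous substring of source")
    masked = source.replace(span, "[MASK]", 1)
    options = [span] + list(distractors)
    random.Random(seed).shuffle(options)  # hide the answer's position
    return MultipleChoiceTask(masked, options, options.index(span))


source = "Since n is even, write n = 2k. Then n^2 = 4k^2, so n^2 is divisible by 4."
span = "Then n^2 = 4k^2"
distractors = ["Then n^2 = 2k^2", "Then n^2 = 4k", "Then n^2 = 2k + 2"]

task = build_task(source, span, distractors)
print(task.masked_context)
for letter, option in zip("ABCD", task.options):
    print(letter, option)
```

Shuffling with a fixed seed keeps the example reproducible while ensuring the correct option is not always listed first.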

Verification during RL simply checks whether the model's prediction matches the ground-truth option—no external judge or test execution needed. This formulation unlocks reasoning-rich corpora that were previously unusable for RLVR: Olympiad-level theorem proving from AoPS-Instruct, free-form textbook QA from MegaScience, and coding problems without test cases from rStar-Coder.
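
A verifier of this kind reduces to a string comparison. The sketch below assumes the model is asked to finish its response with a line such as `Answer: C`. That output format and the function name `reward` are assumptions made for illustration and are not specified on this card.

```python
import re


def reward(response, answer_index):
    """Return 1.0 if the last 'Answer: <letter>' picks the ground-truth option."""
    matches = re.findall(r"Answer:\s*([A-Z])\b", response)
    if not matches:
        return 0.0  # no parsable choice counts as incorrect
    predicted = ord(matches[-1]) - ord("A")
    return 1.0 if predicted == answer_index else 0.0


print(reward("The step must keep the factor 4. Answer: C", 2))  # 1.0
print(reward("I think it is the first one. Answer: A", 2))      # 0.0
```

Taking the last match lets the model mention other options while reasoning without affecting the score.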

GooseReason-0.7M Dataset

Using the Golden Goose pipeline, we synthesize GooseReason-0.7M, a large-scale RLVR dataset with over 0.7 million tasks spanning mathematics, programming, and general scientific domains. The dataset is constructed from the following source corpora:

| Domain | # Examples | Source | Description |
|---|---|---|---|
| Math | 235,836 | AoPS-Instruct | ~600K QA pairs from the Art of Problem Solving forum, predominantly featuring Olympiad-level math problems with community-driven solutions |
| Code | 281,793 | rStar-Coder | ~418K coding problems from competitive programming platforms; we use the synthetic_sft split (questions + teacher model solutions without test cases), which is not directly usable for RL training |
| STEM | 155,496 | MegaScience | ~650K QA pairs from ~12K university-level scientific textbooks spanning physics, biology, chemistry, medicine, computer science, mathematics, and economics |

The data mixing ratio used to train GooseReason-4B-Instruct is 55% ProRL data, 15% GooseReason-0.7M Math, 15% GooseReason-0.7M Code, and 15% GooseReason-0.7M STEM.
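
As a rough illustration of such a mixture, the sketch below draws the source of each training example according to the stated ratios. Whether the actual recipe mixes per batch, per step, or over the whole dataset is not stated here, so the per-example sampling and the name `sample_sources` are assumptions.

```python
import random

# Mixing ratios as stated on this card
MIX = {
    "ProRL": 0.55,
    "GooseReason-0.7M Math": 0.15,
    "GooseReason-0.7M Code": 0.15,
    "GooseReason-0.7M STEM": 0.15,
}


def sample_sources(batch_size, seed=0):
    """Draw a data source for each example in proportion to MIX."""
    rng = random.Random(seed)
    names = list(MIX)
    return rng.choices(names, weights=[MIX[n] for n in names], k=batch_size)


batch = sample_sources(1000)
for name in MIX:
    print(name, batch.count(name) / len(batch))
```

With 1,000 draws the empirical shares land close to the target ratios, and they converge further as the sample grows.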

Evaluation Results

GooseReason-4B-Instruct is evaluated on 15 diverse benchmarks following the ProRL evaluation protocol. Math performance is measured on AIME 2024/2025, AMC, MATH, Minerva, and Olympiad Bench. Code performance is measured on APPS, CodeContests, CodeForces, TACO, HumanEvalPlus, and LiveCodeBench. STEM and reasoning tasks are measured via GPQA Diamond, IFEval, and Reasoning Gym (logical puzzles). The Qwen3-30B-Instruct results (in italics) are provided as a reference.

Table 1. Performance (pass@1) comparison across math benchmarks. Adding GooseReason-0.7M revives the saturated model and enables further RL scaling, achieving a +2.18% absolute gain (vs. a 0.79% degradation when continuing on ProRL data alone).

| Model | RL Data | RL Steps | AIME24 | AIME25 | AMC | MATH | Minerva | Olympiad | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | | | 64.79 | 48.75 | 85.17 | 94.66 | 50.09 | 65.83 | 68.21 |
| Qwen3-4B-Instruct | ProRL Dataset | 333 | 66.46 | 57.29 | 87.80 | 96.41 | 53.72 | 68.24 | 71.65 |
| Qwen3-4B-Instruct | ProRL Dataset | +156 | 62.29 | 55.21 | 87.65 | 96.54 | 53.33 | 67.19 | 70.36 |
| GooseReason-4B-Instruct | +GooseReason-0.7M | +270 | 70.00 | 63.96 | 89.16 | 96.70 | 54.37 | 68.79 | 73.83 |
| Qwen3-30B-Instruct | | | 76.66 | 63.74 | 91.64 | 97.10 | 51.99 | 70.05 | 75.20 |

Table 2. Performance (pass@1) comparison across coding benchmarks. GooseReason-4B-Instruct achieves a +2.24% absolute gain in coding average, outperforming Qwen3-30B-Instruct by a wide margin.

| Model | RL Data | RL Steps | APPS | CodeContests | CodeForces | TACO | HumanEvalPlus | LiveCodeBench | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | | | 47.01 | 42.08 | 33.69 | 23.69 | 77.56 | 31.74 | 42.63 |
| Qwen3-4B-Instruct | ProRL Dataset | 333 | 57.92 | 52.55 | 51.67 | 33.13 | 84.24 | 41.28 | 53.46 |
| Qwen3-4B-Instruct | ProRL Dataset | +156 | 58.45 | 52.88 | 54.47 | 32.80 | 84.20 | 40.56 | 53.89 |
| GooseReason-4B-Instruct | +GooseReason-0.7M | +270 | 60.48 | 54.66 | 55.59 | 35.37 | 86.46 | 41.64 | 55.70 |
| Qwen3-30B-Instruct | | | 55.37 | 49.70 | 47.76 | 29.05 | 80.56 | 43.20 | 50.94 |
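
The headline numbers in the Table 2 caption follow directly from the table. The short check below recomputes the GooseReason-4B-Instruct coding average from its six per-benchmark scores, then takes the differences against the 333-step ProRL checkpoint and Qwen3-30B-Instruct. All figures are copied from Table 2 and the variable names are illustrative.

```python
# Per-benchmark pass@1 scores for GooseReason-4B-Instruct (Table 2)
goose_scores = [60.48, 54.66, 55.59, 35.37, 86.46, 41.64]
goose_avg = sum(goose_scores) / len(goose_scores)

# Reported averages for the two comparison rows (Table 2)
prorl_333_avg = 53.46
qwen3_30b_avg = 50.94

print(f"{goose_avg:.2f}")                    # 55.70
print(f"{goose_avg - prorl_333_avg:+.2f}")   # +2.24
print(f"{goose_avg - qwen3_30b_avg:+.2f}")   # +4.76
```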

Table 3. Performance (pass@1) on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym). Tasks in Reasoning Gym are grouped into four categories: Math, Algorithmic, Cognition, and Logic.

| Model | RL Data | RL Steps | GPQA | IFEval | Math | Algorithmic | Cognition | Logic | Avg. Gym |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | | | 60.26 | 72.36 | 43.69 | 19.46 | 34.92 | 57.26 | 33.98 |
| Qwen3-4B-Instruct | ProRL Dataset | 333 | 64.39 | 79.11 | 92.66 | 80.47 | 60.07 | 86.90 | 80.10 |
| Qwen3-4B-Instruct | ProRL Dataset | +156 | 62.87 | 76.24 | 92.71 | 83.24 | 60.75 | 87.71 | 81.06 |
| GooseReason-4B-Instruct | +GooseReason-0.7M | +270 | 66.79 | 76.39 | 92.76 | 83.91 | 60.24 | 87.80 | 81.28 |
| Qwen3-30B-Instruct | | | 70.40 | 82.73 | 53.86 | 38.51 | 28.60 | 32.89 | 43.56 |

How to Use

Requirements: transformers >= 4.51.0

from transformers import AutoModelForCausalLM, AutoTokenizer

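# Repo ID of the original NVIDIA model as named on this card. To load the decensored
# variant, use "daydreamwarrior/Nemotron-Research-GooseReason-4B-Instruct-heretic-v2".
model_name = "nvidia/Nemotron-Research-GooseReason-4B-Instruct"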

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Find all positive integers n such that n² + 3n + 5 is a perfect square."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

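generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,   # enable sampling so that temperature takes effect
    temperature=0.6,  # recommended sampling temperature
)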
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)

Recommended sampling parameters: temperature=0.6, max_new_tokens=32768.

Citation

If you find this model or the Golden Goose paper helpful, please cite:

@article{lu2026goldengoose,
  title={Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text},
  author={Lu, Ximing and Acuna, David and Jung, Jaehun and Hu, Jian and Zhang, Di and Diao, Shizhe and Zou, Yunheng and Zhang, Shaokun and Cui, Brandon and Liu, Mingjie and Kim, Hyunwoo and Ammanabrolu, Prithviraj and Kautz, Jan and Dong, Yi and Choi, Yejin},
  journal={arXiv preprint arXiv:2601.22975},
  year={2026}
}