初始化项目,由ModelHub XC社区提供模型
Model: nvidia/Nemotron-Cascade-14B-Thinking Source: Original Platform
This commit is contained in:
161
README.md
Normal file
161
README.md
Normal file
@@ -0,0 +1,161 @@
|
||||
---
|
||||
library_name: transformers
|
||||
license: other
|
||||
license_name: nvidia-open-model-license
|
||||
license_link: >-
|
||||
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
|
||||
pipeline_tag: text-generation
|
||||
language:
|
||||
- en
|
||||
tags:
|
||||
- nvidia
|
||||
- nemotron-cascade
|
||||
- reasoning
|
||||
- general-purpose
|
||||
- SFT
|
||||
- RL
|
||||
- pytorch
|
||||
---
|
||||
|
||||
|
||||
# Nemotron-Cascade-14B-Thinking
|
||||
|
||||
<p align="center">
|
||||
|
||||
[](https://arxiv.org/abs/2512.13607)
|
||||
[](https://huggingface.co/collections/nvidia/nemotron-cascade)
|
||||
[](https://huggingface.co/collections/nvidia/nemotron-cascade)
|
||||
[](https://huggingface.co/collections/nvidia/nemotron-cascade)
|
||||
</p>
|
||||
|
||||
<img src="fig/nemotron-cascade-14b-thinking-results.png" alt="main_fig" style="width: 1000px; max-width: 100%;" />
|
||||
|
||||
|
||||
## Introduction
|
||||
|
||||
We're excited to introduce [Nemotron-Cascade-14B-Thinking](https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking), a powerful general-purpose model trained through sequential and domain-wise reinforcement learning. Nemotron-Cascade-14B-Thinking is post-trained from the [Qwen3-14B Base](https://huggingface.co/Qwen/Qwen3-14B-Base) model, and it achieves best-in-class performance across a wide range of benchmarks. Different from [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), Nemotron-Cascade-14B-Thinking is designed exclusively for the ***thinking*** mode.
|
||||
|
||||
|
||||
## Training Pipeline
|
||||
<img src="fig/pipeline.png" alt="train_pipeline_fig" style="width: 1000px; max-width: 100%;" />
|
||||
|
||||
The training pipeline for Nemotron-Cascade begins with a multi-stage SFT phase to equip the model with foundational skills. Subsequently, Cascade RL is applied across multiple domains to further enhance the model’s performance in these areas.
|
||||
|
||||
Notably, RLHF for alignment, when used as a pre-step, boosts the model’s complex reasoning ability far beyond mere preference optimization, and subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it (see an illustration in the following Figure).
|
||||
|
||||
<figure style="margin: 0; padding: 0;">
|
||||
<img src="fig/lcb_through_cascade_rl.png" alt="lcb_through_cascade_rl_fig" style="width: 100%; max-width: 100%; margin: 0; padding: 0;">
|
||||
<figcaption>The LiveCodeBench v6 (08/24–05/25) performance of the Nemotron-Cascade-14B-Thinking model throughout the Cascade RL process.</figcaption>
|
||||
</figure>
|
||||
|
||||
|
||||
## Results
|
||||
|
||||
- We evaluate our model against competitive reasoning models on a diverse set of benchmarks, covering general-knowledge reasoning, alignment and instruction following, mathematical reasoning, competitive programming, software engineering, and tool-use proficiency.
|
||||
- For Nemotron-Cascade models, we use a maximum generation length of 64K tokens and set the temperature to 0.6 and top-p to 0.95 for reasoning tasks.
|
||||
- Our Nemotron-Cascade-14B-Thinking achieves best-in-class performance across almost all benchmarks. Remarkably, Nemotron-Cascade-14B-Thinking surpasses DeepSeek-R1-0528 (671B) by a clear margin across all LCB v5, v6, and Pro benchmarks.
|
||||
|
||||
| **Benchmark<br>Metric: Pass@1** | **Qwen3-14B** | **DeepSeek-R1-0528 671B** | **Gemini-2.5-Flash-Thinking** | **Nemotron-Cascade-14B-Thinking** |
|
||||
| :---- | :---: | :---: | :---: | :---: |
|
||||
| ***Knowledge Reasoning*** |
|
||||
| MMLU | 84.9 | 89.9 | - | 85.1 |
|
||||
| MMLU Pro | 77.6 | 85.0 | 81.9 | 77.0 |
|
||||
| GPQA-Diamond | 64.0 | 81.0 | 82.8 | 69.6 |
|
||||
| ***Alignment*** |
|
||||
| ArenaHard | 91.7 | 95.1 | 95.7 | 89.5 |
|
||||
| IFEval (Strict Prompt) | 85.4 | 84.1 | 89.8 | 81.9 |
|
||||
| IFBench | 33.7 | 38.0 | 36.1 | 41.7 |
|
||||
| ***Math*** |
|
||||
| AIME 2024 | 79.3 | 91.4 | 82.3 | 89.7 |
|
||||
| AIME 2025 | 70.4 | 87.5 | 72.0 | 83.3 |
|
||||
| ***Code*** |
|
||||
| LCB v5 (08/24-02/25) | 65.2 | 74.8 | 63.4 | **77.5** |
|
||||
| LCB v6 (08/24-05/25) | 63.5 | 73.3 | 61.9 | **74.6** |
|
||||
| LCB Pro 25Q2 (Easy) | 53.6 | 63.9 | 47.4 | **68.9** |
|
||||
| LCB Pro 25Q2 (Med) | 2.6 | 7.0 | 1.8 | **10.5** |
|
||||
| SWE Verified (Agentless) | 27.4 | 57.6 | 48.9 | 43.1 |
|
||||
| ***Tool Calling*** |
|
||||
| BFCL V3 | 70.4 | 67.9 | 68.6 | 67.5 |
|
||||
|
||||
|
||||
## Usage Recommendations
|
||||
|
||||
For local deployment, we recommend setting the sampling parameters to temperature = 0.6, top_p = 0.95.
|
||||
We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method for better long-context support. This can be enabled by updating the model’s `config.json` as shown below:
|
||||
```json
|
||||
{
|
||||
...,
|
||||
"rope_scaling": {
|
||||
"rope_type": "yarn",
|
||||
"factor": 2.0,
|
||||
"original_max_position_embeddings": 32768
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- **Nemotron-Cascade-14B-Thinking**: use `factor: 3.0` to extend the context length to 90K tokens for SWE Verified (Agentless), and `factor: 2.0` to extend the context length to 64K tokens for other benchmarks.
|
||||
- **Nemotron-Cascade-8B** and **Nemotron-Cascade-8B-Thinking**: use `factor: 2.0` across all benchmarks.
|
||||
|
||||
|
||||
## Evaluation Tookit
|
||||
|
||||
To reproduce our results, please check evaluation code, scripts, cached prediction files in https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking/blob/main/evaluation/README.md
|
||||
|
||||
|
||||
## Chat Template
|
||||
|
||||
Nemotron-Cascade-14B-Thinking follows the Qwen3-style ChatML template and is designed exclusively for the ***thinking*** mode. To align with the template used in [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), the `" /think"` tag should be appended to the end of the user input. Note that a leading space is included in this tag to ensure correct tokenization.
|
||||
|
||||
To reduce the context length in a multi-turn conversation, we include only the final summary of the model’s output in the conversation history and change the user turn’s `" /think"` tag to `" /no_think"`.
|
||||
|
||||
A brief example is shown below:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
|
||||
model_name = 'nvidia/Nemotron-Cascade-14B-Thinking'
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
|
||||
'''
|
||||
single-turn example
|
||||
'''
|
||||
messages = [
|
||||
{"role": "user", "content": "calculate 1+1?"}
|
||||
]
|
||||
|
||||
# only thinking mode is supported (enable_thinking=True)
|
||||
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
|
||||
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /think<|im_end|>\n<|im_start|>assistant\n'
|
||||
|
||||
|
||||
'''
|
||||
multi-turn example
|
||||
'''
|
||||
messages = [
|
||||
{"role": "user", "content": "calculate 1+1?"},
|
||||
{"role": "assistant", "content": "<think>THINKING_CONTENT</think>\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**: \n \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed{2}\\)",},
|
||||
{"role": "user", "content": "what about 2+2"}
|
||||
]
|
||||
|
||||
# only thinking mode is supported (enable_thinking=True)
|
||||
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
|
||||
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /no_think<|im_end|>\n<|im_start|>assistant\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**: \n \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed\{2\}\\)<|im_end|>\n<|im_start|>user\nwhat about 2+2 /think<|im_end|>\n<|im_start|>assistant\n'
|
||||
```
|
||||
|
||||
|
||||
## Release Date
|
||||
Dec 08, 2025
|
||||
|
||||
|
||||
## License
|
||||
Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
|
||||
|
||||
|
||||
## Citation
|
||||
```
|
||||
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
|
||||
title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
|
||||
author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
|
||||
year={2025}
|
||||
}
|
||||
```
|
||||
Reference in New Issue
Block a user