Initialize project; model provided by the ModelHub XC community
Model: HasuerYu/KnowRL-Nemotron-1.5B Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
209
README.md
Normal file
@@ -0,0 +1,209 @@
---
pretty_name: KnowRL-Nemotron-1.5B
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- knowrl
- rlvr
- reasoning
- math
- knowledge-points
- reinforcement-learning
base_model: nvidia/OpenMath-Nemotron-1.5B
model-index:
- name: KnowRL-Nemotron-1.5B
  results:
  - task:
      type: mathematical-reasoning
    dataset:
      name: AIME 2024
      type: aime24
    metrics:
    - name: Accuracy (w/o KP)
      type: accuracy
      value: 69.79
    - name: Accuracy (CSS)
      type: accuracy
      value: 74.58
  - task:
      type: mathematical-reasoning
    dataset:
      name: AIME 2025
      type: aime25
    metrics:
    - name: Accuracy (w/o KP)
      type: accuracy
      value: 64.69
    - name: Accuracy (CSS)
      type: accuracy
      value: 65.21
  - task:
      type: mathematical-reasoning
    dataset:
      name: MATH-500
      type: math-500
    metrics:
    - name: Accuracy (w/o KP)
      type: accuracy
      value: 95.70
    - name: Accuracy (CSS)
      type: accuracy
      value: 96.20
  - task:
      type: mathematical-reasoning
    dataset:
      name: Olympiad Bench
      type: olympiad-bench
    metrics:
    - name: Accuracy (w/o KP)
      type: accuracy
      value: 80.23
    - name: Accuracy (CSS)
      type: accuracy
      value: 82.44
---

# KnowRL-Nemotron-1.5B

> **KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance**

[Paper](https://arxiv.org/abs/2604.12627) · [Code](https://github.com/HasuerYu/KnowRL) · [Model Collection](https://huggingface.co/collections/HasuerYu/knowrl) · [Training Data](https://huggingface.co/datasets/HasuerYu/KnowRL-Train-Data) · [KP Annotations](https://huggingface.co/datasets/HasuerYu/KnowRL-KP-Annotations)

## Model Summary

**KnowRL-Nemotron-1.5B** is a 1.5B-parameter math reasoning model trained with reinforcement learning (DAPO/GRPO) under **minimal-sufficient knowledge point (KP) guidance**. It is fine-tuned from [nvidia/OpenMath-Nemotron-1.5B](https://huggingface.co/nvidia/OpenMath-Nemotron-1.5B) and achieves state-of-the-art results among 1.5B-scale models on competition-level math benchmarks.

Instead of injecting long solution hints or full reasoning templates, KnowRL decomposes guidance into atomic **knowledge points (KPs)** and identifies the **minimal subset** required to unlock reward learning — achieving more with less.

## Key Highlights

- **74.16** average accuracy (CSS) across 8 competition-level math benchmarks — new SOTA at 1.5B scale
- **70.08** average accuracy even **without** KP hints at inference, demonstrating genuine policy improvement (+9.63 over baseline)
- Trained with **~38% fewer KPs** than full-KP injection via the CSS (Constrained Subset Search) selection strategy
- Reward sparsity reduced from **41.21%** zero-correct to **13.00%** during training

## Results

| Benchmark | w/o KP | CBRS | CSS |
|:----------|:------:|:----:|:---:|
| AIME 2024 | 69.79 | 75.52 | 74.58 |
| AIME 2025 | 64.69 | 65.00 | 65.21 |
| BRUMO 2025 | 69.48 | 78.33 | 78.12 |
| HMMT 2025 | 41.04 | 45.00 | 48.75 |
| AMC 2023 | 95.55 | 95.78 | 95.70 |
| CMIMC 2025 | 44.14 | 49.22 | 52.19 |
| MATH-500 | 95.70 | 96.45 | 96.20 |
| Olympiad Bench | 80.23 | 82.34 | 82.44 |
| **Average** | **70.08** | **73.46** | **74.16** |

> **w/o KP**: No knowledge point hints at inference.
> **CBRS / CSS**: KP hints selected by the respective strategy are prepended to the prompt at inference.
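
As a quick sanity check, the averages in the last row follow from the eight per-benchmark scores (to within rounding of the table entries):

```python
# Per-benchmark accuracies copied from the table above.
scores = {
    "w/o KP": [69.79, 64.69, 69.48, 41.04, 95.55, 44.14, 95.70, 80.23],
    "CBRS":   [75.52, 65.00, 78.33, 45.00, 95.78, 49.22, 96.45, 82.34],
    "CSS":    [74.58, 65.21, 78.12, 48.75, 95.70, 52.19, 96.20, 82.44],
}
reported = {"w/o KP": 70.08, "CBRS": 73.46, "CSS": 74.16}

for name, vals in scores.items():
    avg = sum(vals) / len(vals)
    # The per-benchmark entries are themselves rounded, so the recomputed
    # mean can differ from the reported average by a couple of hundredths.
    assert abs(avg - reported[name]) < 0.02, (name, avg)
```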

## Usage

### Basic Inference (without KP hints)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasuerYu/KnowRL-Nemotron-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

problem = "Find the sum of all positive integers n such that n^2 - 19n + 99 is a perfect square."
prompt = f"{problem}\nPlease reason step by step, and put your final answer within \\boxed{{}}."

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=32768, temperature=0.6, top_p=0.95)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```

### Inference with KP Hints

For best performance, prepend selected knowledge points as a hint section in the prompt:

```python
knowledge_points = [
    "If n^2 - 19n + 99 = m^2, then (2n - 19)^2 - 4m^2 = -15.",
]
hint = "## Hint\n" + "\n".join(f"- {kp}" for kp in knowledge_points)
prompt = f"{problem}\n{hint}\nPlease reason step by step, and put your final answer within \\boxed{{}}."
```

### vLLM Serving

```bash
vllm serve HasuerYu/KnowRL-Nemotron-1.5B \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --trust-remote-code
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | `nvidia/OpenMath-Nemotron-1.5B` |
| Algorithm | DAPO / GRPO |
| Framework | [verl](https://github.com/volcengine/verl) + Ray |
| Learning rate | 1e-6 |
| Batch size | 256 |
| Max prompt length | 8,192 |
| Max response length | 32,768 |
| Samples per prompt | 8 |
| Total training steps | 2,960 |
| Hardware | 8× NVIDIA H100 nodes (64 GPUs) |

An **entropy annealing** strategy is applied: after step 2,590, the clip upper bound is reduced from 0.28 to 0.26 to encourage the policy to shift from exploration to exploitation, contributing +0.74 average accuracy.
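
The annealing step above amounts to a piecewise clip-bound schedule. A minimal sketch (the function name and signature are illustrative, not taken from the KnowRL codebase):

```python
def clip_upper_bound(step: int,
                     anneal_step: int = 2590,
                     high: float = 0.28,
                     low: float = 0.26) -> float:
    """Piecewise schedule for the PPO-style clip upper bound: stay at
    `high` during the exploration phase, then drop to `low` after
    `anneal_step` to favor exploitation."""
    return high if step < anneal_step else low

# Early training keeps the looser bound; late training clips tighter.
assert clip_upper_bound(1000) == 0.28
assert clip_upper_bound(2960) == 0.26
```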

## How KnowRL Works

1. **KP Extraction**: Decompose solution guidance into atomic knowledge points (KPs)
2. **KP Selection**: Apply selection strategies (CSS, CBRS) to identify the minimal-sufficient subset of KPs per problem
3. **RL Training**: Train with DAPO/GRPO, injecting selected KPs as hints in the prompt during rollout
4. **Inference**: The trained model can be used with or without KP hints — even without hints, it significantly outperforms the baseline
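
Step 2 can be pictured as a constrained subset search. The sketch below is a hypothetical greedy variant (KnowRL's actual CSS/CBRS strategies are described in the paper): it keeps adding the single most helpful KP until an estimated reward threshold is met, so easy problems get few or no hints.

```python
from typing import Callable, List

def greedy_minimal_subset(kps: List[str],
                          reward: Callable[[List[str]], float],
                          threshold: float) -> List[str]:
    """Greedily grow a KP subset until the hint-conditioned reward
    estimate reaches `threshold`; returns the subset found."""
    chosen: List[str] = []
    remaining = list(kps)
    while remaining and reward(chosen) < threshold:
        # Add the KP whose inclusion raises the reward estimate most.
        best = max(remaining, key=lambda kp: reward(chosen + [kp]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy reward: each KP contributes a fixed, independent utility.
utility = {"kp_a": 0.5, "kp_b": 0.3, "kp_c": 0.1}
r = lambda subset: sum(utility[k] for k in subset)
assert greedy_minimal_subset(list(utility), r, 0.7) == ["kp_a", "kp_b"]
```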

## Related Resources

| Resource | Link |
|----------|------|
| KnowRL Collection | [HasuerYu/knowrl](https://huggingface.co/collections/HasuerYu/knowrl) |
| Training Data | [HasuerYu/KnowRL-Train-Data](https://huggingface.co/datasets/HasuerYu/KnowRL-Train-Data) |
| KP Annotations | [HasuerYu/KnowRL-KP-Annotations](https://huggingface.co/datasets/HasuerYu/KnowRL-KP-Annotations) |

## Limitations

- Optimized for competition-level math reasoning; performance on other domains is not evaluated
- KP hint quality at inference depends on upstream KP extraction and selection pipelines
- The model inherits limitations from the base model (`nvidia/OpenMath-Nemotron-1.5B`)

## Citation

If you find this model helpful, please cite:

```bibtex
@misc{yu2026knowrlboostingllmreasoning,
  title={KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance},
  author={Linhao Yu and Tianmeng Yang and Siyu Ding and Renren Jin and Naibin Gu and Xiangzhao Hao and Shuaiyi Nie and Deyi Xiong and Weichong Yin and Yu Sun and Hua Wu},
  year={2026},
  eprint={2604.12627},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2604.12627}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
24
added_tokens.json
Normal file
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 131072,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "transformers_version": "4.51.1"
}
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce46e9daa453c036d937ae3d15fdf6fe3b438f90fc910899e407f5228c5d357a
size 3554214752
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
208
tokenizer_config.json
Normal file
@@ -0,0 +1,208 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n{%- else %}\n    {{- '<|im_start|>system\n<|im_end|>\n' }}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) or (message.role == 'assistant') %}\n        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\n' }}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long