ModelHub XC cbe5de11fa initial commit; model provided by the ModelHub XC community
Model: pixas/Miner-8B
Source: Original Platform
2026-04-22 17:45:11 +08:00

---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- reasoning
- reinforcement-learning
- rlvr
- math
- miner
- qwen3
- causal-lm
model-index:
- name: Miner-8B
  results: []
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
base_model: Qwen/Qwen3-8B-Base
---

Miner-8B

This repository hosts the Hugging Face Transformers checkpoint for MINER: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models.

Model Description

Miner-8B is a reasoning model trained with MINER, a reinforcement learning method designed to improve data efficiency for large reasoning models. MINER targets the inefficiency of critic-free RL methods on positive homogeneous prompts, where all sampled rollouts are correct and standard relative-advantage training provides little or no learning signal. Instead, MINER leverages the policy's intrinsic uncertainty as a self-supervised reward signal, without requiring auxiliary reward models or additional inference-time overhead.
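The homogeneous-prompt failure mode can be seen in a few lines. With GRPO-style group-relative advantages, a group of rollouts that all earn the same verifiable reward yields zero advantage for every rollout, so the policy gradient vanishes. This is an illustrative sketch, not code from the MINER repository:

```python
import statistics

def relative_advantages(rewards):
    """GRPO-style group-relative advantages: (r_i - mean) / std.

    When every rollout in the group earns the same reward (a
    "positive homogeneous" prompt), the standard deviation is zero
    and every advantage is zero: the update carries no signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # homogeneous group: no relative signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Mixed group: correct (1.0) and incorrect (0.0) rollouts produce signal.
print(relative_advantages([1.0, 1.0, 0.0, 0.0]))  # → [1.0, 1.0, -1.0, -1.0]

# Positive homogeneous group: all rollouts correct, zero advantage everywhere.
print(relative_advantages([1.0, 1.0, 1.0, 1.0]))  # → [0.0, 0.0, 0.0, 0.0]
```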

The MINER framework introduces two central ideas:

  1. Token-level focal credit assignment, which amplifies learning on uncertain and critical tokens while suppressing overconfident ones.
  2. Adaptive advantage calibration, which integrates intrinsic and verifiable rewards in a stable way.
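A minimal sketch of these two ideas, assuming a per-token policy probability `p_t`, a focusing exponent `gamma`, and a mixing weight `alpha` — all illustrative names and forms, not the paper's exact formulation:

```python
def focal_weights(token_probs, gamma=2.0):
    """Token-level focal weights: (1 - p_t) ** gamma.

    Tokens the policy is already confident about (p_t near 1) are
    down-weighted, so the update concentrates on uncertain,
    not-yet-mastered tokens. The functional form and gamma are
    illustrative only.
    """
    return [(1.0 - p) ** gamma for p in token_probs]


def calibrated_advantage(verifiable_adv, intrinsic_adv, alpha=0.5):
    """Illustrative blend of a verifiable-reward advantage with an
    intrinsic (uncertainty-based) one; when the verifiable term is
    zero (homogeneous group), the intrinsic term still provides
    signal."""
    return alpha * verifiable_adv + (1.0 - alpha) * intrinsic_adv


print(focal_weights([0.99, 0.5, 0.1]))  # ≈ [0.0001, 0.25, 0.81]
print(calibrated_advantage(0.0, 1.0))   # intrinsic signal survives: 0.5
```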

According to the paper, MINER is evaluated on six reasoning benchmarks using Qwen3-8B-Base, and reports stronger sample efficiency and accuracy than several baseline methods, including GRPO variants.

Intended Use

This model is intended for research and experimental use in:

  • reasoning and problem solving
  • reinforcement learning for language models
  • mathematical and verifiable reasoning tasks
  • post-training and evaluation of large reasoning models

Potential use cases include:

  • academic research on RL for reasoning models
  • evaluation on reasoning benchmarks
  • ablation and reproduction studies based on the MINER framework
  • further finetuning or post-training from this checkpoint

How to Use

Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "pixas/Miner-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = [{"role": "user", "content": "What is 2+3?"}]
chat = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
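For math-style prompts, reasoning models commonly wrap the final answer in `\boxed{...}`. A small illustrative post-processing helper (hypothetical, not part of the model or the MINER repository) can pull it out of a generation:

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in a generation,
    handling nested braces; None if no complete box is present.
    Purely an illustrative answer-extraction helper."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    out = []
    while i < len(text) and depth:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(c)
        i += 1
    return "".join(out) if depth == 0 else None

print(extract_boxed(r"... so the answer is \boxed{5}."))  # → 5
```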

vLLM

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "pixas/Miner-8B"

llm = LLM(model=model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
sampling_params = SamplingParams(
    temperature=0.6,
    max_tokens=8192
)
prompt = [{"role": "user", "content": "What is 2+3?"}]
inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, tokenize=False)
outputs = llm.generate(
    inputs,
    sampling_params
)

print(outputs[0].outputs[0].text)
```

Limitations

This model is a research checkpoint and may have several limitations:

  • It may produce incorrect, incomplete, or overconfident reasoning outputs.
  • Performance may depend heavily on prompt format and decoding setup.
  • Results reported in the paper may not transfer exactly to this released checkpoint unless the same base model, data mixture, and evaluation pipeline are used.
  • The model is not intended as a substitute for expert judgment in high-stakes domains.

Bias, Risks, and Safety

Like other large language models, this model may reflect biases present in its training data and may generate harmful, misleading, or factually incorrect outputs. Additional care is required before deployment in user-facing or safety-critical applications.

Citation

If you use this model, please cite:

@article{jiang2026miner,
  title={Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models},
  author={Jiang, Shuyang and Wang, Yuhao and Zhang, Ya and Wang, Yanfeng and Wang, Yu},
  journal={arXiv preprint arXiv:2601.04731},
  year={2026}
}

Acknowledgements

This model card is based on the official MINER paper and code repository.
