---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- adaptive-rag
- uncertainty-quantification
- retrieval-augmented-generation
- question-answering
- reinforcement-learning
- grpo
language:
- en
---

# uncertain-calibrate

Fine-tuned from meta-llama/Llama-3.1-8B-Instruct with GRPO reinforcement learning to emit a special `<uncertain>` token whenever the model lacks confidence during reasoning, enabling uncertainty-guided adaptive retrieval.

## What it does

The model reasons step by step and inserts `<uncertain>` at any point where it lacks confidence in a fact. A lightweight ridge-regression probe (trained on layer-13 hidden states over the `<uncertain>` span) then decides whether to trigger BM25 retrieval and a second-pass generation.
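The gating step reduces to a linear score over the span's pooled hidden state. A minimal sketch, assuming the probe is a plain ridge model with weight vector `w` and intercept `b` (names and the mean-pooling choice are illustrative assumptions, not the repository's exact implementation):

```python
import numpy as np

def should_retrieve(span_hidden_states, w, b, threshold=0.5):
    """Decide whether an <uncertain> span should trigger retrieval.

    span_hidden_states: (num_tokens, hidden_dim) layer-13 activations over the span
    w, b: ridge-regression probe weights and intercept
    """
    pooled = span_hidden_states.mean(axis=0)  # mean-pool the span's token states
    score = float(pooled @ w + b)             # linear probe score
    return score > threshold

# Toy call with random activations (illustration only)
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # 4 span tokens, hidden dim 8
w = rng.normal(size=8)
print(should_retrieve(h, w, b=0.0))
```

The boolean feeds the control flow described above: retrieve and regenerate on `True`, keep the first-pass answer otherwise.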

## Training

- **Base model:** meta-llama/Llama-3.1-8B-Instruct
- **Method:** GRPO (Group Relative Policy Optimization) with an EM-based reward; the model is rewarded for correct final answers, encouraging it to emit `<uncertain>` in contexts where retrieval would help
- **Target datasets:** multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)
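An EM-based reward of this kind is typically a binary check of the final `Answer:` line against the gold answers after standard normalization. A sketch under that assumption (the exact reward shaping used in training is not specified here):

```python
import re
import string

def normalize(text):
    """SQuAD-style answer normalization: lowercase, strip punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def em_reward(completion, gold_answers):
    """Binary GRPO reward: 1.0 iff the final 'Answer:' line exactly matches a gold answer."""
    m = re.search(r"Answer:\s*(.+)", completion)
    if not m:
        return 0.0
    pred = normalize(m.group(1))
    return float(any(pred == normalize(g) for g in gold_answers))

print(em_reward("I recall... Answer: Christopher Nolan", ["Christopher Nolan"]))  # → 1.0
```

Because only the final answer is rewarded, emitting `<uncertain>` is never penalized directly; it pays off only when the retrieval it triggers leads to a correct answer.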

## Retrieval gating (probe)

To use this model for adaptive RAG, a separate ridge-regression probe must be trained on layer-13 hidden states over `<uncertain>` spans. On held-out data the probe reaches an AUROC of ~0.82. Use the companion probe artifact `uncertain_probe_layer13_alpha3000.pkl` from the AdaRAGUE repository.
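The probe itself is ordinary ridge regression, which has a closed form. A self-contained sketch on synthetic data (the real probe is trained on pooled layer-13 states with labels indicating whether retrieval helped; the data here is a stand-in, though `alpha=3000` matches the artifact's filename):

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def auroc(y_true, scores):
    """AUROC as the probability that a positive example outranks a negative one."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    return (pos[:, None] > neg[None, :]).mean()

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 64))                       # stand-in for pooled hidden states
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(float)  # 1 = retrieval would help

w = fit_ridge(X, y, alpha=3000.0)
print(f"AUROC: {auroc(y, X @ w):.2f}")
```

The heavy regularization shrinks the weights but preserves their ranking behavior, which is all AUROC measures; thresholding the probe score gives the retrieve/don't-retrieve decision.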

## Evaluation

Results on `dev_500_subsampled` (500 questions × 5 datasets), with probe gating:

| Dataset         |   EM |   F1 | Trigger rate |
|-----------------|-----:|-----:|-------------:|
| HotpotQA        | 32.6 | 42.7 |        67.4% |
| MuSiQue         |  7.6 | 14.1 |        94.2% |
| 2WikiMultiHopQA | 26.2 | 29.6 |        59.2% |
| NQ              | 31.4 | 41.0 |        52.0% |
| TriviaQA        | 56.6 | 63.2 |        34.0% |
| **Overall**     | 30.9 | 38.1 |        61.4% |

Trigger rate = fraction of questions where the probe decided to retrieve.
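The Overall row is the unweighted macro-average of the five per-dataset rows, which can be checked directly from the table:

```python
em   = [32.6, 7.6, 26.2, 31.4, 56.6]
f1   = [42.7, 14.1, 29.6, 41.0, 63.2]
trig = [67.4, 94.2, 59.2, 52.0, 34.0]

for name, vals in [("EM", em), ("F1", f1), ("Trigger", trig)]:
    print(f"{name}: {sum(vals) / len(vals):.1f}")
# EM: 30.9, F1: 38.1, Trigger: 61.4
```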

## Intended use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained("your-username/uncertain-calibrate")

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special token "
    "<uncertain> to signal that you need more information. "
    "End your response with 'Answer: <your answer>' on the last line."
)

prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM},
    {"role": "user",   "content": "Who directed the film Interstellar?"},
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Keep special tokens so any emitted <uncertain> markers remain visible
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```
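When `<uncertain>` appears in the output and the probe fires, the adaptive loop retrieves evidence and runs a second pass with it prepended. A minimal sketch of that control flow, using simple term overlap as a stand-in for BM25 (the corpus, helper names, and second-pass prompt format are illustrative assumptions):

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, corpus, k=1):
    """Toy lexical retriever standing in for BM25: rank docs by term overlap."""
    q = tokens(question)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def second_pass_prompt(question, first_pass_output, corpus, probe_says_retrieve):
    """Build a retrieval-augmented second-pass prompt, or None to keep the first answer."""
    if "<uncertain>" in first_pass_output and probe_says_retrieve:
        evidence = "\n".join(retrieve(question, corpus))
        return f"Context:\n{evidence}\n\nQuestion: {question}"
    return None

corpus = ["Interstellar is a 2014 film directed by Christopher Nolan.",
          "The Matrix was directed by the Wachowskis."]
second = second_pass_prompt("Who directed the film Interstellar?",
                            "The director is <uncertain> ... Answer: unknown",
                            corpus, probe_says_retrieve=True)
print(second is not None)  # → True
```

In the full pipeline the returned prompt would go back through `apply_chat_template` and `generate` for the second pass; when it is `None`, the first-pass answer is kept.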


## Description

Model synced from source: jamesjunyuguo/uncertain-calibrate