---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- adaptive-rag
- uncertainty-quantification
- retrieval-augmented-generation
- question-answering
- reinforcement-learning
- grpo
language:
- en
---

# uncertain-calibrate

Fine-tuned from meta-llama/Llama-3.1-8B-Instruct with GRPO reinforcement learning to emit a special `<uncertain>` token whenever the model lacks confidence during reasoning, enabling uncertainty-guided adaptive retrieval.

## What it does

The model reasons step by step and inserts `<uncertain>` at any point where it lacks confidence in a fact. A lightweight ridge-regression probe (trained on layer-13 hidden states over the `<uncertain>` span) then decides whether to trigger BM25 retrieval and a second-pass generation.
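The gating step reduces to a linear score over the span's pooled hidden state. A minimal sketch, assuming the probe is a plain ridge model with weight vector `w` and intercept `b` (names and the mean-pooling choice are illustrative assumptions, not the repository's exact implementation):

```python
import numpy as np

def should_retrieve(span_hidden_states, w, b, threshold=0.5):
    """Decide whether an <uncertain> span should trigger retrieval.

    span_hidden_states: (num_tokens, hidden_dim) layer-13 activations over the span
    w, b: ridge-regression probe weights and intercept
    """
    pooled = span_hidden_states.mean(axis=0)  # mean-pool the span's token states
    score = float(pooled @ w + b)             # linear probe score
    return score > threshold

# Toy call with random activations (illustration only)
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # 4 span tokens, hidden dim 8
w = rng.normal(size=8)
print(should_retrieve(h, w, b=0.0))
```

The boolean feeds the control flow described above: retrieve and regenerate on `True`, keep the first-pass answer otherwise.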

## Training

- **Base model:** meta-llama/Llama-3.1-8B-Instruct
- **Method:** GRPO (Group Relative Policy Optimization) with an EM-based reward; the model is rewarded for correct final answers, encouraging it to emit `<uncertain>` in contexts where retrieval would help
- **Target datasets:** multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)
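An EM-based reward of this kind is typically a binary check of the final `Answer:` line against the gold answers after standard normalization. A sketch under that assumption (the exact reward shaping used in training is not specified here):

```python
import re
import string

def normalize(text):
    """SQuAD-style answer normalization: lowercase, strip punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def em_reward(completion, gold_answers):
    """Binary GRPO reward: 1.0 iff the final 'Answer:' line exactly matches a gold answer."""
    m = re.search(r"Answer:\s*(.+)", completion)
    if not m:
        return 0.0
    pred = normalize(m.group(1))
    return float(any(pred == normalize(g) for g in gold_answers))

print(em_reward("I recall... Answer: Christopher Nolan", ["Christopher Nolan"]))  # → 1.0
```

Because only the final answer is rewarded, emitting `<uncertain>` is never penalized directly; it pays off only when the retrieval it triggers leads to a correct answer.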

## Retrieval gating (probe)

To use this model for adaptive RAG, a separate ridge-regression probe must be trained on layer-13 hidden states over `<uncertain>` spans. On held-out data the probe reaches an AUROC of ~0.82. Use the companion probe artifact `uncertain_probe_layer13_alpha3000.pkl` from the AdaRAGUE repository.
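The probe itself is ordinary ridge regression, which has a closed form. A self-contained sketch on synthetic data (the real probe is trained on pooled layer-13 states with labels indicating whether retrieval helped; the data here is a stand-in, though `alpha=3000` matches the artifact's filename):

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def auroc(y_true, scores):
    """AUROC as the probability that a positive example outranks a negative one."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    return (pos[:, None] > neg[None, :]).mean()

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 64))                       # stand-in for pooled hidden states
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(float)  # 1 = retrieval would help

w = fit_ridge(X, y, alpha=3000.0)
print(f"AUROC: {auroc(y, X @ w):.2f}")
```

The heavy regularization shrinks the weights but preserves their ranking behavior, which is all AUROC measures; thresholding the probe score gives the retrieve/don't-retrieve decision.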

## Evaluation

Results on `dev_500_subsampled` (500 questions × 5 datasets), with probe gating:

| Dataset         |   EM |   F1 | Trigger rate |
|-----------------|-----:|-----:|-------------:|
| HotpotQA        | 32.6 | 42.7 |        67.4% |
| MuSiQue         |  7.6 | 14.1 |        94.2% |
| 2WikiMultiHopQA | 26.2 | 29.6 |        59.2% |
| NQ              | 31.4 | 41.0 |        52.0% |
| TriviaQA        | 56.6 | 63.2 |        34.0% |
| **Overall**     | 30.9 | 38.1 |        61.4% |

Trigger rate = fraction of questions where the probe decided to retrieve.
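The Overall row is the unweighted macro-average of the five per-dataset rows, which can be checked directly from the table:

```python
em   = [32.6, 7.6, 26.2, 31.4, 56.6]
f1   = [42.7, 14.1, 29.6, 41.0, 63.2]
trig = [67.4, 94.2, 59.2, 52.0, 34.0]

for name, vals in [("EM", em), ("F1", f1), ("Trigger", trig)]:
    print(f"{name}: {sum(vals) / len(vals):.1f}")
# EM: 30.9, F1: 38.1, Trigger: 61.4
```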

## Intended use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained("your-username/uncertain-calibrate")

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special token "
    "<uncertain> to signal that you need more information. "
    "End your response with 'Answer: <your answer>' on the last line."
)

prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM},
    {"role": "user",   "content": "Who directed the film Interstellar?"},
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Keep special tokens so any emitted <uncertain> markers remain visible
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```
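When `<uncertain>` appears in the output and the probe fires, the adaptive loop retrieves evidence and runs a second pass with it prepended. A minimal sketch of that control flow, using simple term overlap as a stand-in for BM25 (the corpus, helper names, and second-pass prompt format are illustrative assumptions):

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, corpus, k=1):
    """Toy lexical retriever standing in for BM25: rank docs by term overlap."""
    q = tokens(question)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def second_pass_prompt(question, first_pass_output, corpus, probe_says_retrieve):
    """Build a retrieval-augmented second-pass prompt, or None to keep the first answer."""
    if "<uncertain>" in first_pass_output and probe_says_retrieve:
        evidence = "\n".join(retrieve(question, corpus))
        return f"Context:\n{evidence}\n\nQuestion: {question}"
    return None

corpus = ["Interstellar is a 2014 film directed by Christopher Nolan.",
          "The Matrix was directed by the Wachowskis."]
second = second_pass_prompt("Who directed the film Interstellar?",
                            "The director is <uncertain> ... Answer: unknown",
                            corpus, probe_says_retrieve=True)
print(second is not None)  # → True
```

In the full pipeline the returned prompt would go back through `apply_chat_template` and `generate` for the second pass; when it is `None`, the first-pass answer is kept.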


## Description

Model synced from source: jamesjunyuguo/uncertain-calibrate