---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- adaptive-rag
- uncertainty-quantification
- retrieval-augmented-generation
- question-answering
- reinforcement-learning
- grpo
language:
- en
---

# uncertain-calibrate

Fine-tuned from `meta-llama/Llama-3.1-8B-Instruct` via **GRPO reinforcement learning** to emit a special uncertainty token when the model is uncertain during reasoning, enabling uncertainty-guided adaptive retrieval.

## What it does

The model reasons step by step and inserts the uncertainty token at any point where it lacks confidence in a fact. A lightweight ridge regression probe (trained on layer-13 hidden states over the uncertainty-token span) then decides whether to trigger BM25 retrieval and a second-pass generation.

## Training

- **Base model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Training method**: GRPO (Group Relative Policy Optimization) with an EM-based reward; the model is rewarded for correct final answers, encouraging it to emit the uncertainty token in contexts where retrieval would help
- **Target datasets**: multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)

## Retrieval gating (probe)

Using this model for adaptive RAG requires a separate ridge regression probe trained on layer-13 hidden states over uncertainty-token spans. The probe's AUROC on held-out data is ~0.82. Use the companion probe artifact `uncertain_probe_layer13_alpha3000.pkl` from the [AdaRAGUE repository](https://github.com/JamesJunyuGuo/AdaRAGUE).

## Evaluation (dev_500_subsampled, 500 questions × 5 datasets, with probe gating)

| Dataset | EM | F1 | Trigger Rate |
|---|---|---|---|
| HotpotQA | 32.6 | 42.7 | 67.4% |
| MuSiQue | 7.6 | 14.1 | 94.2% |
| 2WikiMultiHopQA | 26.2 | 29.6 | 59.2% |
| NQ | 31.4 | 41.0 | 52.0% |
| TriviaQA | 56.6 | 63.2 | 34.0% |
| **Overall** | **30.9** | **38.1** | **61.4%** |

Trigger rate = fraction of questions where the probe decided to retrieve.
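The probe-gated decision can be sketched numerically. The toy below assumes the probe is a closed-form ridge regressor over mean-pooled span features; the hidden size, synthetic labels, and 0.5 threshold are stand-ins for illustration, not the trained `uncertain_probe_layer13_alpha3000.pkl` artifact.

```python
import numpy as np

HIDDEN = 64      # toy size; the real layer-13 hidden states are 4096-dim
ALPHA = 3000.0   # matches the alpha in the probe artifact's filename

rng = np.random.default_rng(0)
X = rng.standard_normal((500, HIDDEN))   # stand-in pooled span features
y = (X[:, 0] > 0).astype(float)          # synthetic "retrieval helped" labels

# Closed-form ridge fit: w = (X^T X + alpha * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + ALPHA * np.eye(HIDDEN), X.T @ y)
bias = y.mean() - X.mean(axis=0) @ w

def should_retrieve(span_states: np.ndarray, threshold: float = 0.5) -> bool:
    """Mean-pool the hidden states over the uncertainty-token span,
    score with the linear probe, and threshold the result."""
    feat = span_states.mean(axis=0)
    return float(feat @ w + bias) > threshold
```

In the full pipeline, `span_states` would be the layer-13 hidden states at the positions of the emitted uncertainty token, and a `True` result would trigger BM25 retrieval plus a second-pass generation.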
## Intended use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/uncertain-calibrate",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special "
    "uncertainty token to signal that you need more information. "
    "End your response with 'Answer: ' on the last line."
)

prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Who directed the film Interstellar?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Keep special tokens so emitted uncertainty tokens remain visible
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
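When the probe fires, the second pass retrieves with BM25 and regenerates with the passages prepended. Below is a hedged sketch with a toy in-memory corpus and a hand-rolled Okapi-style scorer; the actual pipeline in the AdaRAGUE repository may use a different index, corpus, and prompt format.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

class TinyBM25:
    """Minimal Okapi-style BM25 over an in-memory corpus (illustrative only)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.raw = docs
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.df: dict[str, int] = {}
        for d in self.docs:
            for w in set(d):
                self.df[w] = self.df.get(w, 0) + 1

    def idf(self, word: str) -> float:
        df = self.df.get(word, 0)
        # Lucene-style smoothing keeps IDF non-negative
        return math.log(1 + (len(self.docs) - df + 0.5) / (df + 0.5))

    def score(self, query: str, doc: list[str]) -> float:
        s = 0.0
        for w in tokenize(query):
            tf = doc.count(w)
            if tf == 0:
                continue
            norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += self.idf(w) * tf * (self.k1 + 1) / (tf + norm)
        return s

    def top_k(self, query: str, k: int = 2) -> list[str]:
        order = sorted(range(len(self.raw)),
                       key=lambda i: self.score(query, self.docs[i]),
                       reverse=True)
        return [self.raw[i] for i in order[:k]]

corpus = [
    "Interstellar is a 2014 film directed by Christopher Nolan.",
    "Inception is a 2010 film directed by Christopher Nolan.",
    "Blade Runner 2049 was directed by Denis Villeneuve.",
]
bm25 = TinyBM25(corpus)

question = "Who directed the film Interstellar?"
passages = bm25.top_k(question)

# Second-pass prompt: prepend retrieved passages to the original question
second_pass_user = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
```

The resulting `second_pass_user` message would be sent back through the same chat template (with the same system prompt) to produce the retrieval-augmented answer.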