Initial project import; model provided by the ModelHub XC community.
Model: jamesjunyuguo/uncertain-calibrate (Source: Original Platform)
---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- adaptive-rag
- uncertainty-quantification
- retrieval-augmented-generation
- question-answering
- reinforcement-learning
- grpo
language:
- en
---

# uncertain-calibrate

Fine-tuned from `meta-llama/Llama-3.1-8B-Instruct` via **GRPO reinforcement learning** to emit a special `<uncertain>` token when the model is uncertain during reasoning, enabling uncertainty-guided adaptive retrieval.

## What it does

The model reasons step by step and inserts `<uncertain>` at any point where it lacks confidence in a fact. A lightweight ridge regression probe (trained on layer-13 hidden states at the `<uncertain>` span) then decides whether to trigger BM25 retrieval and a second-pass generation.
## Training

- **Base model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Training method**: GRPO (Group Relative Policy Optimization) with an EM-based reward; the model is rewarded for correct final answers, encouraging it to emit `<uncertain>` in contexts where retrieval would help
- **Target datasets**: multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)
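To make the reward concrete, here is a minimal sketch of an exact-match reward of the kind GRPO could optimize here. It assumes the model ends its response with an `Answer: <answer>` line (as the system prompt below requests) and uses SQuAD-style answer normalization; the function names are illustrative, not the training code from this repository.

```python
import re
import string

def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation, articles, extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def em_reward(completion: str, gold_answers: list[str]) -> float:
    """Reward 1.0 if the final 'Answer:' line exactly matches any gold answer, else 0.0."""
    match = re.search(r"Answer:\s*(.+)\s*$", completion.strip(), flags=re.IGNORECASE)
    if match is None:
        return 0.0
    pred = normalize_answer(match.group(1))
    return float(any(pred == normalize_answer(g) for g in gold_answers))
```

Because the reward only scores the final answer, the policy is free to discover when emitting `<uncertain>` (and thereby triggering retrieval) improves its expected reward.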
## Retrieval gating (probe)

To use this model for adaptive RAG, you must also have a separate ridge regression probe trained on layer-13 hidden states over `<uncertain>` spans. The probe's AUROC on held-out data is ~0.82. Use the companion probe artifact `uncertain_probe_layer13_alpha3000.pkl` from the [AdaRAGUE repository](https://github.com/JamesJunyuGuo/AdaRAGUE).
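As a rough sketch of what such a probe looks like, the snippet below fits a scikit-learn `Ridge` regressor on mean-pooled hidden-state vectors and thresholds its score to gate retrieval. The pooling, threshold, and synthetic training data are assumptions for illustration (`alpha=3000` merely mirrors the artifact's filename); in practice you would load the released `.pkl` probe rather than fit one on toy data.

```python
import numpy as np
from sklearn.linear_model import Ridge

HIDDEN = 4096  # Llama-3.1-8B hidden size

# Toy stand-in data: mean-pooled layer-13 hidden states over <uncertain> spans,
# labeled 1.0 when retrieval would have helped, 0.0 otherwise (synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, HIDDEN))
y = (X[:, 0] > 0).astype(float)

# alpha=3000 mirrors the probe artifact's filename; treat it as a tunable.
probe = Ridge(alpha=3000.0).fit(X, y)

def should_retrieve(span_hidden_states: np.ndarray, threshold: float = 0.5) -> bool:
    """Gate retrieval on the probe's score for the mean-pooled <uncertain> span."""
    feats = span_hidden_states.mean(axis=0, keepdims=True)  # shape (1, HIDDEN)
    return float(probe.predict(feats)[0]) >= threshold
```

At inference you would collect the layer-13 hidden states for the tokens spanning each emitted `<uncertain>` (e.g. via `output_hidden_states=True`) and pass them to `should_retrieve`.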
## Evaluation (dev_500_subsampled, 500 questions × 5 datasets, with probe gating)

| Dataset | EM | F1 | Trigger Rate |
|---|---|---|---|
| HotpotQA | 32.6 | 42.7 | 67.4% |
| MuSiQue | 7.6 | 14.1 | 94.2% |
| 2WikiMultiHopQA | 26.2 | 29.6 | 59.2% |
| NQ | 31.4 | 41.0 | 52.0% |
| TriviaQA | 56.6 | 63.2 | 34.0% |
| **Overall** | **30.9** | **38.1** | **61.4%** |

Trigger rate = fraction of questions where the probe decided to retrieve.
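EM and F1 above are the standard QA metrics: exact match after normalization, and token-level overlap F1. A minimal sketch of the F1 column, assuming SQuAD-style normalization (helper names are illustrative):

```python
import re
import string
from collections import Counter

def _normalize(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def qa_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred_toks = _normalize(prediction).split()
    gold_toks = _normalize(gold).split()
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

A partial answer like "Nolan" against gold "Christopher Nolan" scores F1 = 2/3, which is why the F1 column sits above EM on every dataset.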
## Intended use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained("your-username/uncertain-calibrate")

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special token "
    "<uncertain> to signal that you need more information. "
    "End your response with 'Answer: <your answer>' on the last line."
)

prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Who directed the film Interstellar?"},
], tokenize=False, add_generation_prompt=True)

# Generate a first-pass answer; <uncertain> may appear in the reasoning.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
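For a quick end-to-end loop you can simply scan the decoded output for `<uncertain>` to gate a second pass. Note this string check is a simplification of the released pipeline, which gates on the probe score; `bm25_search` and `second_pass` below are hypothetical helpers you would supply.

```python
def adaptive_answer(raw_output: str, question: str, bm25_search, second_pass) -> str:
    """If the first pass emitted <uncertain>, retrieve evidence and regenerate."""
    if "<uncertain>" not in raw_output:
        return raw_output  # first-pass answer stands
    passages = bm25_search(question, k=5)       # hypothetical BM25 retriever
    context = "\n".join(passages)
    return second_pass(question, context)       # hypothetical second-pass generation
```

In the full setup, `second_pass` would prepend the retrieved passages to the prompt and regenerate with the same model.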