Initialize project; model provided by the ModelHub XC community
Model: Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering Source: Original Platform
---
library_name: transformers
license: cc-by-nc-4.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- rag
- filtering
---

## Model Description

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), trained for 🚀**evidence relevance classification (evidence filtering)**🚀 in medical RAG pipelines.
Given a clinical query and a candidate passage, the model outputs *“Yes”* if the passage contains supporting evidence and *“No”* otherwise.

This lightweight classifier is designed to help researchers:
- Improve retrieval quality in medical RAG systems.
- Filter irrelevant passages before generation.
- Build more reliable, interpretable RAG pipelines for medical QA.

For additional context, methodology, and full experimental details, please refer to our paper below.

📄 **Paper**: [Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights](https://arxiv.org/abs/2511.06738)

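As a rough illustration of where such a classifier could sit in a RAG pipeline, the sketch below keeps only the passages judged to contain supporting evidence before they are passed to a generator. It is a minimal sketch, not the authors' pipeline: `filter_passages`, `is_supporting`, and `keyword_judge` are hypothetical names, and the keyword-based judge is only a stand-in for a call to the model that returns its Yes/No decision as a boolean.

```python
from typing import Callable, List


def filter_passages(
    query: str,
    passages: List[str],
    is_supporting: Callable[[str, str], bool],
) -> List[str]:
    """Return the passages that the judge marks as supporting evidence."""
    return [passage for passage in passages if is_supporting(query, passage)]


# Stand-in judge for demonstration only (an assumption, not the model):
# a real judge would prompt the classifier and map "Yes"/"No" to True/False.
def keyword_judge(query: str, passage: str) -> bool:
    return "glaucoma" in passage.lower()


retrieved = [
    "Acute angle-closure glaucoma requires immediate treatment.",
    "Seasonal influenza vaccination is recommended annually.",
]
filtered = filter_passages(
    "What is the first-line treatment for acute angle-closure glaucoma?",
    retrieved,
    keyword_judge,
)
print(filtered)
```

Passing the judge in as a callable keeps the filtering step independent of how the model is loaded or served.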
## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Instruction used during training
INSTRUCTION = (
    "Given a query and a text passage, determine whether the passage contains supporting evidence for the query. "
    "Supporting evidence means that the passage provides clear, relevant, and factual information that directly backs or justifies the answer to the query.\n\n"
    "Respond with one of the following labels:\n\"Yes\" if the passage contains supporting evidence for the query.\n"
    "\"No\" if the passage does not contain supporting evidence.\n"
    "You should respond with only the label (Yes or No) without any additional explanation."
)

# Example query + retrieved passage
query = "What is the first-line treatment for acute angle-closure glaucoma?"
doc = "Acute angle-closure glaucoma requires immediate treatment with topical beta-blockers, alpha agonists, and systemic carbonic anhydrase inhibitors."

# Build chat-style prompt
content = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": INSTRUCTION},
        {"role": "user", "content": f"Question: {query}\nPassage: {doc}"},
    ],
    add_generation_prompt=True,
    tokenize=False,
)

# Tokenize (keeps the attention mask alongside the input IDs)
inputs = tokenizer(content, return_tensors="pt").to(model.device)

# Define stopping tokens (Llama-3 style)
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Generate evidence-filtering judgment.
# do_sample=False selects greedy decoding, so no sampling temperature is set.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,
)

# Decode model response (only the newly generated tokens)
response = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

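When scoring many query-passage pairs, it can help to factor the prompt construction and the label handling out of the snippet above. The helpers below are a minimal sketch under stated assumptions: `build_messages` and `parse_label` are hypothetical names that are not part of the released code, and the handling of quotes, trailing periods, and unexpected outputs is an assumption about how one might want to treat imperfect generations.

```python
from typing import Dict, List, Optional


def build_messages(instruction: str, query: str, doc: str) -> List[Dict[str, str]]:
    """Build the system + user messages in the layout used in Quick Start."""
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": f"Question: {query}\nPassage: {doc}"},
    ]


def parse_label(text: str) -> Optional[bool]:
    """Map generated text to True ("Yes"), False ("No"), or None if unrecognized."""
    # Drop surrounding whitespace, quotes, and a trailing period before comparing.
    cleaned = text.strip().strip("\"'.").lower()
    if cleaned == "yes":
        return True
    if cleaned == "no":
        return False
    return None


messages = build_messages("<instruction text>", "example query", "example passage")
print(messages[1]["content"])
print(parse_label(' "Yes" '), parse_label("No."), parse_label("Maybe"))
```

Returning `None` for anything other than a clean Yes/No makes unexpected generations visible to the caller instead of silently counting them as "No".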
## Training Setup

- **Dataset:** 3,200 query–passage pairs with expert-provided Yes/No labels (dataset to be released in a future update).
- **Task:** Given a query and a candidate passage, the model generates *"Yes"* if the passage contains supporting evidence and *"No"* otherwise.
- **Objective:** Causal language modeling (cross-entropy next-token loss).
- **Prompt:** See the *Quick Start* section for an example usage prompt.
- **Hyperparameter Tuning:** Five-fold cross-validation.
- **Final Hyperparameters:**
  - Learning rate: 2e-6
  - Batch size: 8
  - Epochs: 3
- **Training Framework:** [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

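The five-fold cross-validation mentioned above can be sketched in plain Python as follows. This is an illustration only: the function name `k_fold_splits`, the random seed, and the simple shuffled split are assumptions, since the README does not describe how the folds were actually constructed (for example, whether they were stratified by label or grouped by query).

```python
import random
from typing import List, Tuple


def k_fold_splits(
    n_examples: int, n_folds: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) pairs, one per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    # Deal the shuffled indices into n_folds nearly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for held_out in range(n_folds):
        validation = folds[held_out]
        train = [
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for idx in fold
        ]
        splits.append((train, validation))
    return splits


splits = k_fold_splits(3200)
for fold_id, (train_idx, val_idx) in enumerate(splits):
    print(fold_id, len(train_idx), len(val_idx))
```

With 3,200 examples and five folds, each fold holds out 640 pairs for validation and trains on the remaining 2,560.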
## Performance

Evaluation was conducted on 3,200 expert-annotated query–passage pairs using five-fold cross-validation.

| Model                               | Precision | Recall    | F1        |
|-------------------------------------|-----------|-----------|-----------|
| **Llama-3.1-8B (zero-shot)**        | 0.483     | 0.566     | 0.521     |
| **GPT-4o (zero-shot)**              | 0.697     | 0.324     | 0.442     |
| **Llama-3.1-8B (fine-tuned, ours)** | **0.592** | **0.657** | **0.623** |

🔥 Fine-tuning yields the highest recall and F1 of the three models and improves on zero-shot Llama-3.1-8B across all three metrics; zero-shot GPT-4o retains the highest precision (0.697 vs. 0.592).

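For readers who want to relate the columns to each other, F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values in the table. The helper name `f1_score` is ours, and the assumption that each reported F1 can be recovered from the reported precision and recall is only a sanity check: metrics averaged over folds need not satisfy this identity exactly, although here they agree to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "Llama-3.1-8B (zero-shot)": (0.483, 0.566),
    "GPT-4o (zero-shot)": (0.697, 0.324),
    "Llama-3.1-8B (fine-tuned, ours)": (0.592, 0.657),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
```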
## Intended Use

This model is intended for research purposes only.

## Reference

Please see the information below to cite our paper.
```bibtex
@article{kim2025rethinking,
  title={Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights},
  author={Kim, Hyunjae and Sohn, Jiwoong and Gilson, Aidan and Cochran-Caggiano, Nicholas and Applebaum, Serina and Jin, Heeju and Park, Seihee and Park, Yujin and Park, Jiyeong and Choi, Seoyoung and others},
  journal={arXiv preprint arXiv:2511.06738},
  year={2025}
}
```

## Contact

Feel free to email `hyunjae.kim@yale.edu` if you have any questions.