---
base_model: Qwen/Qwen3-8B
tags:
- transformers
- factual-grounding
- fact-checking
- qwen3
license: apache-2.0
language:
- en
---
# About Resect Research Labs
- Resect Research Labs focuses on improving factual grounding as well as detecting, reducing, and mitigating hallucinations in AI models through proprietary reinforcement learning and novel fine-tuning techniques.

## Introducing: Veritas-8B-Fact-Checker-Non-Thinking-1.0
- **Veritas-8B-Fact-Checker-Non-Thinking-1.0** is built on the **Qwen3 architecture**, starting from [**Qwen/Qwen3-8B**](https://huggingface.co/Qwen/Qwen3-8B). Resect Research Labs has specialized, fine-tuned, and optimized this model for **fact-checking and factual consistency verification**.

### Model Performance
- This model is evaluated on [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact), which it did not see during training. The benchmark aggregates 11 human-annotated datasets on fact-checking and grounding.

### Overall Performance
- **Veritas-8B-Fact-Checker-Non-Thinking-1.0** achieves an **average balanced accuracy of 75.47%**, an improvement of **2.30 percentage points** over Qwen3-8B in non-thinking mode (73.17%).

### Benchmark Details (LLM-AggreFact)

Balanced accuracy scores (%). The best score in each column is in bold.

| Model | Size | Avg | CNN | XSum | MediaS | MeetB | WiCE | REVEAL | Claim Verify | Fact Check | Expert QA | LFQA | RAG Truth |
|------|------|------|------|------|--------|-------|------|--------|--------------|------------|-----------|------|-----------|
| Qwen3-8B (non-thinking) | 8B | 73.17 | 66.30 | 71.25 | 69.05 | **77.50** | 76.16 | 84.81 | 67.71 | **77.39** | 57.14 | 80.62 | 76.91 |
| Veritas-8B-Fact-Checker-Non-Thinking-1.0 | 8B | **75.47** | **68.84** | **74.33** | **71.83** | 76.44 | **79.06** | **86.94** | **72.40** | 76.75 | **58.12** | **84.14** | **81.30** |


<sup>**The benchmarks reported here for Veritas-8B-Fact-Checker-Non-Thinking-1.0 were run on the test set. A pull request has been submitted to [MiniCheck's library](https://github.com/Liyan06/MiniCheck/pull/17) to support additional operating modes, including the one used by this model.**</sup>
<sup>Note: Performance may vary slightly depending on hardware configuration and vLLM version.</sup>
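
As a quick sanity check on the table, the `Avg` column can be reproduced as the unweighted mean of the 11 per-dataset scores. The snippet below is a minimal sketch that only re-uses the numbers printed in the table; the helper name `macro_average` is made up for illustration and is not part of any library.

```python
# Per-dataset balanced accuracy (%) copied from the table,
# in column order CNN ... RAG Truth.
qwen3_8b = [66.30, 71.25, 69.05, 77.50, 76.16, 84.81, 67.71, 77.39, 57.14, 80.62, 76.91]
veritas_8b = [68.84, 74.33, 71.83, 76.44, 79.06, 86.94, 72.40, 76.75, 58.12, 84.14, 81.30]


def macro_average(scores):
    """Unweighted mean over datasets, rounded to two decimals (hypothetical helper)."""
    return round(sum(scores) / len(scores), 2)


qwen_avg = macro_average(qwen3_8b)
veritas_avg = macro_average(veritas_8b)

# Difference in percentage points between the two averages.
gain = round(veritas_avg - qwen_avg, 2)

# Number of datasets on which the fine-tuned model scores higher than the base model.
datasets_improved = sum(v > q for v, q in zip(veritas_8b, qwen3_8b))

print(qwen_avg, veritas_avg, gain)  # 73.17 75.47 2.3
print(datasets_improved)            # 9
```

The fine-tuned model is ahead on 9 of the 11 datasets; Qwen3-8B keeps a small lead on MeetB and Fact Check.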

---

# Model Usage 

## Scope of Use

* Veritas-8B-Fact-Checker-Non-Thinking-1.0 must be used only in the prescribed scoring mode, which produces a binary classification based on the specified template. Any other use may lead to unexpected outputs.

## Using MiniCheck's library [^2]

**The examples below require the changes from our pull request, which has not been merged yet; see [^2].**

Run the following command to install the **MiniCheck package** and all necessary dependencies.
```sh
pip install "minicheck[llm] @ git+https://github.com/Liyan06/MiniCheck.git@main"
```

[^2]: [Pull request submitted to MiniCheck's library](https://github.com/Liyan06/MiniCheck/pull/17), awaiting review.

### Below is a simple use case

```python
import os

# Select the GPU before MiniCheck is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# Disable Qwen3's thinking mode in the chat template.
chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-8B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_new_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)
# Optionally pass `chunk_size=your-specified-value`; the default chunk size is 32K.
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label)  # [1, 0]
print(raw_prob)    # [0.9465315396494047, 0.008577206810662688]
```
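
The relationship between `raw_prob` and `pred_label` in the output above can be pictured as a simple threshold on the support probability. The sketch below is an illustration only: the 0.5 cut-off and the helper name `to_labels` are assumptions, not something this model card or the pull request confirms, so check the library source before relying on it.

```python
def to_labels(probs, threshold=0.5):
    """Map support probabilities to binary labels (1 = supported, 0 = unsupported).

    Hypothetical helper; the 0.5 default is an assumed threshold.
    """
    return [int(p > threshold) for p in probs]


# Probabilities taken from the example output above.
raw_prob = [0.9465315396494047, 0.008577206810662688]

print(to_labels(raw_prob))                  # [1, 0]
print(to_labels(raw_prob, threshold=0.95))  # [0, 0]
```

Raising the threshold trades recall for precision: with a stricter cut-off of 0.95, even the first claim would no longer be labeled as supported.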

### Test on [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact) Benchmark [^2]

```python
import os

# Select the GPU before MiniCheck is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck

# Load the 30K test data.
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

# Disable Qwen3's thinking mode in the chat template.
chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-8B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_new_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)
```

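To evaluate the results on the benchmark:
```python
from sklearn.metrics import balanced_accuracy_score

# Attach predictions to the test set (`df` and `pred_label` come from the previous block).
df['preds'] = pred_label

# Compute balanced accuracy (%) separately for each of the benchmark's datasets.
result_df = pd.DataFrame(columns=['Dataset', 'BAcc'])
for dataset in df.dataset.unique():
    sub_df = df[df.dataset == dataset]
    bacc = balanced_accuracy_score(sub_df.label, sub_df.preds) * 100
    result_df.loc[len(result_df)] = [dataset, bacc]

# The overall score is the unweighted mean of the per-dataset scores.
result_df.loc[len(result_df)] = ['Average', result_df.BAcc.mean()]
print(result_df.round(1))
```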
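
For readers unfamiliar with the metric: for binary labels, balanced accuracy is the mean of the recall on the positive class and the recall on the negative class, so a model cannot score well by always predicting the majority label. The following is a minimal pure-Python sketch of that definition on made-up toy data; the function name `balanced_accuracy` is illustrative, and the evaluation above uses scikit-learn's `balanced_accuracy_score`.

```python
def balanced_accuracy(labels, preds):
    """Mean of positive-class recall and negative-class recall.

    Assumes binary 0/1 labels with at least one example of each class.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy data, invented for illustration: 4 supported claims, 2 unsupported ones.
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]

plain_accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)

print(round(plain_accuracy, 3))           # 0.667
print(balanced_accuracy(labels, preds))   # 0.625
```

Here plain accuracy (4 of 6 correct) is higher than balanced accuracy because the errors weigh more heavily on the smaller negative class.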

# License
- **Veritas-8B-Fact-Checker-Non-Thinking-1.0** is licensed under the Apache 2.0 license, available at https://choosealicense.com/licenses/apache-2.0. By downloading and using this model, you agree to the license terms.


# Acknowledgements

Model perfected by [Resect Research Labs](https://www.resect.ai/).