resect-ai/veritas-8B-fact-checker-non-thinking-1.0


ModelHub XC e5925cf9e2 Initialize project; model provided by the ModelHub XC community

Model: resect-ai/veritas-8B-fact-checker-non-thinking-1.0
Source: Original Platform

2026-06-03 11:08:27 +08:00

5.3 KiB


About Resect Research Labs

Resect Research Labs focuses on improving factual grounding as well as detecting, reducing, and mitigating hallucinations in AI models through proprietary reinforcement learning and novel fine-tuning techniques.

Introducing: Veritas-8B-Fact-Checker-Non-Thinking-1.0

Veritas-8B-Fact-Checker-Non-Thinking-1.0 is built on the Qwen3 architecture, starting from Qwen/Qwen3-8B. Resect Research Labs has fine-tuned and optimized this model specifically for fact-checking and factual consistency verification.

Model Performance

The model is evaluated on LLM-AggreFact, which it did not see during training. The benchmark aggregates 11 human-annotated datasets on fact-checking and grounding.

Overall Performance

Veritas-8B-Fact-Checker-Non-Thinking-1.0 achieves an average balanced accuracy of 75.47%, 2.3 percentage points above Qwen3-8B in non-thinking mode (73.17%).

Benchmark Details (LLM-AggreFact)

Balanced Accuracy Scores (%)

Model	Size	Avg	CNN	XSum	MediaS	MeetB	WiCE	REVEAL	Claim Verify	Fact Check	Expert QA	LFQA	RAG Truth
Qwen3-8B (non-thinking)	8B	73.17	66.30	71.25	69.05	77.50	76.16	84.81	67.71	77.39	57.14	80.62	76.91
Veritas-8B-Fact-Checker-Non-Thinking-1.0	8B	75.47	68.84	74.33	71.83	76.44	79.06	86.94	72.40	76.75	58.12	84.14	81.30

Note: The benchmarks reported here for Veritas-8B-Fact-Checker-Non-Thinking-1.0 were run on the LLM-AggreFact test set, and a pull request has been submitted to MiniCheck's library to support additional operating modes, including this model. Performance may vary slightly depending on hardware configuration and vLLM version.
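The Avg column appears to be the unweighted (macro) mean of the 11 per-dataset scores. Assuming that averaging scheme, the short check below reproduces both averages and the 2.3-point gap from the table, and also lists the datasets where the fine-tuned model scores below the base model. The names `macro_average`, `QWEN3`, and `VERITAS` are illustrative only.

```python
# Per-dataset balanced accuracy (%) copied from the table above.
DATASETS = ["CNN", "XSum", "MediaS", "MeetB", "WiCE", "REVEAL",
            "Claim Verify", "Fact Check", "Expert QA", "LFQA", "RAG Truth"]
QWEN3 = [66.30, 71.25, 69.05, 77.50, 76.16, 84.81, 67.71, 77.39, 57.14, 80.62, 76.91]
VERITAS = [68.84, 74.33, 71.83, 76.44, 79.06, 86.94, 72.40, 76.75, 58.12, 84.14, 81.30]


def macro_average(scores: list[float]) -> float:
    """Unweighted mean over datasets (assumed to match the Avg column)."""
    return sum(scores) / len(scores)


qwen3_avg = round(macro_average(QWEN3), 2)
veritas_avg = round(macro_average(VERITAS), 2)
# Difference in percentage points, not a relative percentage.
gain_pp = round(veritas_avg - qwen3_avg, 2)

# Datasets where the fine-tuned model scores below the base model.
regressions = [name for name, base, tuned in zip(DATASETS, QWEN3, VERITAS) if tuned < base]

print(qwen3_avg, veritas_avg, gain_pp)  # 73.17 75.47 2.3
print(regressions)  # ['MeetB', 'Fact Check']
```

The average gain is therefore not uniform: nine datasets improve while MeetB and Fact Check drop slightly.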

Model Usage

Scope of Use

The Veritas-8B-Fact-Checker-Non-Thinking model must be used only in the prescribed scoring mode, which produces a binary classification (1 = supported, 0 = unsupported) from the specified prompt template. Any other use may lead to unexpected outputs.
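As a rough illustration of how a single-token binary verdict can be turned into the label and probability returned by `scorer.score` in the usage examples, the sketch below normalizes the log-probabilities of two candidate answer tokens and thresholds the result at 0.5. The two-token normalization and the 0.5 threshold are assumptions for illustration, not a description of MiniCheck's implementation; `supported_probability` and `to_label` are hypothetical helpers.

```python
import math


def supported_probability(logprob_yes: float, logprob_no: float) -> float:
    """Softmax restricted to two candidate answer tokens (hypothetical helper)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logprob_yes, logprob_no)
    p_yes = math.exp(logprob_yes - m)
    p_no = math.exp(logprob_no - m)
    return p_yes / (p_yes + p_no)


def to_label(prob: float, threshold: float = 0.5) -> int:
    """Map a probability to 1 (supported) or 0 (unsupported); the threshold is assumed."""
    return 1 if prob > threshold else 0


# Values mirroring the raw probabilities reported in the usage example.
probs = [0.9465315396494047, 0.008577206810662688]
print([to_label(p) for p in probs])  # [1, 0]
print(round(supported_probability(math.log(0.9), math.log(0.1)), 6))  # 0.9
```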

Using MiniCheck's library ¹

This requires the changes from our pull request to be merged (see ¹).

Please run the following command to install the MiniCheck package and all necessary dependencies.

pip install "minicheck[llm] @ git+https://github.com/Liyan06/MiniCheck.git@main"

Below is a simple usage example:

import os
# Select the GPU before any CUDA-using library is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# Render the Qwen3 chat template in non-thinking mode.
chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-8B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_new_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)

# docs and claims are parallel lists: each claim is checked against the doc at the same index.
# You can pass `chunk_size=your-specified-value` here; it defaults to a 32K chunk size.
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label)  # [1, 0]
print(raw_prob)    # [0.9465315396494047, 0.008577206810662688]
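The `chunk_size` argument matters for long documents. A common way to check a claim against a document longer than the model's context is to split the document into chunks, score each chunk, and keep the highest support probability, on the reasoning that a claim is supported if any one chunk supports it. The sketch below illustrates that idea with whitespace-delimited words instead of model tokens and a toy scorer in place of the model; `chunk_words`, `score_long_document`, `toy_scorer`, and the max aggregation are illustrative assumptions, not MiniCheck's exact code.

```python
from typing import Callable


def chunk_words(doc: str, chunk_size: int) -> list[str]:
    """Split a document into consecutive chunks of at most `chunk_size` words."""
    words = doc.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks or [""]  # always score at least one (possibly empty) chunk


def score_long_document(
    doc: str,
    claim: str,
    score_chunk: Callable[[str, str], float],
    chunk_size: int = 32_000,
) -> float:
    """Score every chunk and keep the maximum support probability (assumed aggregation)."""
    return max(score_chunk(chunk, claim) for chunk in chunk_words(doc, chunk_size))


def toy_scorer(chunk: str, claim: str) -> float:
    """Stand-in for the model: 1.0 if the claim text appears verbatim in the chunk."""
    return 1.0 if claim in chunk else 0.0


print(chunk_words("alpha beta gamma delta epsilon", 2))
# ['alpha beta', 'gamma delta', 'epsilon']
print(score_long_document("alpha beta gamma delta", "gamma delta", toy_scorer, chunk_size=2))
# 1.0
```

One trade-off of this scheme is that evidence straddling a chunk boundary (for example, "beta gamma" with `chunk_size=2`) is never seen in one piece, which is one reason a larger chunk size can help when memory allows.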

Test on LLM-AggreFact Benchmark ¹

import os

# Select the GPU before any CUDA-using library is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck

# Load the 30K-example LLM-AggreFact test split.
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-8B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_new_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)

To compute balanced accuracy for each dataset and the overall average:

from sklearn.metrics import balanced_accuracy_score

# Attach the binary predictions to the benchmark rows.
df['preds'] = pred_label
result_df = pd.DataFrame(columns=['Dataset', 'BAcc'])
for dataset in df.dataset.unique():
    # Balanced accuracy (%) for each constituent dataset.
    sub_df = df[df.dataset == dataset]
    bacc = balanced_accuracy_score(sub_df.label, sub_df.preds) * 100
    result_df.loc[len(result_df)] = [dataset, bacc]

# The overall score is the unweighted mean over datasets.
result_df.loc[len(result_df)] = ['Average', result_df.BAcc.mean()]
print(result_df.round(2))
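Balanced accuracy is the mean of per-class recall, so on a label-imbalanced dataset it does not reward a checker that always predicts the majority label. The pure-Python sketch below follows the definition used by scikit-learn's `balanced_accuracy_score` (without its `adjusted` or `sample_weight` options); `balanced_accuracy` is an illustrative helper, not part of MiniCheck.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        positions = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in positions if y_pred[i] == cls)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


# Imbalanced toy labels: three supported claims (1) and one unsupported claim (0).
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.75
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.8333

# Always predicting the majority label scores 0.75 accuracy but only 0.5 balanced accuracy.
print(balanced_accuracy(y_true, [1, 1, 1, 1]))  # 0.5
```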

License

Veritas-8B-Fact-Checker-Non-Thinking-1.0 is released under the Apache 2.0 license, available at https://choosealicense.com/licenses/apache-2.0. By downloading and using this model, you agree to the license terms.

Acknowledgements

Model developed by Resect Research Labs.

¹ A pull request to MiniCheck's library has been submitted and is awaiting review.
