---
license: gemma
language:
- en
pipeline_tag: text-generation
tags:
- fact-checking
- hallucination-detection
- rag
- compliance
- guardrails
- nli
- gemma
base_model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit
---

# FlashCheck-1B: The Enterprise Logic Engine

## Model Description

**FlashCheck-1B** is a Gemma 3 (1B) fine-tune specialized for **Contextual Policy Adherence** and **Hallucination Detection**. It is designed to act as a fast verifier in RAG pipelines: given a **Document** and a **Claim**, it answers **"Yes"** if the claim is fully supported by the document and **"No"** otherwise.

- **Developer:** Nehme AI Labs
- **Training Base:** `unsloth/gemma-3-1b-it-unsloth-bnb-4bit` (Gemma family)
- **License/Terms:** Gemma (see the Gemma terms associated with the base model)

## What’s in this repo

- **Transformers (standalone):** `config.json` + `model.safetensors` + tokenizer files
- **GGUF (local inference):** `nehme-flashcheck-1b.Q8_0.gguf`, either at the repo root or under `gguf/`

## Intended behavior

- Input: **Document** (premise) + **Claim** (hypothesis)
- Output: **"Yes"** or **"No"** (a short answer; use greedy decoding so it is deterministic)

## Usage

### 1) Python (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "nehmeailabs-org/nehme-flashcheck-1b"

SYSTEM_MESSAGE = (
    "You are a fact checking model developed by NehmeAILabs. Determine whether the provided claim is consistent with "
    "the corresponding document. Consistency in this context implies that all information presented in the claim is "
    "substantiated by the document. If not, it should be considered inconsistent. Please assess the claim's consistency "
    "with the document by responding with either \"Yes\" or \"No\"."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)
model.eval()

document = "The user must not share API keys."
claim = "The user message 'Here is the staging key sk-123' violates the policy."

# Keep this layout stable: "Document:" first, then "Claim:".
user_prompt = f"Document: {document}\n\nClaim: {claim}"
messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": user_prompt},
]

try:
    # Preferred path: format the conversation with the tokenizer's chat template.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,  # also returns the attention mask
    )
except Exception:
    # Fallback if no chat template is available: plain concatenated prompt.
    plain = f"{SYSTEM_MESSAGE}\n\n{user_prompt}"
    inputs = tokenizer(plain, return_tensors="pt")

inputs = inputs.to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,  # greedy decoding; temperature/top_p do not apply
    )

# Keep only the newly generated tokens (drop the prompt).
gen_ids = out[0, inputs["input_ids"].shape[-1]:]
verdict = tokenizer.decode(gen_ids, skip_special_tokens=True).strip()
print(verdict)  # Expected: "Yes" or "No"
```

### 2) Local (GGUF / llama.cpp)

Current llama.cpp builds name the CLI binary `llama-cli`; older builds call it `main`.

If the GGUF file is at the repo root:

```bash
./llama-cli -m nehme-flashcheck-1b.Q8_0.gguf -p "Document: ...\n\nClaim: ..."
```

If the GGUF file is in a `gguf/` folder:

```bash
./llama-cli -m gguf/nehme-flashcheck-1b.Q8_0.gguf -p "Document: ...\n\nClaim: ..."
```

## Notes

- For best results, keep the prompt format stable (`Document:` then `Claim:`) and use deterministic decoding.
- This model is optimized for verification and consistency checks, not general open-ended chat.

## Citation

```bibtex
@misc{nehme2025flashcheck,
  title={FlashCheck: Efficient Logic Distillation for RAG Compliance},
  author={NehmeAILabs},
  year={2025},
  publisher={Nehme AI Labs},
  howpublished={\url{https://nehmeailabs.com}}
}
```
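
## Appendix: prompt and verdict helpers (sketch)

The model card asks for a stable `Document:` / `Claim:` prompt layout and expects a short "Yes" or "No" answer. The snippet below is a minimal sketch of how a pipeline might wrap those two conventions. The helper names `build_user_prompt` and `parse_verdict` are hypothetical and not part of this repo, and returning `None` for unrecognized output is an assumption about how a caller may want to handle it.

```python
from typing import Optional


def build_user_prompt(document: str, claim: str) -> str:
    """Format a document/claim pair in the `Document:` then `Claim:` layout."""
    return f"Document: {document.strip()}\n\nClaim: {claim.strip()}"


def parse_verdict(generated_text: str) -> Optional[bool]:
    """Map raw model output to True ("Yes"), False ("No"), or None (unrecognized)."""
    words = generated_text.strip().split()
    if not words:
        return None
    # Compare only the first word, ignoring case and surrounding punctuation.
    first = words[0].strip(".,!:;\"'").lower()
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


if __name__ == "__main__":
    prompt = build_user_prompt(
        "The user must not share API keys.",
        "The user message 'Here is the staging key sk-123' violates the policy.",
    )
    print(prompt)
    print(parse_verdict(" Yes\n"))
```

In a guardrail setting, one conservative option is to treat a `None` result the same as "No", so that malformed output never counts as support for a claim.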