--- license: llama3.1 base_model: meta-llama/Meta-Llama-3.1-8B-Instruct tags: - credential-verification - document-extraction - fine-tuned - arkova - nessie datasets: - custom language: - en pipeline_tag: text-generation model-index: - name: nessie-v5-llama-3.1-8b results: - task: type: text-generation name: Credential Metadata Extraction metrics: - type: weighted-f1 value: 87.2 name: Weighted F1 - type: macro-f1 value: 75.7 name: Macro F1 --- # Nessie v5 (Llama 3.1 8B Fine-tune) **Nessie** is Arkova's credential metadata extraction model, fine-tuned from Meta Llama 3.1 8B Instruct for structured extraction of credential metadata from PII-stripped document text. ## Model Details - **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct - **Fine-tuning:** Together AI (job ft-b8594db6-80f9) - **Training data:** 1,903 train + 211 validation examples - **Precision:** float16 - **Context length:** 32,768 tokens - **Training mix:** 75% domain-specific + 25% general credential data ## Evaluation Results (v5) | Metric | Value | |--------|-------| | Weighted F1 | 87.2% | | Macro F1 | 75.7% | | Mean Confidence | 72.5% | | Mean Accuracy | 83.5% | | Confidence Correlation (r) | 0.539 | | Mean Latency | 1,543ms | ### Per-Type Performance (Top 10) | Type | Weighted F1 | Sample Size | |------|------------|-------------| | FINANCIAL | 100.0% | n=2 | | TRANSCRIPT | 100.0% | n=2 | | RESUME | 100.0% | n=2 | | DEGREE | 98.5% | n=11 | | PATENT | 97.1% | n=4 | | LICENSE | 96.6% | n=10 | | PROFESSIONAL | 95.8% | n=7 | | INSURANCE | 93.3% | n=4 | | LEGAL | 92.9% | n=3 | | CLE | 91.1% | n=2 | ## Intended Use Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model. **Important:** This model must be used with its trained condensed prompt (~1.5K chars). Using the full extraction prompt (58K chars) causes 0% F1 due to prompt template mismatch. ## Credential Types Supported DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER ## Domain-Specific Adapters Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora: - **SEC** (45K examples): SEC filings, financial disclosures - **Academic** (45K examples): Degrees, transcripts, publications - **Legal** (13K examples): Legal documents, bar admissions, CLE - **Regulatory** (13K examples): Licenses, regulations, compliance ## Limitations - Only processes PII-stripped text (by design) - Small sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME at n=2) - fraudSignals field has 0% F1 (known limitation, under improvement) - Confidence calibration ECE of 11% (recalibrated via piecewise linear function) ## Citation ``` @software{nessie-v5, title={Nessie v5: Credential Metadata Extraction Model}, author={Arkova}, year={2026}, url={https://arkova.ai} } ``` ## License This model is released under the Llama 3.1 Community License. See META's license for details.