nessie-v5-llama-3.1-8b/README.md

---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
  - credential-verification
  - document-extraction
  - fine-tuned
  - arkova
  - nessie
datasets:
  - custom
language:
  - en
pipeline_tag: text-generation
model-index:
  - name: nessie-v5-llama-3.1-8b
    results:
      - task:
          type: text-generation
          name: Credential Metadata Extraction
        metrics:
          - type: weighted-f1
            value: 87.2
            name: Weighted F1
          - type: macro-f1
            value: 75.7
            name: Macro F1
---

# Nessie v5 (Llama 3.1 8B Fine-tune)

**Nessie** is Arkova's credential metadata extraction model, fine-tuned from Meta Llama 3.1 8B Instruct for structured extraction of credential metadata from PII-stripped document text.

## Model Details

- **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
- **Fine-tuning:** Together AI (job ft-b8594db6-80f9)
- **Training data:** 1,903 train + 211 validation examples
- **Precision:** float16
- **Context length:** 32,768 tokens
- **Training mix:** 75% domain-specific + 25% general credential data

## Evaluation Results (v5)

| Metric | Value |
|--------|-------|
| Weighted F1 | 87.2% |
| Macro F1 | 75.7% |
| Mean Confidence | 72.5% |
| Mean Accuracy | 83.5% |
| Confidence Correlation (r) | 0.539 |
| Mean Latency | 1,543ms |

### Per-Type Performance (Top 10)

| Type | Weighted F1 | Sample Size |
|------|------------|-------------|
| FINANCIAL | 100.0% | n=2 |
| TRANSCRIPT | 100.0% | n=2 |
| RESUME | 100.0% | n=2 |
| DEGREE | 98.5% | n=11 |
| PATENT | 97.1% | n=4 |
| LICENSE | 96.6% | n=10 |
| PROFESSIONAL | 95.8% | n=7 |
| INSURANCE | 93.3% | n=4 |
| LEGAL | 92.9% | n=3 |
| CLE | 91.1% | n=2 |

## Intended Use

Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.

**Important:** This model must be used with its trained condensed prompt (~1.5K chars). Using the full extraction prompt (58K chars) causes 0% F1 due to prompt template mismatch.

## Credential Types Supported

DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER

## Domain-Specific Adapters

Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora:

- **SEC** (45K examples): SEC filings, financial disclosures
- **Academic** (45K examples): Degrees, transcripts, publications
- **Legal** (13K examples): Legal documents, bar admissions, CLE
- **Regulatory** (13K examples): Licenses, regulations, compliance

## Limitations

- Only processes PII-stripped text (by design)
- Small sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME at n=2)
- fraudSignals field has 0% F1 (known limitation, under improvement)
- Confidence calibration ECE of 11% (recalibrated via piecewise linear function)

## Citation

```
@software{nessie-v5,
  title={Nessie v5: Credential Metadata Extraction Model},
  author={Arkova},
  year={2026},
  url={https://arkova.ai}
}
```

## License

This model is released under the Llama 3.1 Community License. See META's license for details.
初始化项目，由ModelHub XC社区提供模型 Model: carsonarkova/nessie-v5-llama-3.1-8b Source: Original Platform 2026-04-24 15:47:04 +08:00			`---`
			`license: llama3.1`
			`base_model: meta-llama/Meta-Llama-3.1-8B-Instruct`
			`tags:`
			`- credential-verification`
			`- document-extraction`
			`- fine-tuned`
			`- arkova`
			`- nessie`
			`datasets:`
			`- custom`
			`language:`
			`- en`
			`pipeline_tag: text-generation`
			`model-index:`
			`- name: nessie-v5-llama-3.1-8b`
			`results:`
			`- task:`
			`type: text-generation`
			`name: Credential Metadata Extraction`
			`metrics:`
			`- type: weighted-f1`
			`value: 87.2`
			`name: Weighted F1`
			`- type: macro-f1`
			`value: 75.7`
			`name: Macro F1`
			`---`

			`# Nessie v5 (Llama 3.1 8B Fine-tune)`

			`Nessie is Arkova's credential metadata extraction model, fine-tuned from Meta Llama 3.1 8B Instruct for structured extraction of credential metadata from PII-stripped document text.`

			`## Model Details`

			`- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct`
			`- Fine-tuning: Together AI (job ft-b8594db6-80f9)`
			`- Training data: 1,903 train + 211 validation examples`
			`- Precision: float16`
			`- Context length: 32,768 tokens`
			`- Training mix: 75% domain-specific + 25% general credential data`

			`## Evaluation Results (v5)`

			`\| Metric \| Value \|`
			`\|--------\|-------\|`
			`\| Weighted F1 \| 87.2% \|`
			`\| Macro F1 \| 75.7% \|`
			`\| Mean Confidence \| 72.5% \|`
			`\| Mean Accuracy \| 83.5% \|`
			`\| Confidence Correlation (r) \| 0.539 \|`
			`\| Mean Latency \| 1,543ms \|`

			`### Per-Type Performance (Top 10)`

			`\| Type \| Weighted F1 \| Sample Size \|`
			`\|------\|------------\|-------------\|`
			`\| FINANCIAL \| 100.0% \| n=2 \|`
			`\| TRANSCRIPT \| 100.0% \| n=2 \|`
			`\| RESUME \| 100.0% \| n=2 \|`
			`\| DEGREE \| 98.5% \| n=11 \|`
			`\| PATENT \| 97.1% \| n=4 \|`
			`\| LICENSE \| 96.6% \| n=10 \|`
			`\| PROFESSIONAL \| 95.8% \| n=7 \|`
			`\| INSURANCE \| 93.3% \| n=4 \|`
			`\| LEGAL \| 92.9% \| n=3 \|`
			`\| CLE \| 91.1% \| n=2 \|`

			`## Intended Use`

			`Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.`

			`Important: This model must be used with its trained condensed prompt (~1.5K chars). Using the full extraction prompt (58K chars) causes 0% F1 due to prompt template mismatch.`

			`## Credential Types Supported`

			`DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER`

			`## Domain-Specific Adapters`

			`Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora:`

			`- SEC (45K examples): SEC filings, financial disclosures`
			`- Academic (45K examples): Degrees, transcripts, publications`
			`- Legal (13K examples): Legal documents, bar admissions, CLE`
			`- Regulatory (13K examples): Licenses, regulations, compliance`

			`## Limitations`

			`- Only processes PII-stripped text (by design)`
			`- Small sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME at n=2)`
			`- fraudSignals field has 0% F1 (known limitation, under improvement)`
			`- Confidence calibration ECE of 11% (recalibrated via piecewise linear function)`

			`## Citation`

			```
			`@software{nessie-v5,`
			`title={Nessie v5: Credential Metadata Extraction Model},`
			`author={Arkova},`
			`year={2026},`
			`url={https://arkova.ai}`
			`}`
			```

			`## License`

			`This model is released under the Llama 3.1 Community License. See META's license for details.`