Nessie is Arkova's credential metadata extraction model. It is fine-tuned from Meta Llama 3.1 8B Instruct to extract structured metadata from PII-stripped credential document text.
Model Details
Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
Fine-tuning: Together AI (job ft-b8594db6-80f9)
Training data: 1,903 train + 211 validation examples
Precision: float16
Context length: 32,768 tokens
Training mix: 75% domain-specific + 25% general credential data
Evaluation Results (v5)
| Metric | Value |
| --- | --- |
| Weighted F1 | 87.2% |
| Macro F1 | 75.7% |
| Mean Confidence | 72.5% |
| Mean Accuracy | 83.5% |
| Confidence Correlation (r) | 0.539 |
| Mean Latency | 1,543 ms |
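The relationship between the confidence and accuracy figures above can be made concrete with a little arithmetic. The following is a minimal sketch, not the evaluation code used for Nessie: the two means are taken from the results above, while `pearson_r` is a hypothetical helper and the per-example `confidences` and `accuracies` lists are made-up illustrative values, not evaluation data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Means reported in the evaluation results (percent).
MEAN_CONFIDENCE = 72.5
MEAN_ACCURACY = 83.5

# Positive gap: on average the model reports less confidence than it earns.
calibration_gap = MEAN_ACCURACY - MEAN_CONFIDENCE

# Hypothetical per-example values, for illustration only.
confidences = [0.55, 0.70, 0.80, 0.90, 0.65]
accuracies = [0.60, 0.95, 0.85, 1.00, 0.75]
r = pearson_r(confidences, accuracies)

print(f"calibration gap: {calibration_gap:.1f} points")
print(f"illustrative r: {r:.3f}")
```

With the reported means, the gap is 11.0 percentage points. Together with a moderate correlation such as the reported r = 0.539, this suggests confidence scores are informative but should probably not be treated as calibrated probabilities without further checks.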
Per-Type Performance (Top 10)
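| Type | Weighted F1 | Sample Size |
| --- | --- | --- |
| FINANCIAL | 100.0% | n=2 |
| TRANSCRIPT | 100.0% | n=2 |
| RESUME | 100.0% | n=2 |
| DEGREE | 98.5% | n=11 |
| PATENT | 97.1% | n=4 |
| LICENSE | 96.6% | n=10 |
| PROFESSIONAL | 95.8% | n=7 |
| INSURANCE | 93.3% | n=4 |
| LEGAL | 92.9% | n=3 |
| CLE | 91.1% | n=2 |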
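Because several of these types have very few samples, it can help to see how the per-type scores aggregate and how many rest on small counts. The sketch below copies the ten rows listed above and computes an unweighted mean and a sample-weighted mean over them. The function names and the `SMALL_SAMPLE_THRESHOLD` cutoff of 5 are assumptions for illustration. These aggregates cover only the top 10 types, so they are not expected to reproduce the overall Weighted F1 or Macro F1.

```python
# (type, weighted F1 in percent, sample size), copied from the top-10 list above.
PER_TYPE = [
    ("FINANCIAL", 100.0, 2),
    ("TRANSCRIPT", 100.0, 2),
    ("RESUME", 100.0, 2),
    ("DEGREE", 98.5, 11),
    ("PATENT", 97.1, 4),
    ("LICENSE", 96.6, 10),
    ("PROFESSIONAL", 95.8, 7),
    ("INSURANCE", 93.3, 4),
    ("LEGAL", 92.9, 3),
    ("CLE", 91.1, 2),
]

# Assumption: an arbitrary cutoff below which a per-type score is treated as noisy.
SMALL_SAMPLE_THRESHOLD = 5


def unweighted_mean(rows):
    """Average the per-type scores, giving every type equal weight."""
    return sum(score for _, score, _ in rows) / len(rows)


def sample_weighted_mean(rows):
    """Average the per-type scores, weighting each type by its sample size."""
    total = sum(n for _, _, n in rows)
    return sum(score * n for _, score, n in rows) / total


def small_sample_types(rows, threshold=SMALL_SAMPLE_THRESHOLD):
    """Return the names of types evaluated on fewer than `threshold` samples."""
    return [name for name, _, n in rows if n < threshold]


print(f"unweighted mean:      {unweighted_mean(PER_TYPE):.2f}%")
print(f"sample-weighted mean: {sample_weighted_mean(PER_TYPE):.2f}%")
print(f"types with n < {SMALL_SAMPLE_THRESHOLD}: {small_sample_types(PER_TYPE)}")
```

Seven of the ten listed types fall under the assumed cutoff, so the 100.0% rows in particular reflect two samples each and should be read with that in mind.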
Intended Use
Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.
Important: Use this model only with the condensed prompt it was trained on (~1.5K chars). The full extraction prompt (58K chars) yields 0% F1 because it does not match the prompt template used in training.
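One way to avoid sending the wrong prompt by accident is a length check before inference. The following is a minimal sketch under stated assumptions: `check_prompt`, `PromptMismatchError`, and the 4,000-character cutoff are hypothetical and not part of any Arkova API. Only the two prompt sizes come from the note above.

```python
# Prompt sizes stated in the note above (approximate character counts).
CONDENSED_PROMPT_CHARS = 1_500
FULL_PROMPT_CHARS = 58_000

# Assumption: an arbitrary cutoff that sits well above the condensed prompt
# and far below the full extraction prompt.
MAX_PROMPT_CHARS = 4_000


class PromptMismatchError(ValueError):
    """Raised when a prompt is too long to be the trained condensed prompt."""


def check_prompt(prompt: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Return the prompt unchanged if it is plausibly the condensed prompt."""
    if not prompt.strip():
        raise ValueError("prompt is empty")
    if len(prompt) > max_chars:
        raise PromptMismatchError(
            f"prompt has {len(prompt)} chars, above the {max_chars}-char cutoff; "
            "the model expects its condensed prompt"
        )
    return prompt


if __name__ == "__main__":
    # Placeholder strings stand in for the real prompts, which are not shown here.
    check_prompt("x" * CONDENSED_PROMPT_CHARS)
    try:
        check_prompt("x" * FULL_PROMPT_CHARS)
    except PromptMismatchError as err:
        print(f"rejected: {err}")
```

A length check only catches the gross mismatch described above. It does not verify that the text is the exact trained prompt, so comparing against a stored copy or hash of the condensed prompt would be a stricter alternative.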