Files

ModelHub XC b493537dfc 初始化项目，由ModelHub XC社区提供模型

Model: carsonarkova/nessie-v5-llama-3.1-8b
Source: Original Platform

2026-04-24 15:47:04 +08:00

3.2 KiB

Raw Permalink Blame History

license, base_model, tags, datasets, language, pipeline_tag, model-index

license

base_model

Nessie v5 (Llama 3.1 8B Fine-tune)

Nessie is Arkova's credential metadata extraction model, fine-tuned from Meta Llama 3.1 8B Instruct for structured extraction of credential metadata from PII-stripped document text.

Model Details

Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
Fine-tuning: Together AI (job ft-b8594db6-80f9)
Training data: 1,903 train + 211 validation examples
Precision: float16
Context length: 32,768 tokens
Training mix: 75% domain-specific + 25% general credential data

Evaluation Results (v5)

Metric	Value
Weighted F1	87.2%
Macro F1	75.7%
Mean Confidence	72.5%
Mean Accuracy	83.5%
Confidence Correlation (r)	0.539
Mean Latency	1,543ms

Per-Type Performance (Top 10)

Type	Weighted F1	Sample Size
FINANCIAL	100.0%	n=2
TRANSCRIPT	100.0%	n=2
RESUME	100.0%	n=2
DEGREE	98.5%	n=11
PATENT	97.1%	n=4
LICENSE	96.6%	n=10
PROFESSIONAL	95.8%	n=7
INSURANCE	93.3%	n=4
LEGAL	92.9%	n=3
CLE	91.1%	n=2

Intended Use

Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.

Important: This model must be used with its trained condensed prompt (~1.5K chars). Using the full extraction prompt (58K chars) causes 0% F1 due to prompt template mismatch.

Credential Types Supported

DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER

Domain-Specific Adapters

Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora:

SEC (45K examples): SEC filings, financial disclosures
Academic (45K examples): Degrees, transcripts, publications
Legal (13K examples): Legal documents, bar admissions, CLE
Regulatory (13K examples): Licenses, regulations, compliance

Limitations

Only processes PII-stripped text (by design)
Small sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME at n=2)
fraudSignals field has 0% F1 (known limitation, under improvement)
Confidence calibration ECE of 11% (recalibrated via piecewise linear function)

Citation

@software{nessie-v5,
  title={Nessie v5: Credential Metadata Extraction Model},
  author={Arkova},
  year={2026},
  url={https://arkova.ai}
}

License

This model is released under the Llama 3.1 Community License. See META's license for details.

3.2 KiB Raw Permalink Blame History