Initialize project; model provided by the ModelHub XC community

Model: carsonarkova/nessie-v5-llama-3.1-8b
Source: Original Platform
2026-04-24 15:47:04 +08:00
commit b493537dfc
12 changed files with 413144 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,108 @@
+---
+license: llama3.1
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
+tags:
+  - credential-verification
+  - document-extraction
+  - fine-tuned
+  - arkova
+  - nessie
+datasets:
+  - custom
+language:
+  - en
+pipeline_tag: text-generation
+model-index:
+  - name: nessie-v5-llama-3.1-8b
+    results:
+      - task:
+          type: text-generation
+          name: Credential Metadata Extraction
+        metrics:
+          - type: weighted-f1
+            value: 87.2
+            name: Weighted F1
+          - type: macro-f1
+            value: 75.7
+            name: Macro F1
+---
+
+# Nessie v5 (Llama 3.1 8B Fine-tune)
+
+**Nessie** is Arkova's credential metadata extraction model, fine-tuned from Meta Llama 3.1 8B Instruct to produce structured metadata from PII-stripped document text.
+
+## Model Details
+
+- **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
+- **Fine-tuning:** Together AI (job ft-b8594db6-80f9)
+- **Training data:** 1,903 train + 211 validation examples
+- **Precision:** float16
+- **Context length:** 32,768 tokens
+- **Training mix:** 75% domain-specific + 25% general credential data
+
+## Evaluation Results (v5)
+
+| Metric | Value |
+|--------|-------|
+| Weighted F1 | 87.2% |
+| Macro F1 | 75.7% |
+| Mean Confidence | 72.5% |
+| Mean Accuracy | 83.5% |
+| Confidence Correlation (r) | 0.539 |
+| Mean Latency | 1,543 ms |
+
+### Per-Type Performance (Top 10)
+
+| Type | Weighted F1 | Sample Size |
+|------|------------|-------------|
+| FINANCIAL | 100.0% | n=2 |
+| TRANSCRIPT | 100.0% | n=2 |
+| RESUME | 100.0% | n=2 |
+| DEGREE | 98.5% | n=11 |
+| PATENT | 97.1% | n=4 |
+| LICENSE | 96.6% | n=10 |
+| PROFESSIONAL | 95.8% | n=7 |
+| INSURANCE | 93.3% | n=4 |
+| LEGAL | 92.9% | n=3 |
+| CLE | 91.1% | n=2 |
+
+## Intended Use
+
+Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.
+
+**Important:** Use this model only with the condensed prompt it was trained on (~1.5K characters). The full extraction prompt (58K characters) yields 0% F1 because it does not match the prompt template used during fine-tuning.
+
+## Credential Types Supported
+
+DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER
+
+## Domain-Specific Adapters
+
+Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora:
+
+- **SEC** (45K examples): SEC filings, financial disclosures
+- **Academic** (45K examples): Degrees, transcripts, publications
+- **Legal** (13K examples): Legal documents, bar admissions, CLE
+- **Regulatory** (13K examples): Licenses, regulations, compliance
+
+## Limitations
+
+- Only processes PII-stripped text (by design)
+- Small evaluation sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME, and CLE at n=2), so their per-type scores are indicative only
+- The `fraudSignals` field scores 0% F1 (known limitation, improvement in progress)
+- Confidence calibration: expected calibration error (ECE) of 11% (recalibrated via a piecewise linear function)
+
+## Citation
+
+```
+@software{nessie-v5,
+  title={Nessie v5: Credential Metadata Extraction Model},
+  author={Arkova},
+  year={2026},
+  url={https://arkova.ai}
+}
+```
+
+## License
+
+This model is released under the Llama 3.1 Community License. See Meta's license for details.
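
A note on the calibration figures in the README's Limitations section: it reports an expected calibration error (ECE) of 11% and a piecewise linear recalibration, but gives neither the binning scheme nor the recalibration knots. The sketch below shows one common way to compute ECE (equal-width confidence bins) and to apply a piecewise linear map. The function names, the 10-bin default, and the `KNOTS` values are hypothetical illustrations, not Arkova's actual implementation or parameters.

```python
from bisect import bisect_right


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so conf == 1.0 (or out-of-range input) lands in a valid bin.
        idx = min(max(int(conf * n_bins), 0), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


def recalibrate(conf, knots):
    """Piecewise linear map through sorted (raw, calibrated) knots, clamped at both ends."""
    xs = [x for x, _ in knots]
    ys = [y for _, y in knots]
    if conf <= xs[0]:
        return ys[0]
    if conf >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, conf) - 1  # segment [xs[i], xs[i + 1]] containing conf
    t = (conf - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical knots for illustration only; NOT the values used for Nessie.
KNOTS = [(0.0, 0.0), (0.5, 0.6), (1.0, 1.0)]
```

In practice the knots would be fitted on a held-out set so that recalibrated confidence tracks observed accuracy, and ECE would be measured again on separate data after recalibration.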