初始化项目，由ModelHub XC社区提供模型

Model: muthugsubramanian/DocWain-14B-v2-unified Source: Original Platform
2026-05-31 18:08:52 +08:00
commit 90b98b11dc
17 changed files with 152454 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,77 @@
+---
+license: apache-2.0
+language:
+  - en
+tags:
+  - document-intelligence
+  - rag
+  - extraction
+  - enterprise
+  - docwain
+pipeline_tag: text-generation
+base_model: muthugsubramanian/DocWain-14B-v2
+---
+
+# DocWain-14B-v2-unified (FP16)
+
+DocWain is an **enterprise document intelligence agent** built for extraction,
+analysis, comparison, and grounded response generation over user-uploaded
+document profiles. This **unified** variant has identity, capability awareness,
+and behavioural discipline (verbatim quoting, refusal on missing data, currency
+preservation, anti-tailoring) baked into the weights via a focused LoRA SFT
+finetune on synthetic data.
+
+## What's in this release
+
+- **Format:** FP16
+- **Base model:** muthugsubramanian/DocWain-14B-v2 (vision-grafted Qwen3-14B)
+- **Identity:** baked-in — model self-identifies as DocWain regardless of
+  system prompt
+- **Behaviour:** trained to quote verbatim from evidence, say "not specified
+  in the documents" rather than fabricate, preserve currency symbols (₹/£/$),
+  and refuse to add skills/education/experience that aren't in the source
+
+## Capabilities
+
+- Accurate extraction from invoices, contracts, resumes, policies, research
+  papers, and other enterprise document types
+- Document intelligence — summaries, key findings, cross-document
+  relationships, anomaly surfacing
+- Layout and context understanding — tables, charts, multi-page references
+- Grounded response generation with verbatim quoting and explicit
+  "not specified" handling
+- Document generation — structured reports, comparison tables, executive
+  briefs derived from the user's documents
+
+## Training data
+
+Synthetic-only per project policy. The training corpus contains:
+- Identity / persona examples (no customer data)
+- Capability awareness Q&A
+- Synthetic invoices / contracts / resumes / research-paper snippets paired
+  with ideal grounded responses
+- Domain-mismatch refusal examples
+- General-instruction mix-in to preserve breadth
+
+No customer documents, no scraped private data.
+
+## Recommended runtime
+
+| Variant | Runtime | GPU floor |
+|---------|---------|-----------|
+| FP16 | vLLM, transformers | A100 80GB |
+| AWQ INT4 | vLLM `--quantization compressed-tensors` | 16GB+ |
+| GGUF Q5_K_M | Ollama / llama.cpp | 16GB GPU or CPU |
+| GGUF Q4_K_M | Ollama / llama.cpp | 12GB GPU or CPU |
+
+## Prompting
+
+A short system prompt is enough at runtime — identity is in the weights:
+
+```
+You are DocWain — an enterprise document intelligence agent.
+```
+
+For full behaviour (RAG-aware, currency-preserving, anti-tailoring),
+provide your standard DocWain system prompt; the model will respect both
+its baked-in identity and the prompt-specified rules.