muthugsubramanian/DocWain-14B-v2-unified

Go to file

ModelHub XC 90b98b11dc 初始化项目，由ModelHub XC社区提供模型

Model: muthugsubramanian/DocWain-14B-v2-unified
Source: Original Platform

2026-05-31 18:08:52 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

added_tokens.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

chat_template.jinja

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

merges.txt

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00001-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00002-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00003-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00004-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00005-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model-00006-of-00006.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

vocab.json

初始化项目，由ModelHub XC社区提供模型

2026-05-31 18:08:52 +08:00

README.md

license, language, tags, pipeline_tag, base_model

license

language

DocWain-14B-v2-unified (FP16)

DocWain is an enterprise document intelligence agent built for extraction, analysis, comparison, and grounded response generation over user-uploaded document profiles. This unified variant has identity, capability awareness, and behavioural discipline (verbatim quoting, refusal on missing data, currency preservation, anti-tailoring) baked into the weights via a focused LoRA SFT finetune on synthetic data.

What's in this release

Format: FP16
Base model: muthugsubramanian/DocWain-14B-v2 (vision-grafted Qwen3-14B)
Identity: baked-in — model self-identifies as DocWain regardless of system prompt
Behaviour: trained to quote verbatim from evidence, say "not specified in the documents" rather than fabricate, preserve currency symbols (₹/£/$), and refuse to add skills/education/experience that aren't in the source

Capabilities

Accurate extraction from invoices, contracts, resumes, policies, research papers, and other enterprise document types
Document intelligence — summaries, key findings, cross-document relationships, anomaly surfacing
Layout and context understanding — tables, charts, multi-page references
Grounded response generation with verbatim quoting and explicit "not specified" handling
Document generation — structured reports, comparison tables, executive briefs derived from the user's documents

Training data

Synthetic-only per project policy. The training corpus contains:

Identity / persona examples (no customer data)
Capability awareness Q&A
Synthetic invoices / contracts / resumes / research-paper snippets paired with ideal grounded responses
Domain-mismatch refusal examples
General-instruction mix-in to preserve breadth

No customer documents, no scraped private data.

Recommended runtime

Variant	Runtime	GPU floor
FP16	vLLM, transformers	A100 80GB
AWQ INT4	vLLM `--quantization compressed-tensors`	16GB+
GGUF Q5_K_M	Ollama / llama.cpp	16GB GPU or CPU
GGUF Q4_K_M	Ollama / llama.cpp	12GB GPU or CPU

Prompting

A short system prompt is enough at runtime — identity is in the weights:

You are DocWain — an enterprise document intelligence agent.

For full behaviour (RAG-aware, currency-preserving, anti-tailoring), provide your standard DocWain system prompt; the model will respect both its baked-in identity and the prompt-specified rules.