Initialize project; model provided by the ModelHub XC community

Model: reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking Source: Original Platform
2026-06-13 10:54:17 +08:00
commit 8f951141f0
11 changed files with 5672 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
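The attribute rules above route matching files through Git LFS, which is why `model.safetensors`, `tokenizer.json`, and the two TensorBoard event files appear later in this commit as three-line pointer files. A rough way to check which paths such a rule set would catch is sketched below in Python. It is an approximation, not Git's own matcher: it assumes basename matching with `fnmatch` is close enough for simple `*.ext` rules, it uses only a subset of the patterns, and the helper name `is_lfs_tracked` is made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns from .gitattributes (basename-style rules only).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


files = [
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "config.json",
    "events.out.tfevents.1774855351.0e755ff15ec0.1023.2",
]
for f in files:
    kind = "LFS pointer" if is_lfs_tracked(f) else "regular blob"
    print(f"{f}: {kind}")
```

Patterns that contain a slash, such as `saved_model/**/*`, follow path-based rules in Git and are deliberately left out of this sketch.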
--- a/README.md
+++ b/README.md
@@ -0,0 +1,233 @@
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- qwen3
+- sft
+- trl
+- dualmind
+- knowledge-distillation
+- thinking
+- opus
+- self-critique
+- convergent-intelligence
+- convergentintel
+- edge
+- distillation
+base_model:
+- reaperdoesntknow/DualMinded-Qwen3-1.7B
+datasets:
+- nohurry/Opus-4.6-Reasoning-3000x-filtered
+- zai-org/LongWriter-6k
+language:
+- en
+---
+
+# Dualmind-Qwen-1.7B-Thinking
+
+**Claude Opus 4.6 Reasoning Traces → 1.7B via DualMind SFT**
+
+*Convergent Intelligence LLC: Research Division*
+
+---
+
+## What This Is
+
+A 1.7B model trained on **2.5M+ tokens of Claude Opus 4.6 reasoning traces** using the DualMind SFT methodology. The training data comes from [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) — a curated dataset of extended reasoning chains from Anthropic's most capable model, with refusals removed.
+
+This is the **Opus variant** of the DualMind family. Where the base [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) model was trained on LogicInference data, this model absorbs the reasoning patterns of Claude Opus 4.6 — longer chains, more nuanced self-correction, and richer deliberative structure. The Opus teacher produces qualitatively different reasoning than synthetic logic datasets: it backtracks, hedges, reconsiders, and synthesizes in ways that reflect genuine uncertainty navigation rather than pattern completion.
+
+The base model is [Disctil-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/Disctil-Qwen3-1.7B) — already DISC-refined and sitting in the middle of the DistilQwen distillation chain — giving it a strong structural foundation before the Opus reasoning signal is applied.
+
+## Architecture
+
+| Parameter | Value |
+|-----------|-------|
+| Architecture | Qwen3ForCausalLM |
+| Parameters | ~2.03B (1.7B effective; input and output embeddings are stored untied) |
+| Hidden Size | 2048 |
+| Layers | 28 |
+| Attention Heads | 16 (Q) / 8 (KV) — GQA |
+| Intermediate | 6144 |
+| Head Dimension | 128 |
+| Context Length | 40,960 tokens (max position) |
+| Vocabulary | 151,936 |
+| Precision | BF16 |
+| Activation | SiLU |
+
+## Training
+
+| Parameter | Value |
+|-----------|-------|
+| Base Model | [Disctil-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/Disctil-Qwen3-1.7B) |
+| Dataset | [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) |
+| Additional Tokens | ~2.5M |
+| Max Sequence Length | 4,096 |
+| Total Steps | 512 |
+| Epochs | ~7.4 |
+| Method | SFT (TRL SFTTrainer) |
+| Precision | BF16 |
+| Hardware | NVIDIA H100 |
+
+### Training Dynamics
+
+| Metric | Start | End |
+|--------|-------|-----|
+| Training Loss | 1.744 | 1.455 |
+| Eval Loss | — | 1.406 |
+| Token Accuracy | 61.0% | 67.8% |
+
+Training loss fell from 1.744 to 1.455 over roughly 7.4 epochs, and the final eval loss (1.406) sits below the final training loss (1.455), so the run shows no sign of overfitting on the held-out split. Token accuracy rose by 6.8 percentage points (61.0% to 67.8%), which is consistent with the model absorbing the structure of the Opus reasoning traces rather than memorizing them.
+
+### Why Opus Traces
+
+The Opus-4.6-Reasoning dataset captures something that synthetic datasets don't: the way a frontier model navigates genuine uncertainty. Opus doesn't just solve problems — it reasons about its own confidence, backtracks when a line of thought weakens, and synthesizes across multiple attempted approaches. When you distill from these traces, the student doesn't just learn to produce correct answers. It learns the **shape of deliberation**.
+
+This is the DualMind thesis in practice: the cognitive loop (explore → examine → respond) isn't an architectural trick. It's a training signal. When the teacher naturally exhibits multi-phase reasoning, the student absorbs that structure through standard SFT.
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained(
+    "reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking",
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    "reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking"
+)
+
+messages = [
+    {"role": "user", "content": "What happens to information that falls into a black hole? Walk me through the paradox."}
+]
+
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+output = model.generate(
+    **inputs,
+    max_new_tokens=2048,
+    do_sample=True,
+    top_p=0.9,
+    temperature=0.7,
+    repetition_penalty=1.15
+)
+
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
+### Generation Tips
+
+- **Temperature 0.6–0.8** — the Opus reasoning traces have natural variance in them. Don't flatten it with low temperature.
+- **Repetition penalty 1.1–1.2** — prevents looping during extended reasoning chains.
+- **Max tokens 1024–2048** — trained at 4096 max seq, so it can go long. The Opus signal rewards longer generation windows.
+- The model may produce multi-phase reasoning naturally (exploring, then reconsidering, then concluding). This is the intended behavior — the DualMind cognitive loop emerging from the training signal.
+
+## Model Lineage
+
+```
+Qwen3-1.7B (base)
+  → DiStil-Qwen3-1.7B-uncensored (uncensored SFT)
+    → Disctil-Qwen3-1.7B (DISC refinement)
+      → Dualmind-Qwen-1.7B-Thinking ← you are here
+           ↑
+    Opus 4.6 reasoning traces (2.5M tokens, DualMind SFT)
+```
+
+### DualMind Family Comparison
+
+| Model | Training Signal | Character |
+|-------|----------------|-----------|
+| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | LogicInference | Structured logical deduction |
+| **Dualmind-Qwen-1.7B-Thinking** | **Opus 4.6 Reasoning** | **Extended deliberation, self-correction** |
+| [TopologicalQwen](https://huggingface.co/reaperdoesntknow/TopologicalQwen) | 30B-Thinking (TKD) | Topology-aware physics CoT |
+
+Same methodology, different teachers, different capabilities. The LogicInference variant is more mechanical. The Opus variant is more deliberative. TopologicalQwen is the full TKD pipeline with BV decomposition. They're complementary — different facets of the same cognitive architecture.
+
+## DualMind Collection
+
+| Model | Description |
+|-------|-------------|
+| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | LogicInference-trained. Explore→Examine→Response cognitive loop. |
+| [DualMind_Methodology](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) | Paper: Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) |
+| **[Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking)** | **← this model. Opus 4.6 reasoning variant.** |
+| [DualMind-GGUF](https://huggingface.co/reaperdoesntknow/DualMind-GGUF) | LogicInference variant quantized for edge deployment. |
+
+Full collection: [DualMind on HuggingFace](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e)
+
+## Papers
+
+- **[Structure Over Scale: Proof-Weighted Knowledge Distillation](https://doi.org/10.57967/hf/8165)** — DOI: 10.57967/hf/8165. The DistilQwen methodology paper.
+- **[Three Teachers to Dual Cognition](https://doi.org/10.57967/hf/8184)** — DOI: 10.57967/hf/8184. The DualMind extension: ghost imprinting and multi-teacher convergence.
+
+## License
+
+Apache 2.0
+
+
+## Mathematical Foundations: Discrepancy Calculus (DISC)
+
+This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).
+
+**The Core Operator:**
+
+$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$
+
+For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
+
+**The Mesh Fundamental Identity:** every function $f$ of bounded variation (BV) on $I = [a, b]$, with jump set $J_f$, decomposes as:
+
+$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$
+
+Standard knowledge distillation captures only term 1. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
+
+## Citation
+
+```bibtex
+@misc{colca2026dualmind,
+  title={Three Teachers to Dual Cognition: From Knowledge Distillation to Emergent Reasoning},
+  author={Colca, Roy},
+  year={2026},
+  doi={10.57967/hf/8184},
+  publisher={Convergent Intelligence LLC: Research Division}
+}
+```
+
+---
+
+*Convergent Intelligence LLC: Research Division — 49 models, 22,598+ downloads across the portfolio.*
+*[Full portfolio](https://huggingface.co/reaperdoesntknow) | [DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e) | [DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)*
+
+---
+
+## Convergent Intelligence Portfolio
+
+*Part of the [DualMind Series](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*
+
+### DualMind Family
+
+| Model | Format | Description |
+|-------|--------|-------------|
+| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | BF16 | LogicInference-trained. Explore→Examine→Response loop. |
+| [DualMinded-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B) | BF16 | Opus 4.6 reasoning traces. Higher quality splits. |
+| [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | BF16 | Thinking-teacher variant with extended deliberation. |
+| [DualMind-GGUF](https://huggingface.co/reaperdoesntknow/DualMind-GGUF) | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
+| [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF) | GGUF | Quantized Opus variant. Ollama ready. |
+
+### Papers
+
+| Paper | DOI |
+|-------|-----|
+| [Structure Over Scale](https://huggingface.co/reaperdoesntknow/Structure-Over-Scale) | 10.57967/hf/8165 |
+| [Three Teachers to Dual Cognition](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) | 10.57967/hf/8184 |
+| [Discrepancy Calculus](https://huggingface.co/reaperdoesntknow/Discrepancy_Calculus) | 10.57967/hf/8194 |
+
+---
+
+*Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division*
+<!-- cix-keeper-ts:2026-06-12T13:15:44Z -->
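The README's Mathematical Foundations section states that $Df(x) = |f'(x)|$ for smooth $f$. That claim can be checked numerically. The Python sketch below approximates the averaged difference quotient with a midpoint rule for a fixed small $\varepsilon$ instead of taking the limit. The function name `disc_operator`, the step count, and the test function $f(x) = x^2$ are illustrative choices, not part of the cited theory.

```python
def disc_operator(f, x, eps=1e-4, steps=1000):
    """Approximate (1/eps) * integral over [x, x+eps] of |f(t) - f(x)| / |t - x| dt."""
    h = eps / steps
    fx = f(x)
    total = 0.0
    for k in range(steps):
        # Midpoint of the k-th subinterval, so t never coincides with x.
        t = x + (k + 0.5) * h
        total += abs(f(t) - fx) / abs(t - x)
    return total * h / eps


def square(t):
    return t * t


for eps in (1e-1, 1e-2, 1e-3):
    print(eps, disc_operator(square, 1.5, eps=eps))
```

For this choice of $f$ the integrand simplifies to $|t + x|$, so the exact value is $2x + \varepsilon/2$ and the printed numbers approach $|f'(1.5)| = 3$ as $\varepsilon$ shrinks.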
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,89 @@
+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
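For a plain conversation without tools, the template above reduces to a few string concatenations. The Python sketch below re-implements only that path (system, user, and earlier assistant turns, the generation prompt, and the `enable_thinking` switch) to make the resulting prompt format visible. It is a simplified illustration with a hypothetical function name, `render_simple`; it omits tool calls, the `reasoning_content` field, and assistant turns that follow the last user message, so `tokenizer.apply_chat_template` remains the right call for real prompts.

```python
def render_simple(messages, add_generation_prompt=True, enable_thinking=True):
    """Render a tool-free chat that ends with a user turn."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("this sketch expects the conversation to end with a user turn")
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        if role == "assistant" and "</think>" in content:
            # Earlier assistant turns are replayed without their reasoning block.
            content = content.split("</think>")[-1].lstrip("\n")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template pre-fills an empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Is 91 prime?"},
    {"role": "assistant", "content": "<think>\n7 * 13 = 91\n</think>\n\nNo, 91 = 7 * 13."},
    {"role": "user", "content": "And 97?"},
]
print(render_simple(chat, enable_thinking=False))
```

With `enable_thinking` left at its default, the prompt ends right after the assistant header and the model is free to open its own `<think>` block.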
--- a/config.json
+++ b/config.json
@@ -0,0 +1,63 @@
+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "transformers_version": "5.0.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
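The README's figure of about 2.03B parameters can be reproduced from the values in this config. The Python sketch below assumes the usual Qwen3 decoder layout: bias-free q/k/v/o projections, an RMSNorm over the head dimension for queries and for keys, a gated MLP with three weight matrices, two RMSNorms per layer, and one final norm. That layout comes from the public Qwen3 implementation in `transformers`, not from anything stated in this commit, and `count_parameters` is an illustrative helper.

```python
cfg = {
    "vocab_size": 151936,
    "hidden_size": 2048,
    "intermediate_size": 6144,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "tie_word_embeddings": False,
}


def count_parameters(c):
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d      # width of the query projection
    kv_out = c["num_key_value_heads"] * d     # width of each key/value projection (GQA)
    attn = h * q_out + 2 * h * kv_out + q_out * h
    qk_norm = 2 * d                           # per-head RMSNorm weights for q and k
    mlp = 3 * h * c["intermediate_size"]      # gate, up, and down projections
    layer_norms = 2 * h                       # input and post-attention RMSNorm
    per_layer = attn + qk_norm + mlp + layer_norms
    embed = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else embed
    return embed + lm_head + c["num_hidden_layers"] * per_layer + h  # + final norm


total = count_parameters(cfg)
print(f"{total:,} parameters, {2 * total:,} bytes in BF16")
```

Under these assumptions the count is 2,031,739,904 parameters, or 4,063,479,808 bytes at two bytes each. That is 35,832 bytes less than the 4,063,515,640-byte `model.safetensors` recorded in this commit, a gap consistent with the file's header. Leaving out the separate output head gives roughly 1.72B.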
--- a/events.out.tfevents.1774855351.0e755ff15ec0.1023.2
+++ b/events.out.tfevents.1774855351.0e755ff15ec0.1023.2
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:754fa573c8076f901c055a875d3ec38572c33c6c1bb1341cae32f40b32310436
+size 202356
--- a/events.out.tfevents.1774858526.0e755ff15ec0.15561.0
+++ b/events.out.tfevents.1774858526.0e755ff15ec0.15561.0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c0230efb71b3943a2b6a6f1ca78e937d3ccf451e65eb4e8e079dc482ecc730d7
+size 54371
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,12 @@
+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.0.0"
+}
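These defaults combine three sampling controls. The pure-Python sketch below shows one common way they compose on a toy logit vector: temperature scaling first, then top-k truncation, then nucleus (top-p) truncation, with renormalization after each cut. It illustrates the idea only. The function name `filter_distribution` and the toy logits are invented, and the `transformers` implementation works on tensors and may differ in details such as tie-breaking.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    probs = softmax([x / temperature for x in logits])
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)
    kept = [(i, probs[i] / mass) for i in ranked]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i, p in kept:
        nucleus.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(p for _, p in nucleus)
    return {i: p / mass for i, p in nucleus}


toy_logits = [4.0, 3.0, 2.0, 1.0, 0.0, -1.0]
dist = filter_distribution(toy_logits)
print(dist)
```

A temperature below 1 sharpens the distribution before any truncation happens, so fewer tokens are needed to reach the top-p threshold than at temperature 1.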
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ef9c37a37d926124140a8a543c3aa52b9e2da03a3d00e17e50425fa20a20c4ed
+size 4063515640
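Large files in this commit are stored as Git LFS pointers: short text files that give the spec version, a SHA-256 object ID, and the size in bytes. A minimal Python parser for that three-line format is sketched below, using the `model.safetensors` pointer above as input. `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
import re

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ef9c37a37d926124140a8a543c3aa52b9e2da03a3d00e17e50425fa20a20c4ed
size 4063515640
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and validate the oid and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("unexpected oid format")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 1e9:.2f} GB, sha256 {info['sha256'][:12]}...")
```

The `size` field is the byte length of the real object, so it can be compared against a downloaded file before the more expensive hash check.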
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,29 @@
+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": true,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
--- a/.json
+++ b/.json