---
license: apache-2.0
language:
- en
tags:
- dualmind
- knowledge-distillation
- topology-aware
- self-critique
- opus
- convergent-intelligence
- qwen3
- convergentintel
- edge
- distillation
base_model:
- reaperdoesntknow/DualMind
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- zai-org/LongWriter-6k
model_name: DualMinded-Qwen3-1.7B
pipeline_tag: text-generation
---

# DualMinded-Qwen3-1.7B

A 1.7B-parameter dual-cognition model trained on **Opus 4.6 reasoning traces**. The model implements a three-phase cognitive loop (explore, examine, respond) in which it reasons freely, critiques its own reasoning, then synthesizes a clean answer.

**Convergent Intelligence LLC: Research Division**

## Architecture

```
<explore>  : unconstrained reasoning, derivation, speculation
<examine>  : adversarial self-critique, error detection, refinement
<response> : clean synthesis from the internal dialogue
```

This is the multi-model collision array collapsed into a single architecture. The dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. There are no extra parameters and no routing: the same weights operate in different cognitive modes.

## Training Pipeline

DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:

**Stage 1: Multi-Teacher Distillation.** Three Qwen3-30B-A3B variants (Instruct, Thinking, Coder) are distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.

**Stage 2: DISC Refinement.** Produces Disctil-Qwen3-1.7B: the student is refined through Discrepancy Calculus, which detects and preserves structural boundaries in the teacher's distribution.

**Stage 3: Topological Knowledge Distillation (TKD).** Continuous-stream distillation from Qwen3-30B-A3B-Thinking with topology-guided windowing. The teacher's output is decomposed by bounded variation into smooth + jumps + drift. Jump positions are amplified at 3σ, windows are cut at low-discrepancy boundaries, and training follows a 4-phase curriculum (easy → hard).
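The 3σ jump amplification in Stage 3 is not spelled out in this card, so the following is only a minimal sketch under stated assumptions. It treats a 1-D per-token teacher signal (for example, entropy) as the function being analyzed, flags a jump wherever the first difference exceeds the mean plus `k` standard deviations of all differences, and raises the loss weight at those positions. The function name `jump_weights`, the choice of signal, and the `amplification` value are hypothetical and are not taken from the TKD implementation.

```python
from statistics import mean, pstdev


def jump_weights(signal, k=3.0, amplification=2.0):
    """Return one loss weight per position of `signal`.

    Position i >= 1 counts as a jump when |signal[i] - signal[i-1]| exceeds
    mean + k * std of all first differences. The detection rule and the
    `amplification` factor are illustrative assumptions.
    """
    # Absolute first differences between neighbouring positions.
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    weights = [1.0] * len(signal)
    if len(diffs) < 2:
        # Too few differences to estimate a spread; leave weights uniform.
        return weights
    threshold = mean(diffs) + k * pstdev(diffs)
    for i, d in enumerate(diffs, start=1):
        if d > threshold:
            weights[i] = amplification
    return weights


# Hypothetical signal: flat, then a single step of height 5 at index 20.
signal = [0.0] * 20 + [5.0] * 20
weights = jump_weights(signal)
print([i for i, w in enumerate(weights) if w > 1.0])  # → [20]
```

The resulting weights could multiply a per-token distillation loss so that positions at structural boundaries contribute more than smooth regions. Note that with a population standard deviation, a single outlier can only clear a 3σ threshold when there are more than ten differences, so very short windows will never register a jump under this rule.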
**Stage 4: DualMind SFT on Opus 4.6.** SFT using [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered). The `thinking` column maps directly to `<explore>`, so no heuristic sentence splitting is needed. The `solution` column is split into `<examine>` + `<response>`.

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50%) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accum = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |

## DualMind vs DualMinded

| | DualMind | DualMinded |
|---|---------|-----------|
| **SFT Data** | LogicInference_OA | Opus-4.6-Reasoning |
| **Explore Source** | Heuristic CoT split | Direct Opus `thinking` column |
| **Strength** | Formal logic, structured proofs | Extended reasoning, creative derivation |
| **Base Checkpoint** | TKD final | TKD checkpoint-512 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model in BF16 and let transformers place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DualMinded-Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DualMinded-Qwen3-1.7B")

# The prompt ends with the <explore> tag, the first phase of the loop.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a completion without tracking gradients.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Ghost Imprinting

Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher; they correspond to the singular-continuous component of the bounded variation decomposition applied to the parameter tensor.
Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) attributable to these ghost imprints.

## GGUF

Quantized versions are available at [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF): F16, Q8_0, Q5_K_M, Q4_K_M.

**Ollama:** `ollama run reaperdoesntrun/DualMinded-1.7B`

## Related

- [DualMind](https://huggingface.co/reaperdoesntknow/DualMind): LogicInference-trained variant
- [DualMind_Methodolgy](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy): Paper, DOI [10.57967/hf/8184](https://doi.org/10.57967/hf/8184)
- [Structure Over Scale](https://doi.org/10.57967/hf/8165): Paper 1, CPU training methodology
- [DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind)
- [DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen)

## Mathematical Foundations: Discrepancy Calculus (DISC)

This model's training pipeline is grounded in Discrepancy Calculus, a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).

**The Core Operator:**

$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$

For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.

**The Mesh Fundamental Identity:** every BV function decomposes as

$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$

Standard knowledge distillation captures only the first term.
Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.

## Citation

```bibtex
@misc{colca2026dualmind,
  title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://doi.org/10.57967/hf/8184}
}
```

*Convergent Intelligence LLC: Research Division, Apache 2.0*

---

## Convergent Intelligence Portfolio

*Part of the [DualMind Series](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*

### DualMind Family

| Model | Format | Description |
|-------|--------|-------------|
| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | BF16 | LogicInference-trained. Explore→Examine→Response loop. |
| [DualMinded-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B) | BF16 | Opus 4.6 reasoning traces. Higher quality splits. |
| [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | BF16 | Thinking-teacher variant with extended deliberation. |
| [DualMind-GGUF](https://huggingface.co/reaperdoesntknow/DualMind-GGUF) | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
| [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF) | GGUF | Quantized Opus variant. Ollama ready. |

### Papers

| Paper | DOI |
|-------|-----|
| [Structure Over Scale](https://huggingface.co/reaperdoesntknow/Structure-Over-Scale) | 10.57967/hf/8165 |
| [Three Teachers to Dual Cognition](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) | 10.57967/hf/8184 |
| [Discrepancy Calculus](https://huggingface.co/reaperdoesntknow/Discrepancy_Calculus) | 10.57967/hf/8194 |

---

*Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division*