---
license: apache-2.0
language:
- en
tags:
- dualmind
- knowledge-distillation
- topology-aware
- self-critique
- opus
- convergent-intelligence
- qwen3
- convergentintel
- edge
- distillation
base_model:
- reaperdoesntknow/DualMind
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- zai-org/LongWriter-6k
model_name: DualMinded-Qwen3-1.7B
pipeline_tag: text-generation
---

# DualMinded-Qwen3-1.7B

A 1.7B-parameter dual-cognition model trained on **Opus 4.6 reasoning traces**. The model implements a three-phase cognitive loop (explore, examine, respond): it reasons freely, critiques its own reasoning, then synthesizes a clean answer.

**Convergent Intelligence LLC: Research Division**

## Architecture

```
<explore> - unconstrained reasoning, derivation, speculation
</explore>

<examine> - adversarial self-critique, error detection, refinement
</examine>

<response> - clean synthesis from the internal dialogue
</response>
```

This is the multi-model collision array collapsed into a single architecture. The dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. There are no extra parameters and no routing: the same weights operate in different cognitive modes.
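
As a minimal sketch of how the three phases could be separated in decoded text, assuming the model emits the tags literally as shown above, the helper below uses one regular expression per phase. The name `split_phases` and the returned dictionary shape are illustrative assumptions, not part of the released code. A section whose closing tag was cut off by the token limit is read up to the end of the text.

```python
import re

PHASES = ("explore", "examine", "response")

def split_phases(text: str) -> dict:
    """Return the body of each phase; phases that never open map to None."""
    sections = {}
    for name in PHASES:
        # Match <name>...</name>, or <name>...end-of-text if generation was truncated.
        match = re.search(rf"<{name}>(.*?)(?:</{name}>|\Z)", text, flags=re.DOTALL)
        sections[name] = match.group(1).strip() if match else None
    return sections
```

Calling `split_phases(decoded)["response"]` on the decoded generation would then give only the final answer, which is usually what an application wants to display.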

## Training Pipeline

DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:

**Stage 1 (Multi-Teacher Distillation):**
Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) is distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
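
The card does not spell out the proof-weighting scheme, so the following is only a sketch of one plausible reading: a per-token KL divergence between teacher and student distributions, multiplied by 2.25 wherever a token is flagged as reasoning. The mask `is_reasoning`, the plain-Python representation of distributions as lists, and the normalization by token count are all assumptions made for illustration.

```python
import math

REASONING_WEIGHT = 2.25  # amplification factor quoted in Stage 1

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) at a single token position."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )

def weighted_kd_loss(teacher, student, is_reasoning):
    """Mean per-token KL, with flagged reasoning positions scaled by 2.25.

    teacher, student: lists of per-token probability distributions.
    is_reasoning: hypothetical boolean mask, one flag per token.
    """
    total = 0.0
    for t, s, flag in zip(teacher, student, is_reasoning):
        weight = REASONING_WEIGHT if flag else 1.0
        total += weight * kl_divergence(t, s)
    return total / len(teacher)
```

Dividing by the token count rather than by the sum of weights keeps the amplification literal: a sequence made entirely of reasoning tokens contributes 2.25 times the loss of an unflagged one.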

**Stage 2 (DISC Refinement):**
The Stage 1 student is refined through Discrepancy Calculus (DISC), which detects and preserves structural boundaries in the teacher's distribution. The result is Disctil-Qwen3-1.7B.

**Stage 3 (Topological Knowledge Distillation, TKD):**
Continuous-stream distillation from Qwen3-30B-A3B-Thinking with topology-guided windowing. The teacher's output is decomposed as a bounded-variation function into smooth, jump, and drift components. Jump positions are amplified at 3σ, windows are cut at low-discrepancy boundaries, and training follows a 4-phase curriculum ordered from easy to hard.
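
The two mechanical steps named in Stage 3 can be sketched under stated assumptions. Here `signal` is a hypothetical per-token discrepancy score (the card does not say how it is computed), a jump is read as any position exceeding the mean by three standard deviations, and a window boundary is placed at the lowest-discrepancy position among the last few candidates before the length limit. The function names and the `search` parameter are illustrative, not the authors' implementation.

```python
import statistics

def find_jumps(signal, n_sigma=3.0):
    """Indices whose score exceeds mean + n_sigma * population std."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    threshold = mean + n_sigma * std
    return [i for i, value in enumerate(signal) if value > threshold]

def cut_windows(signal, max_len, search=4):
    """Split [0, len(signal)) into windows of at most max_len tokens.

    Each cut is the index, among the last `search` allowed positions,
    where the discrepancy score is lowest. Requires search <= max_len.
    """
    windows, start, n = [], 0, len(signal)
    while n - start > max_len:
        hi = start + max_len                      # latest allowed cut
        lo = max(start + 1, hi - search + 1)      # earliest candidate cut
        cut = min(range(lo, hi + 1), key=lambda i: signal[i])
        windows.append((start, cut))
        start = cut
    windows.append((start, n))
    return windows
```

Cutting at low-discrepancy positions keeps high-discrepancy regions (the detected jumps) inside a window rather than splitting them across two training samples.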

**Stage 4 (DualMind SFT on Opus 4.6):**
SFT on [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered). The `thinking` column maps directly to `<explore>`, so no heuristic sentence splitting is needed. The `solution` column is split into `<examine>` and `<response>`.
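
A rough sketch of how one dataset row might be rendered into a training string follows. The `##USER:` prompt prefix is taken from the usage example in this card; everything else is an assumption. In particular, the card does not state how `solution` is divided, so `naive_split` is a placeholder rule (last paragraph becomes the response) that a caller would replace with the real one.

```python
def naive_split(solution: str):
    """Placeholder rule (assumption): the last paragraph is the response,
    everything before it is the examination."""
    head, sep, tail = solution.strip().rpartition("\n\n")
    return (head, tail) if sep else ("", tail)

def build_training_text(question, thinking, solution, split_solution=naive_split):
    """Render one example in the explore / examine / response layout."""
    examine, response = split_solution(solution)
    return (
        f"##USER:\n{question}\n\n"
        f"<explore>\n{thinking.strip()}\n</explore>\n\n"
        f"<examine>\n{examine}\n</examine>\n\n"
        f"<response>\n{response}\n</response>"
    )
```

Passing the splitter as a parameter keeps the tag layout, which the card does specify, separate from the split rule, which it does not.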

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50%) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accum = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |
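
To make the schedule in the table concrete, here is a small sketch of the learning rate per optimizer step. It assumes linear warmup from zero followed by cosine decay to zero, which is a common default but is not confirmed by the card; the constant and function names are illustrative.

```python
import math

PEAK_LR = 5e-6
WARMUP_STEPS = 32
MAX_STEPS = 1024
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 at MAX_STEPS.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Sequences contributing to each optimizer step: 2 * 8 = 16.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
```

Under these assumptions the rate reaches its 5e-6 peak at step 32 and returns to zero at step 1024.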

## DualMind vs DualMinded

| | DualMind | DualMinded |
|---|---------|-----------|
| **SFT Data** | LogicInference_OA | Opus-4.6-Reasoning |
| **Explore Source** | Heuristic CoT split | Direct Opus `thinking` column |
| **Strength** | Formal logic, structured proofs | Extended reasoning, creative derivation |
| **Base Checkpoint** | TKD final | TKD checkpoint-512 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DualMinded-Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DualMinded-Qwen3-1.7B")

# End the prompt with an open <explore> tag so generation starts in the explore phase.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Ghost Imprinting

Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher; in DISC terms, they correspond to the singular-continuous component of the bounded variation decomposition applied to the parameter tensor. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) that we attribute to these ghost imprints.

## GGUF

Quantized versions are available at [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF): F16, Q8_0, Q5_K_M, Q4_K_M.

**Ollama:** `ollama run reaperdoesntrun/DualMinded-1.7B`

## Related

- [DualMind](https://huggingface.co/reaperdoesntknow/DualMind): LogicInference-trained variant
- [DualMind_Methodolgy](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy): paper, DOI [10.57967/hf/8184](https://doi.org/10.57967/hf/8184)
- [Structure Over Scale](https://doi.org/10.57967/hf/8165): Paper 1, CPU training methodology
- [DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind)
- [DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen)

## Mathematical Foundations: Discrepancy Calculus (DISC)

This model's training pipeline is grounded in Discrepancy Calculus, a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).

**The Core Operator:**

$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$

For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
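
The smooth case can be checked numerically. The sketch below approximates the operator at a fixed small $\varepsilon$ with a midpoint rule; the function name `discrepancy` and the default step sizes are illustrative choices, and a finite $\varepsilon$ only approximates the limit.

```python
def discrepancy(f, x, eps=1e-3, n=1000):
    """Midpoint-rule estimate of (1/eps) * integral over [x, x+eps]
    of |f(t) - f(x)| / |t - x| dt, at a fixed small eps."""
    h = eps / n
    fx = f(x)
    total = 0.0
    for k in range(n):
        t = x + (k + 0.5) * h  # midpoints, so t never equals x
        total += abs(f(t) - fx) / abs(t - x)
    return total * h / eps
```

For $f(t) = t^2$ at $x = 1$ the estimate is close to $|f'(1)| = 2$. For a unit step at $x = 0$ the integrand behaves like $1/t$, so the estimate keeps growing as $\varepsilon$ shrinks, which flags the jump rather than averaging it away.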

**The Mesh Fundamental Identity:** every function $f$ of bounded variation (BV) on $I = [a, b]$ decomposes as:

$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$

Here $J_f$ is the jump set of $f$, $\Delta f(x)$ is the size of the jump at $x$, and $D^c f$ is the Cantor (singular-continuous) part of the derivative measure.

Standard knowledge distillation captures only the first term. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
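
A small worked example, chosen here for illustration and not taken from the paper, shows all three terms at once. Let $H$ be the unit step and $c$ the Cantor function on $[0, 1]$, and take

$$f(x) = x + H\!\left(x - \tfrac{1}{2}\right) + c(x), \qquad x \in [0, 1].$$

Since $H' = 0$ and $c' = 0$ almost everywhere, $f'(x) = 1$ almost everywhere. The only jump is at $x = \tfrac{1}{2}$ with $\Delta f(\tfrac{1}{2}) = 1$, and the Cantor part carries $D^c f([0,1]) = c(1) - c(0) = 1$. The identity then reads

$$f(1) - f(0) = 3 = \underbrace{\int_0^1 1\,dx}_{=\,1} + \underbrace{\Delta f\!\left(\tfrac{1}{2}\right)}_{=\,1} + \underbrace{D^c f([0,1])}_{=\,1}.$$

A method that only sees $f'$ would account for one third of the total change in this example.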

## Citation

```bibtex
@misc{colca2026dualmind,
  title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://doi.org/10.57967/hf/8184}
}
```

*Convergent Intelligence LLC: Research Division, Apache 2.0*
<!-- card-refresh: 2026-03-30 -->

---

## Convergent Intelligence Portfolio

*Part of the [DualMind Series](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*

### DualMind Family

| Model | Format | Description |
|-------|--------|-------------|
| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | BF16 | LogicInference-trained. Explore→Examine→Response loop. |
| [DualMinded-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B) | BF16 | Opus 4.6 reasoning traces. Higher quality splits. |
| [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | BF16 | Thinking-teacher variant with extended deliberation. |
| [DualMind-GGUF](https://huggingface.co/reaperdoesntknow/DualMind-GGUF) | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
| [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF) | GGUF | Quantized Opus variant. Ollama ready. |

### Papers

| Paper | DOI |
|-------|-----|
| [Structure Over Scale](https://huggingface.co/reaperdoesntknow/Structure-Over-Scale) | 10.57967/hf/8165 |
| [Three Teachers to Dual Cognition](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) | 10.57967/hf/8184 |
| [Discrepancy Calculus](https://huggingface.co/reaperdoesntknow/Discrepancy_Calculus) | 10.57967/hf/8194 |

---

*Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division*
<!-- cix-keeper-ts:2026-06-12T13:15:39Z -->