---
license: apache-2.0
language:
- en
tags:
- dualmind
- knowledge-distillation
- topology-aware
- self-critique
- opus
- convergent-intelligence
- qwen3
- convergentintel
- edge
- distillation
base_model:
- reaperdoesntknow/DualMind
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- zai-org/LongWriter-6k
model_name: DualMinded-Qwen3-1.7B
pipeline_tag: text-generation
---
# DualMinded-Qwen3-1.7B
A 1.7B-parameter dual-cognition model trained on **Opus 4.6 reasoning traces**. The model implements a three-phase cognitive loop (explore, examine, respond): it reasons freely, critiques its own reasoning, then synthesizes a clean answer.
**Convergent Intelligence LLC: Research Division**
## Architecture
```
<explore> — unconstrained reasoning, derivation, speculation
</explore>
<examine> — adversarial self-critique, error detection, refinement
</examine>
<response> — clean synthesis from the internal dialogue
</response>
```
This design collapses the multi-model collision array into a single architecture. The dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. There are no extra parameters and no routing: the same weights serve all three cognitive modes.
## Training Pipeline
DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:
**Stage 1 — Multi-Teacher Distillation:**
Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
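The idea of amplifying the distillation loss on reasoning tokens can be sketched as below. This is a minimal pure-Python illustration, not the training code: only the 2.25× factor comes from the description above, while the per-token `is_reasoning` mask, the choice of KL(teacher || student), and averaging over token count are assumptions made for the example.

```python
import math

REASONING_WEIGHT = 2.25  # amplification factor quoted in Stage 1


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_kd_loss(teacher_logits, student_logits, is_reasoning):
    """Mean per-token KL(teacher || student); flagged tokens count 2.25x.

    `is_reasoning` is a hypothetical boolean mask, one entry per token.
    """
    total = 0.0
    for t_row, s_row, flag in zip(teacher_logits, student_logits, is_reasoning):
        weight = REASONING_WEIGHT if flag else 1.0
        total += weight * kl_divergence(softmax(t_row), softmax(s_row))
    return total / len(teacher_logits)
```

Dividing by the token count rather than by the sum of weights keeps the amplification visible in the loss magnitude; normalizing by the weights instead would only rebalance tokens against each other.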
**Stage 2 — DISC Refinement:**
Disctil-Qwen3-1.7B: the student refined through Discrepancy Calculus, detecting and preserving structural boundaries in the teacher's distribution.
**Stage 3 — Topological Knowledge Distillation (TKD):**
Continuous-stream distillation with topology-guided windowing from Qwen3-30B-A3B-Thinking. Bounded variation decomposition of the teacher's output: smooth + jumps + drift. Jump positions amplified at 3σ, windows cut at low-discrepancy boundaries, 4-phase curriculum ordering (easy → hard).
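The two mechanical steps named in Stage 3, flagging jumps at 3σ and cutting windows at low-discrepancy positions, might look roughly like this. It is a sketch under assumptions: `scores` stands for a hypothetical per-token discrepancy signal, the greedy search over the second half of each span is an illustrative choice, and the amplification applied at jump positions is not specified above, so it is left out.

```python
import statistics


def find_jumps(scores, k=3.0):
    """Indices whose score exceeds mean + k * (population) standard deviation."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + k * sigma]


def cut_windows(scores, max_len):
    """Greedy windowing: return half-open (start, end) spans of at most max_len.

    Each cut lands on the lowest-score position in the second half of the
    current span, so windows stay reasonably long (an assumption of this sketch).
    """
    windows, start, n = [], 0, len(scores)
    while start < n:
        limit = min(start + max_len, n)
        if limit == n:
            windows.append((start, n))  # the remainder fits in one window
            break
        lo = start + max(1, max_len // 2)
        cut = min(range(lo, limit + 1), key=lambda i: scores[i])
        windows.append((start, cut))
        start = cut
    return windows
```

Note that with a population standard deviation, a single outlier among `n` values can reach a z-score of at most `sqrt(n - 1)`, so a 3σ threshold only fires on sequences of more than ten scores.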
**Stage 4 — DualMind SFT on Opus 4.6:**
SFT using [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered). The `thinking` column maps directly to `<explore>` — no heuristic sentence splitting needed. The `solution` column is split into `<examine>` + `<response>`.
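As a rough illustration of that column mapping, the helper below assembles one training string from a dataset row. Only the `thinking` and `solution` column names and the three tags come from this card; the `##USER:` prefix is borrowed from the Usage prompt, the exact whitespace is a guess, and `split_at` is a hypothetical stand-in for whatever rule actually divides `solution` into its two parts.

```python
def build_dualmind_example(question, thinking, solution, split_at):
    """Format one row as a DualMind-style training string.

    `split_at` is a hypothetical character offset: text before it goes to
    <examine>, text after it goes to <response>. The real split rule is
    not documented here.
    """
    examine = solution[:split_at].strip()
    response = solution[split_at:].strip()
    return (
        f"##USER:\n{question}\n\n"
        f"<explore>\n{thinking.strip()}\n</explore>\n"
        f"<examine>\n{examine}\n</examine>\n"
        f"<response>\n{response}\n</response>"
    )
```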
### Training Configuration
| Parameter | Value |
|-----------|-------|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50%) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accum = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |
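The schedule implied by the table can be checked with a few lines of arithmetic. The sketch below assumes linear warmup from zero and cosine decay to zero over the remaining steps (the usual shape of a warmup-plus-cosine schedule); the table itself only states "cosine" with 32 warmup steps, so both endpoints are assumptions.

```python
import math

PEAK_LR = 5e-6
WARMUP_STEPS = 32
MAX_STEPS = 1024
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8

# Effective batch on a single device, matching the table: 2 x 8 = 16.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM


def lr_at(step):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear warmup from 0
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate peaks at step 32, falls to half the peak at step 528 (the midpoint of the decay), and reaches zero at step 1024.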
## DualMind vs DualMinded
| | DualMind | DualMinded |
|---|---------|-----------|
| **SFT Data** | LogicInference_OA | Opus-4.6-Reasoning |
| **Explore Source** | Heuristic CoT split | Direct Opus `thinking` column |
| **Strength** | Formal logic, structured proofs | Extended reasoning, creative derivation |
| **Base Checkpoint** | TKD final | TKD checkpoint-512 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "reaperdoesntknow/DualMinded-Qwen3-1.7B"

# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# End the prompt with an opening <explore> tag so generation starts in the first phase.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```
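To pull the three phases out of the decoded text, a small parser such as the one below may help. It is a sketch that assumes the tags survive decoding as plain text (which would not hold if they were registered as special tokens and stripped by `skip_special_tokens=True`), and it tolerates a missing closing tag because generation can hit `max_new_tokens` before `</response>` is emitted.

```python
import re

# Match <tag> ... </tag>, or <tag> ... end-of-text when output was truncated.
SECTION_RE = re.compile(
    r"<(explore|examine|response)>\s*(.*?)\s*(?:</\1>|\Z)",
    re.DOTALL,
)


def split_phases(text):
    """Return a dict mapping each phase name found in `text` to its body."""
    return {name: body for name, body in SECTION_RE.findall(text)}
```

A caller that only wants the final answer can use `split_phases(decoded).get("response")` and treat `None` as a sign that the generation budget ran out before the model reached that phase.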
## Ghost Imprinting
Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher — the singular-continuous component of the bounded variation decomposition applied to the parameter tensor. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) attributable to these ghost imprints.
## GGUF
Quantized versions are available at [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF) in four formats: F16, Q8_0, Q5_K_M, Q4_K_M.
**Ollama:** `ollama run reaperdoesntrun/DualMinded-1.7B`
## Related
- [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) — LogicInference-trained variant
- [DualMind_Methodolgy](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) — Paper: DOI [10.57967/hf/8184](https://doi.org/10.57967/hf/8184)
- [Structure Over Scale](https://doi.org/10.57967/hf/8165) — Paper 1: CPU training methodology
- [DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind)
- [DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen)
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).
**The Core Operator:**
$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$
For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
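The smooth-case identity can be sanity-checked numerically. The snippet below approximates the operator with a midpoint rule at a small fixed ε; the function name, step count, and ε are illustrative choices rather than anything taken from the DISC paper.

```python
def discrepancy(f, x, eps=1e-3, n=1000):
    """Midpoint-rule estimate of (1/eps) * integral over [x, x+eps]
    of |f(t) - f(x)| / |t - x| dt, for a small fixed eps."""
    h = eps / n
    fx = f(x)
    total = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h  # midpoints never coincide with x
        total += abs(f(t) - fx) / (t - x)
    return total * h / eps


# f(t) = t^2 at x = 1.5: the estimate should be close to |f'(1.5)| = 3.
smooth_estimate = discrepancy(lambda t: t * t, 1.5)

# f(t) = |t| at x = 0: the right-sided difference quotient is identically 1.
kink_estimate = discrepancy(abs, 0.0)
```

For $f(t) = t^2$ the integrand is exactly $t + x$, so the finite-ε value is $2x + \varepsilon/2$, which tends to $|f'(x)| = 2x$ as $\varepsilon \downarrow 0$.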
**The Mesh Fundamental Identity** — every BV function decomposes as:
$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$
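Here $I = [a, b]$, $J_f$ is the jump set of $f$, $\Delta f(x)$ is the size of the jump at $x$, and $D^c f$ is the Cantor part of the derivative measure. Standard knowledge distillation captures only term 1. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.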
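Two textbook cases make the three terms concrete. They are standard BV examples, offered as an illustration rather than taken from the DISC paper. First, take $f(x) = x + H(x - \tfrac12)$ on $[0, 1]$, with $H$ the Heaviside step:

$$f(1) - f(0) = 2 = \underbrace{\int_0^1 1\,dx}_{=\,1} + \underbrace{\Delta f(\tfrac12)}_{=\,1} + \underbrace{D^c f([0,1])}_{=\,0}$$

Second, take the Cantor function $c(x)$ on $[0, 1]$, which is continuous with $c'(x) = 0$ almost everywhere:

$$c(1) - c(0) = 1 = \underbrace{\int_0^1 c'(x)\,dx}_{=\,0} + \underbrace{\sum_{x \in J_c} \Delta c(x)}_{=\,0} + \underbrace{D^c c([0,1])}_{=\,1}$$

In the first case the smooth term accounts for half of the total change; in the second it accounts for none of it.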
## Citation
```bibtex
@misc{colca2026dualmind,
title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
author={Colca, Roy S.},
year={2026},
publisher={HuggingFace},
url={https://doi.org/10.57967/hf/8184}
}
```
*Convergent Intelligence LLC: Research Division — Apache 2.0*
<!-- card-refresh: 2026-03-30 -->
---
## Convergent Intelligence Portfolio
*Part of the [DualMind Series](https://huggingface.co/collections/reaperdoesntknow/dualmind-69c93f888c6e79ecc69cf41e) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*
### DualMind Family
| Model | Format | Description |
|-------|--------|-------------|
| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | BF16 | LogicInference-trained. Explore→Examine→Response loop. |
| [DualMinded-Qwen3-1.7B](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B) | BF16 | Opus 4.6 reasoning traces. Higher-quality splits. |
| [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | BF16 | Thinking-teacher variant with extended deliberation. |
| [DualMind-GGUF](https://huggingface.co/reaperdoesntknow/DualMind-GGUF) | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
| [DualMinded-Qwen3-1.7B-GGUF](https://huggingface.co/reaperdoesntknow/DualMinded-Qwen3-1.7B-GGUF) | GGUF | Quantized Opus variant. Ollama ready. |
### Papers
| Paper | DOI |
|-------|-----|
| [Structure Over Scale](https://huggingface.co/reaperdoesntknow/Structure-Over-Scale) | 10.57967/hf/8165 |
| [Three Teachers to Dual Cognition](https://huggingface.co/reaperdoesntknow/DualMind_Methodolgy) | 10.57967/hf/8184 |
| [Discrepancy Calculus](https://huggingface.co/reaperdoesntknow/Discrepancy_Calculus) | 10.57967/hf/8194 |
---
*Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division*
<!-- cix-keeper-ts:2026-06-12T13:15:39Z -->