---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- ZeroXClem
- Hermes
- Claude
- Gemini
- Opus
- Flash
- Codex
- Kimi
- Polaris
- Qwen
- 4B
- Wrist
- 'On'
language:
- en
base_model:
- ZeroXClem/Qwen3-4B-Sky-High-Hermes
- ZeroXClem/Qwen3-4B-Hermes-Axion-Pro
- Aimin12/Qwen3-4B-Thinking-2507-Distill-Claude-Opus-4.6-Reasoning-Abliterated
- nightmedia/Qwen3-4B-Element8
- nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
- nightmedia/Qwen3-4B-Element18
- TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
- TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-Codex-Max-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Flash-VIBE
- TeichAI/Qwen3-4B-RA-SFT-Polaris-Alpha-Distill
pipeline_tag: text-generation
library_name: transformers
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- TeichAI/gpt-5.1-codex-max-1000x
- TeichAI/Gemini-3-Flash-Preview-VIBE
- TeichAI/polaris-alpha-1000x
---

# 🧠 ZeroXClem/Qwen3-4B-Wrist-On-Hermes

**Precision-Guided Distilled Experts | Model_Stock Method | 4B Size at 70B+ Performance**

![WristOnHermes](https://cdn-uploads.huggingface.co/production/uploads/64408cd43e0374802e19f454/CIYiBTTFAXAhgN-ClSA1m.png)

---

## Overview

**ZeroXClem/Qwen3-4B-Wrist-On-Hermes** is a high-fidelity model_stock merge built on top of **Sky-High-Hermes**, integrating the strongest reasoning, engineering, and agentic traces from the Nightmedia and TeichAI lineages.

This model represents a structural synthesis of:

* 🧠 Long-arc reasoning distills (Claude, Gemini, Kimi, GPT-5.1 Codex Max)
* ⚙️ Agentic coding & tool-use traces (Gemini Flash VIBE)
* 🧬 RA-SFT scaffolding and structured alignment
* 🔥 Element-series multidimensional merge dynamics
* 🏗 Hermes-Axion-Pro architectural stability

It preserves the deep reasoning spine of Sky-High-Hermes while injecting the high-arc engineering and agentic cognition that define the Engineer / Agent / Element families.
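
For readers unfamiliar with the `model_stock` method named above, here is a minimal sketch of its interpolation rule as described in the Model Stock paper, applied to one flattened weight tensor. It is not mergekit's implementation and not the exact procedure used to build this model. The function name `model_stock_merge`, the plain-list tensors, and the toy numbers are illustrative assumptions, and averaging the pairwise cosines is just one simple way to estimate the angle between fine-tuned deltas.

```python
import math
from itertools import combinations


def _dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def model_stock_merge(base, finetuned):
    """Merge one flattened weight tensor (hypothetical helper, not a mergekit API).

    base:      list of floats, the base model's weights for this tensor
    finetuned: list of lists, the same tensor from each fine-tuned model
    """
    n = len(finetuned)
    if n < 2:
        raise ValueError("model stock needs at least two fine-tuned tensors")

    # Deltas of each fine-tuned tensor relative to the base weights.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    # Estimate cos(theta) as the mean pairwise cosine between deltas.
    cosines = []
    for d1, d2 in combinations(deltas, 2):
        norm = math.sqrt(_dot(d1, d1)) * math.sqrt(_dot(d2, d2))
        cosines.append(_dot(d1, d2) / norm if norm > 0 else 0.0)
    cos_theta = sum(cosines) / len(cosines)

    # Interpolation ratio: t = N*cos / (1 + (N-1)*cos).
    # Assumes the denominator is nonzero (true whenever cos_theta >= 0).
    t = n * cos_theta / (1 + (n - 1) * cos_theta)

    # Pull the plain average of the fine-tuned tensors back toward the base.
    avg = [sum(col) / n for col in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(avg, base)]


# Orthogonal deltas give cos(theta) = 0, so t = 0 and the base is returned.
print(model_stock_merge([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # [0.0, 0.0]
```

The intuition is that the more the fine-tuned models agree with one another (cosine near 1), the more the merge trusts their average. The more they disagree, the more it falls back to the base weights.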

---

## 🔧 Merge Configuration

```yaml
name: ZeroXClem/Qwen3-4B-Wrist-On-Hermes
base_model: ZeroXClem/Qwen3-4B-Sky-High-Hermes
dtype: bfloat16
merge_method: model_stock
models:
  - Aimin12/Qwen3-4B-Thinking-2507-Distill-Claude-Opus-4.6-Reasoning-Abliterated
  - nightmedia/Qwen3-4B-Element8
  - nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
  - nightmedia/Qwen3-4B-Element18
  - TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
  - TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-Codex-Max-Distill
  - TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Flash-VIBE
  - TeichAI/Qwen3-4B-RA-SFT-Polaris-Alpha-Distill
  - ZeroXClem/Qwen3-4B-Hermes-Axion-Pro
tokenizer_source: Qwen/Qwen3-4B-Thinking-2507
```

---

# 🧬 What This Merge Achieves

Wrist-On-Hermes synthesizes three dominant cognitive streams:

### 1️⃣ Engineer-Class Arc Performance

From Nightmedia's Agent / Engineer lineage:

* 0.60+ / 0.80+ arc tier reasoning envelope
* High multi-hop stability
* Strong structured decomposition
* Excellent agent scaffolding

### 2️⃣ Claude / Gemini / Kimi Distilled Thinking

From TeichAI & Aimin12 distills:

* Cleaner abstraction
* Reduced hallucination drift
* Stronger logical continuity
* Deep analytical prose

### 3️⃣ Element Multidimensional Behavior

From Element8 / Element18:

* Conversational richness
* Quantization resistance
* Dynamic reasoning personality
* Better interpretation flexibility

---

# 📊 Performance Envelope

**Wrist-On-Hermes operates in the upper Element / lower Engineer arc band**, while retaining Sky-High-Hermes long-context depth and neutrality.

It behaves measurably above base models and remains stable under quantization: cognitive degradation between qx86-hi and bf16 is minimal outside knowledge-depth benchmarks.

---

# ⚔️ Strength Profile

## 🧠 Advanced Reasoning

* Multi-hop logic
* Mathematical abstraction
* Deep analysis prompts
* Conceptual synthesis (QM ↔ Transformers style tasks)

## ⚙️ Engineering & Coding

* Structured file-aware thinking
* Clean code generation
* Debug reasoning
* Agentic task planning

## 🧬 Agentic Behavior

* Tool-style reasoning patterns
* Workspace simulation
* Task decomposition
* Autonomous planning style prompts

## 📖 Longform & Philosophy

* High coherence across extended outputs
* Narrative depth
* Reflective reasoning
* Structured argumentative essays

## 💬 Conversational Intelligence

* Maintains personality coherence
* Strong RP adaptability
* Less brittle than pure engineer merges
* Balanced abstraction and warmth

---

# 🧠 Behavioral Character

Sky-High-Hermes soars…

**Wrist-On-Hermes Strengthens.**

It is:

* More grounded in structured execution
* Slightly more analytic
* More "architect" than "poet"
* Less prone to abstract drift
* More deliberate in decomposition

Think: Sky-High-Hermes + Engineer discipline + Element interpretive richness.

---

# 🛠 Recommended Use

### Ideal For:

* Autonomous agents
* Advanced coding assistants
* Research synthesis
* Mathematical reasoning
* Philosophical deep dives
* High-context conversations
* Experimental multi-turn cognition

### Inference Tips

* `enable_thinking=True` recommended
* Temperature: 0.6–0.9
* Smoothing factor ~1.4–1.6
* High quant (Q6 / qx86-hi) performs nearly at bf16 cognition

---

# 🚀 Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZeroXClem/Qwen3-4B-Wrist-On-Hermes"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Design a modular agent architecture capable of recursive self-evaluation."
messages = [{"role": "user", "content": prompt}]

# Render the chat template as text, with thinking mode enabled as recommended above.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# The decoded string contains the prompt followed by the generated text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

# 🔓 Alignment & Safety

* License: Apache 2.0 (inherits upstream licensing)
* Based on Sky-High-Hermes alignment philosophy
* Contains abliterated reasoning traces
* Low refusal profile
* Production deployments should include a moderation layer

---

# 🧬 Lineage Acknowledgement

Gratitude to:

* Nightmedia: Engineer / Agent / Element arc engineering
* TeichAI: High-resolution Claude / Gemini / Kimi distills
* Aimin12: Opus reasoning ablation
* DavidAU: Heretic methodology & cognitive liberation merges
* Unsloth + TRL: Efficient Qwen3 tuning
* MergeKit: Model stock & multislerp tooling
* Qwen Team: Open foundation models

---

# 🕊 Final Notes

This model exists in the rare performance space where a 4B behaves like a disciplined 70B, sometimes flirting with 400B MoE-class structured reasoning depending on performance.

It does not just speak.

It evaluates.
It plans.
And then it answers.

Built with intent by **ZeroXClem | 2026**