---
model-index:
- name: wind-edge-1.6@f16
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: CodeBench-30
      type: North-ML1/CodeBench-30
      split: train
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 6.25
      verified: false
    - name: Easy Tier Accuracy
      type: accuracy
      value: 17.14
      verified: false
    - name: Medium Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false
    - name: Hard Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false

library_name: transformers
pipeline_tag: text-generation
tags:
- wind-edge
- causal-lm
- edge
- small-language-model
- 0.4b
license: mit
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
language:
- en
base_model:
- North-ML1/Wind-Edge-1.6-Instruct
---

# Wind Edge 1.6 — Geode (0.4B)

A 0.4B parameter causal language model built for edge deployment. Fast, small, and honest about what it can do.

**[North ML](https://huggingface.co/north-ml1)** · [Wind Arc 1.5 Preview](https://huggingface.co/arthu1/wind-arc-1-5-preview)

---

## Overview

Wind Edge 1.6 (Geode) is a compact LLM trained for real-time, on-device inference. At 0.4B parameters it sits in the ultra-small tier: expect its best results on common-sense and classification tasks, and only limited ability on hard reasoning.

**Best use cases:**
- Instruction-following dialogue (short to medium turns)
- Text classification and sentiment
- Light code completion
- Summarization of short passages

**Not recommended for:** multi-step math, complex logical chains, long-context tasks.

---

## Changes vs 1.5

- Improved instruction adherence on structured output formats
- More stable multi-sentence generation (fewer mid-sequence repetitions)
- Reduced hallucination rate on short factual queries (internal held-out eval)

---

## Honest Benchmark Estimates

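These are estimated ranges for a well-trained 0.4B model, not measured Geode scores, and they are not cherry-picked. For a measured reference point, the metadata above reports 6.25% overall accuracy on CodeBench-30 (17.14% on the easy tier, 0% on medium and hard).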

| Task                  | Expected Score (0–1) | Notes |
|-----------------------|----------------------|-------|
| Common Sense (0-shot) | 0.60 – 0.68    | Reliable strength |
| Sentiment Analysis    | 0.70 – 0.80    | Reliable strength |
| Text Classification   | 0.68 – 0.78    | Reliable strength |
| Reading Comprehension | 0.52 – 0.63    | Context-dependent |
| Summarization         | 0.58 – 0.68    | Short docs only |
| Code Generation       | 0.45 – 0.58    | Simple tasks only |
| Math Reasoning        | 0.15 – 0.28    | Known weak point at this scale |
| Logical Reasoning     | 0.18 – 0.28    | Known weak point at this scale |

A 0.4B model cannot compete with 7B+ on reasoning — Geode doesn't pretend to.

---

## Usage

```python
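from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "north-ml1/wind-edge-1.6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt with the tokenizer's chat template instead of a raw "User: " string
messages = [
    {"role": "system", "content": "You are Wind Edge, a helpful AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)

# do_sample=True is required; otherwise temperature/top_p are ignored (greedy decoding)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))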
```

### Recommended Settings

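| Parameter          | Value            |
|--------------------|------------------|
| temperature        | 0.0 (greedy)     |
| top_p              | 0.95             |
| min_p              | 0.05             |
| max_new_tokens     | 256–512          |
| repetition_penalty | 1.1              |
| context_limit      | 1024–4096 tokens |

At temperature 0.0 decoding is greedy, so `top_p` and `min_p` only take effect once you raise the temperature and enable sampling (in Transformers, `do_sample=True`).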

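To illustrate what `top_p` and `min_p` do once sampling is enabled, here is a simplified pure-Python sketch of temperature, top-p, and min-p filtering over one vector of logits. The helper name `filter_probs` and its fixed filter order are illustrative assumptions: Transformers and llama.cpp each apply these steps in their own order and with extra details, so this is not their implementation.

```python
import math


def filter_probs(logits, temperature=1.0, top_p=0.95, min_p=0.05):
    """Hypothetical helper: temperature -> top-p -> min-p, then renormalize.

    Simplified sketch for illustration only; real runtimes differ in
    filter order and edge-case handling. Returns {token_index: probability}.
    """
    if temperature == 0.0:
        # Greedy decoding: all mass goes to the argmax, so top_p/min_p do nothing
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}

    # Softmax with temperature (subtract the max for numerical stability)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top-p: keep the smallest high-probability prefix whose mass reaches top_p
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens below min_p times the most likely token's probability
    threshold = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= threshold]

    # Renormalize the surviving tokens so their probabilities sum to 1
    norm = sum(p for p, _ in kept)
    return {i: p / norm for p, i in kept}


print(filter_probs([2.0, 1.0, 0.5], temperature=0.0))
print(filter_probs([0.0, 0.0, -10.0], temperature=1.0, top_p=1.0, min_p=0.05))
```

In the second call the third token survives top-p but is removed by min-p, leaving the two equally likely tokens at 0.5 each.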

---

## GGUF Quantizations

GGUF quants converted from [arthu1/Wind-Edge-1.6-Instruct](https://huggingface.co/arthu1/Wind-Edge-1.6-Instruct) using a Qwen3-compatible tensor layout. The Transformers repo remains canonical — use these for llama.cpp, LM Studio, Ollama-style runtimes, and any other GGUF-compatible inference stack.

### Files

| File | bpw | Use |
|------|-----|-----|
| Wind-Edge-1.6-TQ1_0.gguf | ~1.7 bpw | Experimental 1-bit/ternary. Lowest quality, smallest size. |
| Wind-Edge-1.6-TQ2_0.gguf | ~2.1 bpw | Very small 2-bit/ternary option. |
| Wind-Edge-1.6-IQ3_M.gguf | ~3.7 bpw | Good balance for tiny devices. |
| Wind-Edge-1.6-Q4_K_M.gguf | ~4.6 bpw | **Recommended default.** |
| Wind-Edge-1.6-Q6_K.gguf | ~6.1 bpw | Higher quality, still compact. |
| Wind-Edge-1.6-Q8_0.gguf | ~8.5 bpw | Near-lossless practical quant. |
| Wind-Edge-1.6-F16.gguf | 16 bpw | Full precision GGUF export. |

Q4_K_M, Q6_K, and Q8_0 are the recommended daily drivers. TQ1_0 and TQ2_0 are included for constrained edge hardware but will lose measurable reasoning and factual accuracy.
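As a rough sanity check on the table, file size scales with parameter count times bits per weight. This sketch assumes exactly 0.4 billion parameters (the card only says ~0.4B) and uses the approximate bpw values listed above; real files also carry metadata and tokenizer data, so treat the results as ballpark figures, not exact download sizes.

```python
# Ballpark GGUF file sizes: params * bpw / 8 bytes.
# Assumption: exactly 0.4e9 parameters; the card only states "~0.4B".
PARAMS = 0.4e9

# Approximate bits per weight, copied from the Files table
QUANTS_BPW = {
    "TQ1_0": 1.7,
    "TQ2_0": 2.1,
    "IQ3_M": 3.7,
    "Q4_K_M": 4.6,
    "Q6_K": 6.1,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def estimated_size_mb(params: float, bpw: float) -> float:
    """Estimated size in megabytes (10**6 bytes), ignoring metadata overhead."""
    return params * bpw / 8 / 1e6


for name, bpw in QUANTS_BPW.items():
    print(f"{name:7s} ~{estimated_size_mb(PARAMS, bpw):4.0f} MB")
```

Under these assumptions the recommended Q4_K_M file comes out around 230 MB and the F16 export around 800 MB.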

### llama.cpp

```bash
llama-cli \
  -m Wind-Edge-1.6-Q4_K_M.gguf \
  -cnv \
  --temp 0.6 \
  --top-p 0.9 \
  --repeat-penalty 1.06 \
  -n 512
```

For deterministic output, use `--temp 0` and keep prompts short.

### Chat Template

The GGUF metadata includes the chat template. If your runtime doesn't apply it automatically, format prompts like this:

```
<|im_start|>system
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
```
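For runtimes that accept only a raw prompt string, the template above can be rendered by hand. This is a minimal sketch: `build_chatml_prompt` is a hypothetical helper, and it assumes the exact layout printed above, including the empty `<think>` block and no trailing newline after `</think>`. When the runtime can read the template embedded in the GGUF metadata, prefer that.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Hypothetical helper: render messages in the ChatML layout shown above.

    Assumes the template printed in this card is exact, including the
    empty <think></think> block that opens the assistant turn.
    """
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Open the assistant turn with an empty think block, as in the example
    parts.append("<|im_start|>assistant\n<think>\n</think>")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```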

---

## Model Details

| Property       | Value |
|----------------|-------|
| Parameters     | ~0.4B |
| Architecture   | Causal LM (decoder-only) |
| Context Length | 8192 tokens |
| Quantization   | ~1.7–16 bpw (GGUF) |
| Org            | [north-ml1](https://huggingface.co/north-ml1) |

---

## License

MIT