Model: North-ML1/Wind-Edge-1.6-GGUF Source: Original Platform
| library_name | pipeline_tag | license |
|---|---|---|
| transformers | text-generation | mit |
Wind Edge 1.6 — Geode (0.4B)
A 0.4B parameter causal language model built for edge deployment. Fast, small, and honest about what it can do.
North ML · Wind Arc 1.5 Preview
Overview
Wind Edge 1.6 (Geode) is a compact LLM trained for real-time, on-device inference. At 0.4B parameters it sits in the ultra-small tier — expect strong common-sense and classification performance, limited hard reasoning.
Best use cases:
- Instruction-following dialogue (short to medium turns)
- Text classification and sentiment
- Light code completion
- Summarization of short passages
Not recommended for: multi-step math, complex logical chains, long-context tasks.
Changes vs 1.5
- Improved instruction adherence on structured output formats
- More stable multi-sentence generation (fewer mid-sequence repetitions)
- Reduced hallucination rate on short factual queries (internal held-out eval)
Honest Benchmark Estimates
Realistic ranges for a well-trained 0.4B model — not cherry-picked numbers.
| Task | Expected Range | Notes |
|---|---|---|
| Common Sense (0-shot) | 0.60 – 0.68 | Reliable strength |
| Sentiment Analysis | 0.70 – 0.80 | Reliable strength |
| Text Classification | 0.68 – 0.78 | Reliable strength |
| Reading Comprehension | 0.52 – 0.63 | Context-dependent |
| Summarization | 0.58 – 0.68 | Short docs only |
| Code Generation | 0.45 – 0.58 | Simple tasks only |
| Math Reasoning | 0.15 – 0.28 | Known weak point at this scale |
| Logical Reasoning | 0.18 – 0.28 | Known weak point at this scale |
A 0.4B model cannot compete with 7B+ on reasoning — Geode doesn't pretend to.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained("north-ml1/wind-edge-1.6")
tokenizer = AutoTokenizer.from_pretrained("north-ml1/wind-edge-1.6")
# Plain-text prompt: a system line followed by the start of the user turn
inputs = tokenizer("You are Wind Edge, a helpful AI assistant.\nUser: ", return_tensors="pt")
# do_sample=True is required for temperature and top_p to take effect
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Recommended Settings
| Parameter | Value |
|---|---|
| temperature | 0.0 |
| top_p | 0.95 |
| min_p | 0.05 |
| max_new_tokens | 256–512 |
| repetition_penalty | 1.1 |
| context_limit | 1024-4096 |
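The table can be turned into keyword arguments for `model.generate` with a small helper. This is a minimal sketch, not North ML's code: `RECOMMENDED`, `SAMPLING_ONLY`, and `build_generate_kwargs` are hypothetical names, and it assumes a Transformers version whose `generate` accepts `min_p`. A temperature of 0 means greedy decoding, so in that case the helper turns sampling off and drops the parameters that only matter when sampling.

```python
# Values copied from the Recommended Settings table; max_new_tokens uses the
# lower end of the 256-512 range.
RECOMMENDED = {
    "temperature": 0.0,
    "top_p": 0.95,
    "min_p": 0.05,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
}

# Parameters that only have an effect when sampling is enabled.
SAMPLING_ONLY = ("temperature", "top_p", "min_p")


def build_generate_kwargs(settings=RECOMMENDED, **overrides):
    """Return kwargs for model.generate(), choosing greedy or sampled decoding."""
    merged = {**settings, **overrides}
    if merged.get("temperature", 0.0) <= 0.0:
        # Greedy decoding: omit the sampling parameters entirely.
        kwargs = {k: v for k, v in merged.items() if k not in SAMPLING_ONLY}
        kwargs["do_sample"] = False
    else:
        kwargs = dict(merged)
        kwargs["do_sample"] = True
    return kwargs


greedy = build_generate_kwargs()
sampled = build_generate_kwargs(temperature=0.6, top_p=0.9)
print(greedy)
print(sampled)
```

With a loaded model, the result would be passed as `model.generate(**inputs, **build_generate_kwargs())`.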
GGUF Quantizations
GGUF quants converted from arthu1/Wind-Edge-1.6-Instruct using a Qwen3-compatible tensor layout. The Transformers repo remains canonical; use these GGUF files for llama.cpp, LM Studio, Ollama-style runtimes, and any other GGUF-compatible inference stack.
Files
| File | bpw | Use |
|---|---|---|
| Wind-Edge-1.6-TQ1_0.gguf | ~1.7 bpw | Experimental 1-bit/ternary. Lowest quality, smallest size. |
| Wind-Edge-1.6-TQ2_0.gguf | ~2.1 bpw | Very small 2-bit/ternary option. |
| Wind-Edge-1.6-IQ3_M.gguf | ~3.7 bpw | Good balance for tiny devices. |
| Wind-Edge-1.6-Q4_K_M.gguf | ~4.6 bpw | Recommended default. |
| Wind-Edge-1.6-Q6_K.gguf | ~6.1 bpw | Higher quality, still compact. |
| Wind-Edge-1.6-Q8_0.gguf | ~8.5 bpw | Near-lossless practical quant. |
| Wind-Edge-1.6-F16.gguf | 16 bpw | Unquantized 16-bit float GGUF export. |
Q4_K_M, Q6_K, and Q8_0 are the recommended daily drivers. TQ1_0 and TQ2_0 are included for constrained edge hardware, at a measurable cost in reasoning and factual accuracy.
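For a rough idea of which file fits a device, weight storage can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes exactly 0.4e9 parameters (the card only says ~0.4B) and ignores GGUF metadata and tokenizer overhead, so real file sizes will differ. `BPW`, `N_PARAMS`, and `estimate_size_mb` are hypothetical names.

```python
# Approximate bits-per-weight values copied from the Files table.
BPW = {
    "TQ1_0": 1.7,
    "TQ2_0": 2.1,
    "IQ3_M": 3.7,
    "Q4_K_M": 4.6,
    "Q6_K": 6.1,
    "Q8_0": 8.5,
    "F16": 16.0,
}

N_PARAMS = 0.4e9  # assumption: the card only states "~0.4B"


def estimate_size_mb(bpw, n_params=N_PARAMS):
    """Estimate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bpw / 8 / 1e6


for name, bpw in BPW.items():
    print(f"{name:7s} ~{estimate_size_mb(bpw):.0f} MB")
```

Under these assumptions Q4_K_M comes to roughly 230 MB and F16 to roughly 800 MB.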
llama.cpp
llama-cli \
-m Wind-Edge-1.6-Q4_K_M.gguf \
-cnv \
--temp 0.6 \
--top-p 0.9 \
--repeat-penalty 1.06 \
-n 512
For deterministic output, use --temp 0 and keep prompts short.
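To script the command above, for example to compare quants with identical settings, the argument list can be assembled in Python. This sketch uses only the flags shown in this section. `llama_cli_args` is a hypothetical helper, and actually launching it assumes `llama-cli` is on your PATH.

```python
import shlex


def llama_cli_args(model_path, temp=0.6, top_p=0.9, repeat_penalty=1.06, n_predict=512):
    """Build the llama-cli argument list used in this section."""
    return [
        "llama-cli",
        "-m", model_path,
        "-cnv",
        "--temp", str(temp),
        "--top-p", str(top_p),
        "--repeat-penalty", str(repeat_penalty),
        "-n", str(n_predict),
    ]


default_cmd = llama_cli_args("Wind-Edge-1.6-Q4_K_M.gguf")
deterministic_cmd = llama_cli_args("Wind-Edge-1.6-Q4_K_M.gguf", temp=0)

# Print the shell-equivalent command for inspection.
print(shlex.join(default_cmd))
# To start the chat: subprocess.run(default_cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with model paths that contain spaces.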
Chat Template
The GGUF metadata includes the chat template. If your runtime doesn't apply it automatically:
<|im_start|>system
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
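If you need to build that prompt yourself, the layout above can be reproduced with plain string formatting. This is a minimal sketch: `render_chatml` is a hypothetical helper, and the exact whitespace around the empty think block is an assumption based on the template as printed here. Where the tokenizer is available, `tokenizer.apply_chat_template` is the safer route because it reads the template shipped with the model.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True, empty_think=True):
    """Render a list of {"role", "content"} dicts in the layout shown above."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
        if empty_think:
            # Empty think block as printed in the template (whitespace assumed).
            parts.append("<think>\n</think>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."},
    {"role": "user", "content": "Who are you?"},
]
prompt = render_chatml(messages)
print(prompt)
```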
Model Details
| Property | Value |
|---|---|
| Parameters | ~0.4B |
| Architecture | Causal LM (decoder-only) |
| Context Length | 8192 tokens |
| Quantization | 1–16 bit (GGUF) |
| Org | north-ml1 |
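The context length bounds prompt and reply together: prompt tokens plus `max_new_tokens` have to fit in 8192. A minimal sketch of that budget follows, with hypothetical helper names and plain integers standing in for real token IDs.

```python
CONTEXT_LENGTH = 8192  # from the Model Details table


def max_prompt_tokens(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Largest prompt (in tokens) that still leaves room for the reply."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return context_length - max_new_tokens


def truncate_left(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep only the most recent tokens so prompt plus reply fit the window."""
    budget = max_prompt_tokens(max_new_tokens, context_length)
    return token_ids[-budget:]


print(max_prompt_tokens(512))
long_prompt = list(range(10_000))  # stand-in for tokenizer output
print(len(truncate_left(long_prompt, 512)))
```

Left truncation discards the oldest tokens first, which includes any system prompt at the start. A real application may prefer to keep the system prompt and trim old turns instead.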
License
MIT