---
model-index:
- name: wind-edge-1.6@f16
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: CodeBench-30
      type: North-ML1/CodeBench-30
      split: train
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 6.25
      verified: false
    - name: Easy Tier Accuracy
      type: accuracy
      value: 17.14
      verified: false
    - name: Medium Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false
    - name: Hard Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false
library_name: transformers
pipeline_tag: text-generation
tags:
- wind-edge
- causal-lm
- edge
- small-language-model
- 0.4b
license: mit
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
language:
- en
base_model:
- North-ML1/Wind-Edge-1.6-Instruct
---

# Wind Edge 1.6: Geode (0.4B)

A 0.4B parameter causal language model built for edge deployment. Fast, small, and honest about what it can do.

**[North ML](https://huggingface.co/north-ml1)** · [Wind Arc 1.5 Preview](https://huggingface.co/arthu1/wind-arc-1-5-preview)

---

## Overview

Wind Edge 1.6 (Geode) is a compact LLM trained for real-time, on-device inference. At 0.4B parameters it sits in the ultra-small tier: expect strong common-sense and classification performance but limited hard reasoning.

**Best use cases:**

- Instruction-following dialogue (short to medium turns)
- Text classification and sentiment
- Light code completion
- Summarization of short passages

**Not recommended for:** multi-step math, complex logical chains, long-context tasks.

---

## Changes vs 1.5

- Improved instruction adherence on structured output formats
- More stable multi-sentence generation (fewer mid-sequence repetitions)
- Reduced hallucination rate on short factual queries (internal held-out eval)

---

## Honest Benchmark Estimates

These are realistic expected ranges for a well-trained 0.4B model, not cherry-picked numbers.

| Task                  | Expected Range | Notes                          |
|-----------------------|----------------|--------------------------------|
| Common Sense (0-shot) | 0.60 – 0.68    | Reliable strength              |
| Sentiment Analysis    | 0.70 – 0.80    | Reliable strength              |
| Text Classification   | 0.68 – 0.78    | Reliable strength              |
| Reading Comprehension | 0.52 – 0.63    | Context-dependent              |
| Summarization         | 0.58 – 0.68    | Short docs only                |
| Code Generation       | 0.45 – 0.58    | Simple tasks only              |
| Math Reasoning        | 0.15 – 0.28    | Known weak point at this scale |
| Logical Reasoning     | 0.18 – 0.28    | Known weak point at this scale |

A 0.4B model cannot compete with 7B+ models on reasoning, and Geode doesn't pretend to.

---

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("north-ml1/wind-edge-1.6")
tokenizer = AutoTokenizer.from_pretrained("north-ml1/wind-edge-1.6")

# Append the user's message after "User: " before tokenizing.
inputs = tokenizer("You are Wind Edge, a helpful AI assistant.\nUser: ", return_tensors="pt")
# do_sample=True is needed for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Recommended Settings

| Parameter          | Value     |
|--------------------|-----------|
| temperature        | 0.0       |
| top_p              | 0.95      |
| min_p              | 0.05      |
| max_new_tokens     | 256–512   |
| repetition_penalty | 1.1       |
| context_limit      | 1024–4096 |

---

## GGUF Quantizations

GGUF quants converted from [arthu1/Wind-Edge-1.6-Instruct](https://huggingface.co/arthu1/Wind-Edge-1.6-Instruct) using a Qwen3-compatible tensor layout. The Transformers repo remains canonical. Use these files for llama.cpp, LM Studio, Ollama-style runtimes, and any other GGUF-compatible inference stack.

### Files

| File | Bits per weight (bpw) | Use |
|------|-----------------------|-----|
| Wind-Edge-1.6-TQ1_0.gguf | ~1.7 | Experimental 1-bit/ternary. Lowest quality, smallest size. |
| Wind-Edge-1.6-TQ2_0.gguf | ~2.1 | Very small 2-bit/ternary option. |
| Wind-Edge-1.6-IQ3_M.gguf | ~3.7 | Good balance for tiny devices. |
| Wind-Edge-1.6-Q4_K_M.gguf | ~4.6 | **Recommended default.** |
| Wind-Edge-1.6-Q6_K.gguf | ~6.1 | Higher quality, still compact. |
| Wind-Edge-1.6-Q8_0.gguf | ~8.5 | Near-lossless practical quant. |
| Wind-Edge-1.6-F16.gguf | 16 | Unquantized 16-bit (half-precision) GGUF export. |

Q4_K_M, Q6_K, and Q8_0 are the recommended daily drivers. TQ1_0 and TQ2_0 are included for constrained edge hardware but lose measurable reasoning ability and factual accuracy.

### llama.cpp

```bash
llama-cli \
  -m Wind-Edge-1.6-Q4_K_M.gguf \
  -cnv \
  --temp 0.6 \
  --top-p 0.9 \
  --repeat-penalty 1.06 \
  -n 512
```

For deterministic output, use `--temp 0` and keep prompts short.

### Chat Template

The GGUF metadata includes the chat template. If your runtime doesn't apply it automatically, format prompts as follows:

```
<|im_start|>system
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
```

---

## Model Details

| Property       | Value |
|----------------|-------|
| Parameters     | ~0.4B |
| Architecture   | Causal LM (decoder-only) |
| Context Length | 8192 tokens |
| Quantization   | ~1.7–16 bpw (GGUF) |
| Org            | [north-ml1](https://huggingface.co/north-ml1) |

---

## License

MIT
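## Example: Building the Chat Prompt by Hand

For runtimes that do not apply the chat template automatically, the layout shown in the Chat Template section can be assembled with plain string formatting. The sketch below is illustrative only: `build_chatml_prompt` is a hypothetical helper name, and the exact whitespace (a newline after each role tag and after each `<|im_end|>`) is assumed from the example in that section rather than read from the template stored in the GGUF metadata.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the layout from the Chat Template section.

    Hypothetical helper: the whitespace between turns is an assumption.
    """
    parts = []
    for message in messages:
        # One turn: opening tag with the role, the content, then the closing tag.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "You are Wind-Edge-1.6, a compact AI assistant model. "
                   "You are not a human.",
    },
    {"role": "user", "content": "Who are you?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed as a raw prompt to a runtime that has template handling disabled. If the runtime already applies the embedded template, passing this string as well would wrap the conversation twice.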