Model: North-ML1/Wind-Edge-1.6-GGUF
Source: Original Platform
2026-06-12 23:22:19 +08:00

---
model-index:
- name: wind-edge-1.6@f16
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: CodeBench-30
      type: North-ML1/CodeBench-30
      split: train
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 6.25
      verified: false
    - name: Easy Tier Accuracy
      type: accuracy
      value: 17.14
      verified: false
    - name: Medium Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false
    - name: Hard Tier Accuracy
      type: accuracy
      value: 0.00
      verified: false
library_name: transformers
pipeline_tag: text-generation
tags:
- wind-edge
- causal-lm
- edge
- small-language-model
- 0.4b
license: mit
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
language:
- en
base_model:
- North-ML1/Wind-Edge-1.6-Instruct
---
# Wind Edge 1.6 — Geode (0.4B)
A 0.4B parameter causal language model built for edge deployment. Fast, small, and honest about what it can do.

**[North ML](https://huggingface.co/north-ml1)** · [Wind Arc 1.5 Preview](https://huggingface.co/arthu1/wind-arc-1-5-preview)

---
## Overview
Wind Edge 1.6 (Geode) is a compact LLM trained for real-time, on-device inference. At 0.4B parameters it sits in the ultra-small tier: expect strong common-sense and classification performance but limited hard reasoning.

**Best use cases:**

- Instruction-following dialogue (short to medium turns)
- Text classification and sentiment
- Light code completion
- Summarization of short passages

**Not recommended for:** multi-step math, complex logical chains, long-context tasks.

---
## Changes vs 1.5
- Improved instruction adherence on structured output formats
- More stable multi-sentence generation (fewer mid-sequence repetitions)
- Reduced hallucination rate on short factual queries (internal held-out eval)
---
## Honest Benchmark Estimates
Realistic ranges for a well-trained 0.4B model, not cherry-picked numbers.

| Task | Expected Range | Notes |
|-----------------------|----------------|-------|
| Common Sense (0-shot) | 0.60-0.68 | Reliable strength |
| Sentiment Analysis | 0.70-0.80 | Reliable strength |
| Text Classification | 0.68-0.78 | Reliable strength |
| Reading Comprehension | 0.52-0.63 | Context-dependent |
| Summarization | 0.58-0.68 | Short docs only |
| Code Generation | 0.45-0.58 | Simple tasks only |
| Math Reasoning | 0.15-0.28 | Known weak point at this scale |
| Logical Reasoning | 0.18-0.28 | Known weak point at this scale |

A 0.4B model cannot compete with 7B+ models on reasoning, and Geode doesn't pretend to.

---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("north-ml1/wind-edge-1.6")
tokenizer = AutoTokenizer.from_pretrained("north-ml1/wind-edge-1.6")

# Append the user's message after "User: ".
inputs = tokenizer("You are Wind Edge, a helpful AI assistant.\nUser: ", return_tensors="pt")
# do_sample=True is needed for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
### Recommended Settings
| Parameter | Value |
|--------------------|----------|
| temperature | 0.0 |
| top_p | 0.95 |
| min_p | 0.05 |
| max_new_tokens | 256-512 |
| repetition_penalty | 1.1 |
| context_limit | 1024-4096|
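
The settings above can be mapped onto keyword arguments for `model.generate()`. The following is a minimal sketch, not part of the Wind Edge release: the helper name `to_generate_kwargs` is hypothetical, `max_new_tokens=512` is just the upper end of the recommended range, and `min_p` is only accepted by recent Transformers versions. It assumes the usual Transformers behaviour that sampling parameters apply only when `do_sample=True`, so a temperature of 0.0 is treated as greedy decoding. `context_limit` is not a `generate()` argument and is left out.

```python
def to_generate_kwargs(temperature=0.0, top_p=0.95, min_p=0.05,
                       max_new_tokens=512, repetition_penalty=1.1):
    """Translate the recommended settings into kwargs for model.generate().

    Hypothetical helper for illustration only.
    """
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": repetition_penalty,
    }
    if temperature > 0:
        # Sampling path: temperature, top_p and min_p need do_sample=True.
        kwargs.update(do_sample=True, temperature=temperature,
                      top_p=top_p, min_p=min_p)
    else:
        # temperature 0.0 is treated as greedy decoding; the sampling
        # parameters are omitted because they would have no effect.
        kwargs["do_sample"] = False
    return kwargs


greedy = to_generate_kwargs()                   # defaults from the table
sampled = to_generate_kwargs(temperature=0.6)   # temperature from the Usage snippet
```

With a loaded model, the result can be passed as `model.generate(**inputs, **to_generate_kwargs())`.
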
---
## GGUF Quantizations
GGUF quants converted from [arthu1/Wind-Edge-1.6-Instruct](https://huggingface.co/arthu1/Wind-Edge-1.6-Instruct) using a Qwen3-compatible tensor layout. The Transformers repo remains canonical; use these files with llama.cpp, LM Studio, Ollama-style runtimes, and any other GGUF-compatible inference stack.
### Files
| File | bpw | Use |
|------|-----|-----|
| Wind-Edge-1.6-TQ1_0.gguf | ~1.7 bpw | Experimental 1-bit/ternary. Lowest quality, smallest size. |
| Wind-Edge-1.6-TQ2_0.gguf | ~2.1 bpw | Very small 2-bit/ternary option. |
| Wind-Edge-1.6-IQ3_M.gguf | ~3.7 bpw | Good balance for tiny devices. |
| Wind-Edge-1.6-Q4_K_M.gguf | ~4.6 bpw | **Recommended default.** |
| Wind-Edge-1.6-Q6_K.gguf | ~6.1 bpw | Higher quality, still compact. |
| Wind-Edge-1.6-Q8_0.gguf | ~8.5 bpw | Near-lossless practical quant. |
| Wind-Edge-1.6-F16.gguf | 16 bpw | Unquantized 16-bit (half-precision) GGUF export. |

Q4_K_M, Q6_K, and Q8_0 are the recommended daily drivers. TQ1_0 and TQ2_0 are included for constrained edge hardware, but they measurably lose reasoning and factual accuracy.
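
To get a feel for what the bpw column means on disk, weight size can be approximated as parameters × bits per weight ÷ 8. The sketch below is a rough estimate only: the parameter count is taken as exactly 0.4e9 (the card says "~0.4B"), the helper names are hypothetical, and real GGUF files will differ because of metadata and tensors stored at other precisions.

```python
# Approximate bits per weight, copied from the Files table.
BPW = {
    "TQ1_0": 1.7,
    "TQ2_0": 2.1,
    "IQ3_M": 3.7,
    "Q4_K_M": 4.6,
    "Q6_K": 6.1,
    "Q8_0": 8.5,
    "F16": 16.0,
}

N_PARAMS = 0.4e9  # "~0.4B" from this card; the exact count is an assumption


def estimated_size_mb(quant):
    """Rough weight size in MB (10**6 bytes): params * bits-per-weight / 8."""
    return N_PARAMS * BPW[quant] / 8 / 1e6


def largest_quant_within(budget_mb):
    """Return the highest-bpw quant whose estimate fits the budget, or None."""
    fitting = [q for q in BPW if estimated_size_mb(q) <= budget_mb]
    return max(fitting, key=BPW.get) if fitting else None


for quant in BPW:
    print(f"{quant:7s} ~{estimated_size_mb(quant):.0f} MB")
```

Under these assumptions Q4_K_M comes out at roughly 230 MB, so `largest_quant_within(250)` would select it.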
### llama.cpp
```bash
llama-cli \
  -m Wind-Edge-1.6-Q4_K_M.gguf \
  -cnv \
  --temp 0.6 \
  --top-p 0.9 \
  --repeat-penalty 1.06 \
  -n 512
```
For deterministic output, use `--temp 0` and keep prompts short.
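
When launching llama.cpp from a script, the same flags can be assembled programmatically. This is a small sketch with a hypothetical helper name (`llama_cli_command`); it only builds the argument list shown above and swaps in `--temp 0` for the deterministic case. It assumes `llama-cli` is on your `PATH` if you go on to run it.

```python
import shlex


def llama_cli_command(model_path, deterministic=False, n_predict=512):
    """Build the llama-cli argument list from the example above.

    Hypothetical helper; only the temperature changes between modes.
    """
    temp = "0" if deterministic else "0.6"
    return [
        "llama-cli",
        "-m", model_path,
        "-cnv",
        "--temp", temp,
        "--top-p", "0.9",
        "--repeat-penalty", "1.06",
        "-n", str(n_predict),
    ]


cmd = llama_cli_command("Wind-Edge-1.6-Q4_K_M.gguf", deterministic=True)
print(shlex.join(cmd))  # shell-ready form of the command
```

The list can be handed to `subprocess.run(cmd)` directly, which avoids shell quoting issues with model paths.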
### Chat Template
The GGUF metadata includes the chat template. If your runtime doesn't apply it automatically:
```
<|im_start|>system
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
```
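
For runtimes that take a raw prompt string, the template above can be filled in by hand. The sketch below uses a hypothetical helper, `build_prompt`, and reproduces the layout exactly as printed here, including the empty `<think>` block; the precise whitespace inside and after that block is an assumption, so check the template stored in the GGUF metadata if output looks off.

```python
DEFAULT_SYSTEM = (
    "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."
)


def build_prompt(user_message, system_message=DEFAULT_SYSTEM):
    """Format one system turn and one user turn in the layout shown above.

    Hypothetical helper. The prompt ends with an empty <think> block, as in
    the printed template, so the model begins its answer directly.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n</think>\n"
    )


prompt = build_prompt("Who are you?")
print(prompt)
```
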
---
## Model Details
| Property | Value |
|----------------|-------|
| Parameters | ~0.4B |
| Architecture | Causal LM (decoder-only) |
| Context Length | 8192 tokens |
| Quantization | ~1.7-16 bpw (GGUF) |
| Org | [north-ml1](https://huggingface.co/north-ml1) |
---
## License
MIT