From 91026b99191a4f4b81529ce1d53f655858727323 Mon Sep 17 00:00:00 2001
From: ModelHub XC
Date: Wed, 17 Jun 2026 05:54:16 +0800
Subject: [PATCH] Initialize project; model provided by the ModelHub XC community
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Model: flammenai/FlameDesigner-Qwen2.5-3B-v1-GGUF
Source: Original Platform
---
 .gitattributes                          |  39 +++++++++
 FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf |   3 +
 FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf |   3 +
 FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf   |   3 +
 FlameDesigner-Qwen2.5-3B-v1.f16.gguf    |   3 +
 README.md                               | 107 ++++++++++++++++++++++++
 6 files changed, 158 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf
 create mode 100644 FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf
 create mode 100644 FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf
 create mode 100644 FlameDesigner-Qwen2.5-3B-v1.f16.gguf
 create mode 100644 README.md

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..f64bcf8
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+FlameDesigner-Qwen2.5-3B-v1.f16.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf b/FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf
new file mode 100644
index 0000000..c6463fd
--- /dev/null
+++ b/FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c1ae7c9ba6031cbbf0fa8f8b8f563c9215c2444c52a8bad289276f1abcc58e54
+size 1929903104
diff --git a/FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf b/FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf
new file mode 100644
index 0000000..d335f5c
--- /dev/null
+++ b/FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:44703edb7421fdae97bcba603bdbba077fd51acc28354ffc8d29c74f9b7e5327
+size 2224815104
diff --git a/FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf b/FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf
new file mode 100644
index 0000000..e68c8f7
--- /dev/null
+++ b/FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b957b50f00c4e9dff19681a0653a09a664411ced93645642cbec6b38d19ff4e0
+size 3285476352
diff --git a/FlameDesigner-Qwen2.5-3B-v1.f16.gguf b/FlameDesigner-Qwen2.5-3B-v1.f16.gguf
new file mode 100644
index 0000000..f9fd9c7
--- /dev/null
+++ b/FlameDesigner-Qwen2.5-3B-v1.f16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cb81cd4cd1ab31ebbae4e6118d4eb300e4b8f5c17d1ef172968678b56f857c7b
+size 6178317312
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..4e68a6f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,107 @@
+---
+license: mit
+base_model: flammenai/FlameDesigner-Qwen2.5-3B-v1
+base_model_relation: quantized
+library_name: gguf
+tags:
+- character-design
+- json
+- structured-output
+- flammen.ai
+- gguf
+quantized_by: flammenai
+language:
+- en
+pipeline_tag: text-generation
+---
+
+# FlameDesigner-Qwen2.5-3B-v1-GGUF
+
+GGUF quantizations of [`flammenai/FlameDesigner-Qwen2.5-3B-v1`](https://huggingface.co/flammenai/FlameDesigner-Qwen2.5-3B-v1), a Qwen2.5-3B-Instruct LoRA finetune that turns a free-text seed (e.g. `"samurai"`, `"Mongolian falconer"`) into a strict-schema JSON character design for [flammen.ai](https://huggingface.co/flammenai)'s Create-a-Flame pipeline.
+
+Trained on [`flammenai/flame-kindling-v1`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) (400 SFT rows distilled from Claude Sonnet 4.5).
+
+## Files
+
+| Quant | Size | Notes |
+|---|---|---|
+| `FlameDesigner-Qwen2.5-3B-v1.f16.gguf` | 5.8 GB | Source for further quantization |
+| `FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf` | 3.1 GB | **Recommended.** Best strict-schema compliance in our eval; near-F16 quality at half the size. |
+| `FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf` | 2.1 GB | Compromise between Q8 and Q4. |
+| `FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf` | 1.8 GB | Smallest practical CPU quant. Strict-schema pass rate drops noticeably (see eval); use the auto-repair shim. |
+
+## Inference
+
+### llama.cpp / llama-server
+
+```bash
+llama-server -m FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf \
+  --host 127.0.0.1 --port 8081 -c 8192 --jinja
+```
+
+Then `POST /v1/chat/completions` with the [`flame_dataset.GOLD_SYSTEM`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) system prompt and the seed as the user message. The output is a single JSON object matching the `DesignedFlame` schema (or close to it; see "Auto-repair shim" below).
+
+### Example
+
+```python
+import requests, json
+SYSTEM = open("GOLD_SYSTEM.txt").read()  # from the dataset card / FlameKindling repo
+r = requests.post("http://127.0.0.1:8081/v1/chat/completions", json={
+    "messages": [
+        {"role": "system", "content": SYSTEM},
+        {"role": "user", "content": "Mongolian falconer"},
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.7,
+})
+text = r.json()["choices"][0]["message"]["content"]  # assistant reply: the design as a JSON string
+print(json.loads(text))  # raises json.JSONDecodeError if the output is not valid JSON
+```
+
+## Eval
+
+20 held-out seeds (no overlap with training data; a mix of one-word, sentence, and paragraph seeds). Inference at `temperature=0.7`, GPU offload (`-ngl 999`) on an A6000. Per-output coherence judged by Qwen3.5-27B (1-5 scale, lenient at the high end).
+
+| Quant | Avg latency | Strict pass | Soft pass (after auto-repair) |
+|---|---|---|---|
+| **Q8_0** | 3.1 s | **15/20 (75%)** | 19/20 (95%) |
+| F16 | 5.1 s | 13/20 (65%) | 20/20 (100%) |
+| Q4_K_M | 2.2 s | 7/20 (35%) | 19/20 (95%) |
+
+Quantization noticeably affects strict-schema compliance: Q4_K_M loses roughly half the strict pass rate relative to Q8_0 (35% vs 75%). The soft-pass numbers (after the auto-repair shim below) are within one sample of each other (19/20 or 20/20) for all three quants in the table. **Recommendation: Q8_0 in production, with the shim regardless.**
+
+Coherence on valid outputs is consistently 4.5-5.0 across all quants: when the model produces a parseable design, the design is good. The strict failures are **minor constraint violations**, not quality problems:
+- `writing_style` arrays with 5 entries instead of max 4 (trim to 4)
+- `languages` containing codes outside the [SUPPORTED_LANGUAGES](https://huggingface.co/datasets/flammenai/flame-kindling-v1) allow-list (e.g. `mn`, `cy`, `mi`, `sq`; Qwen2.5-3B knows these from base training, and the LoRA didn't fully suppress them)
+- `system_prompt_extra` over 512 chars (truncate)
+- Rare: output truncated by `max_tokens` (use `max_tokens >= 2048`)
+
+## Auto-repair shim
+
+The production integration in FlameGen passes the model output through this shim before validating it against `DesignedFlame`:
+
+```python
+def autorepair(obj: dict) -> dict:  # mutates obj in place and returns it
+    if isinstance(obj.get("writing_style"), list):
+        obj["writing_style"] = obj["writing_style"][:4]  # schema allows at most 4 entries
+    if isinstance(obj.get("languages"), list):
+        obj["languages"] = [c for c in obj["languages"] if c in SUPPORTED_LANGUAGES]  # drop codes outside the allow-list
+        if not obj["languages"]:
+            obj["languages"] = ["en"]  # fall back to English if nothing survives
+    if isinstance(obj.get("system_prompt_extra"), str):
+        obj["system_prompt_extra"] = obj["system_prompt_extra"][:512].rstrip()  # enforce the 512-char cap
+    return obj
+```
+
+On Q4_K_M, the shim recovers 12 of the 13 strict failures, lifting the effective pass rate from 35% to 95% (a gain of 60 percentage points) with no quality cost (the trimmed entries are themselves on-character; the model simply over-produced).
+
+## Limitations
+
+- **Small training set (400 rows).** The schema constraint violations above likely stem from the small dataset combined with a rank-128 LoRA, which is over-capacity for that much data. A v2 with more data should improve the strict pass rate.
+- **Schema drift on language allow-list.** Base Qwen knows codes outside `SUPPORTED_LANGUAGES`; the LoRA inherits this. The auto-repair shim handles it.
+- **Verbose `system_prompt_extra`.** Sometimes overshoots the 512-char cap; relax the cap to 600 or apply the shim.
+- **No NSFW.** Training data was Sonnet-distilled; Sonnet declines explicit traits. NSFW Create-a-Flame is deferred in flammen.ai anyway.
+
+## License
+
+MIT