Initialize project; model provided by the ModelHub XC community
Model: flammenai/FlameDesigner-Qwen2.5-3B-v1-GGUF. Source: Original Platform
.gitattributes (vendored, normal file, 39 lines)
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
FlameDesigner-Qwen2.5-3B-v1.f16.gguf filter=lfs diff=lfs merge=lfs -text
FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf (normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1ae7c9ba6031cbbf0fa8f8b8f563c9215c2444c52a8bad289276f1abcc58e54
size 1929903104
FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf (normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:44703edb7421fdae97bcba603bdbba077fd51acc28354ffc8d29c74f9b7e5327
size 2224815104
FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf (normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b957b50f00c4e9dff19681a0653a09a664411ced93645642cbec6b38d19ff4e0
size 3285476352
FlameDesigner-Qwen2.5-3B-v1.f16.gguf (normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb81cd4cd1ab31ebbae4e6118d4eb300e4b8f5c17d1ef172968678b56f857c7b
size 6178317312
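Each `.gguf` file above is committed as a Git LFS pointer: a `version` line, the SHA-256 `oid` of the real file, and its `size` in bytes. Those two fields are enough to check a downloaded copy of the weights. A minimal sketch, assuming only the three-line pointer format shown above; `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helper names, not part of git-lfs:

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields


def verify_against_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and sha256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: the byte count must match exactly.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    # Stream the file in chunks so multi-GB GGUFs never have to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For example, paste the three pointer lines for the Q8_0 file into a string and call `verify_against_pointer("FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf", pointer_text)`; it returns `False` on a size or hash mismatch. The size comparison runs first because hashing the 6 GB F16 file takes a while.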
README.md (normal file, 107 lines)
@@ -0,0 +1,107 @@
---
license: mit
base_model: flammenai/FlameDesigner-Qwen2.5-3B-v1
base_model_relation: quantized
library_name: gguf
tags:
- character-design
- json
- structured-output
- flammen.ai
- gguf
quantized_by: flammenai
language:
- en
pipeline_tag: text-generation
---

# FlameDesigner-Qwen2.5-3B-v1-GGUF

GGUF quantizations of [`flammenai/FlameDesigner-Qwen2.5-3B-v1`](https://huggingface.co/flammenai/FlameDesigner-Qwen2.5-3B-v1), a Qwen2.5-3B-Instruct LoRA finetune that turns a free-text seed (e.g. `"samurai"`, `"Mongolian falconer"`) into a strict-schema JSON character design for [flammen.ai](https://huggingface.co/flammenai)'s Create-a-Flame pipeline.

Trained on [`flammenai/flame-kindling-v1`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) (400 SFT rows distilled from Claude Sonnet 4.5).

## Files

| Quant | Size (GiB) | Notes |
|---|---|---|
| `FlameDesigner-Qwen2.5-3B-v1.f16.gguf` | 5.8 | Source for further quantization. |
| `FlameDesigner-Qwen2.5-3B-v1.Q8_0.gguf` | 3.1 | **Recommended.** Best strict-schema compliance in our eval; near-F16 quality at about half the size. |
| `FlameDesigner-Qwen2.5-3B-v1.Q5_K_M.gguf` | 2.1 | Compromise between Q8_0 and Q4_K_M. |
| `FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf` | 1.8 | Smallest practical CPU quant. Strict-schema pass rate drops noticeably (see Eval); use the auto-repair shim. |

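The sizes in the table line up with the byte counts in the Git LFS pointers of this commit when read as binary gigabytes (GiB, 2^30 bytes) rather than decimal GB. A small arithmetic check; the byte counts are copied from the pointers, and `to_gib` is just an illustrative helper:

```python
# Byte counts copied from the Git LFS pointer files in this commit.
SIZES_BYTES = {
    "f16": 6178317312,
    "Q8_0": 3285476352,
    "Q5_K_M": 2224815104,
    "Q4_K_M": 1929903104,
}


def to_gib(n_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB = 2**30 bytes), rounded to 0.1."""
    return round(n_bytes / 2**30, 1)


for quant, n in SIZES_BYTES.items():
    # Show both units: the table's figures match the GiB column.
    print(f"{quant:7s} {to_gib(n):4.1f} GiB  ({n / 1e9:.2f} GB decimal)")
```

Budget disk and RAM with the decimal figure in mind; the F16 file, for instance, is about 6.18 GB on disk.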
## Inference

### llama.cpp / llama-server

```bash
llama-server -m FlameDesigner-Qwen2.5-3B-v1.Q4_K_M.gguf \
  --host 127.0.0.1 --port 8081 -c 8192 --jinja
```

Then `POST /v1/chat/completions` with the [`flame_dataset.GOLD_SYSTEM`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) system prompt and the seed as the user message. Output is a single JSON object matching the `DesignedFlame` schema, or close to it (see "Auto-repair shim" below). The command above loads Q4_K_M; for production, the Eval section recommends Q8_0.

### Example

```python
import json

import requests

# GOLD_SYSTEM prompt text, from the dataset card / FlameKindling repo
with open("GOLD_SYSTEM.txt", encoding="utf-8") as f:
    SYSTEM = f.read()

r = requests.post(
    "http://127.0.0.1:8081/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Mongolian falconer"},  # the seed
        ],
        "max_tokens": 2048,  # smaller limits risk truncated JSON
        "temperature": 0.7,
    },
    timeout=300,  # seconds; generous so CPU-only inference does not time out
)
r.raise_for_status()
text = r.json()["choices"][0]["message"]["content"]
print(json.loads(text))
```

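The example passes the reply straight to `json.loads`, which fails with an unhelpful parse error when generation was cut off by `max_tokens`. A minimal sketch that detects that case first, assuming the server reports it through the OpenAI-compatible `finish_reason` field with the value `"length"` (the convention llama-server follows for its `/v1/chat/completions` endpoint); `extract_design` is a hypothetical helper name:

```python
import json


def extract_design(response: dict) -> dict:
    """Return the JSON design from an OpenAI-style chat completion response.

    Raises ValueError when generation stopped on the token limit, because a
    truncated JSON object cannot be fixed by trimming fields afterwards.
    """
    choice = response["choices"][0]
    if choice.get("finish_reason") == "length":
        raise ValueError("output truncated by max_tokens; retry with a larger limit")
    return json.loads(choice["message"]["content"])
```

In the example above, the last two lines would become `print(extract_design(r.json()))`.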
## Eval

20 held-out seeds (no overlap with the training data; a mix of one-word, one-sentence, and paragraph-length seeds). Inference ran at `temperature=0.7` with full GPU offload (`-ngl 999`) on an A6000. Per-output coherence was judged by Qwen3.5-27B on a 1-5 scale (lenient at the high end).

| Quant | Avg latency | Strict pass | Soft pass (after auto-repair) |
|---|---|---|---|
| **Q8_0** | 3.1 s | **15/20 (75%)** | 19/20 (95%) |
| F16 | 5.1 s | 13/20 (65%) | 20/20 (100%) |
| Q4_K_M | 2.2 s | 7/20 (35%) | 19/20 (95%) |

Quantization noticeably affects strict-schema compliance: Q4_K_M's strict pass rate (35%) is less than half of Q8_0's (75%). After the auto-repair shim below, all three quants land within one seed of each other (19-20/20). **Recommendation: Q8_0 in production, with the shim regardless.**

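The comparisons above follow directly from the pass counts in the table. The arithmetic as a short Python sketch, with the counts copied from the table (the soft-minus-strict difference is, by definition, what the auto-repair shim fixed):

```python
# (strict passes, soft passes after auto-repair) out of 20 seeds, from the table.
RESULTS = {"Q8_0": (15, 19), "F16": (13, 20), "Q4_K_M": (7, 19)}
N_SEEDS = 20


def pct(k: int, n: int = N_SEEDS) -> float:
    """Pass count as a percentage of the seed set."""
    return 100.0 * k / n


for quant, (strict, soft) in RESULTS.items():
    failures = N_SEEDS - strict  # outputs that failed strict validation
    recovered = soft - strict    # of those, how many passed after the shim
    print(f"{quant}: strict {pct(strict):.0f}%, soft {pct(soft):.0f}%, "
          f"shim fixes {recovered}/{failures} strict failures")

# Q4_K_M keeps less than half of Q8_0's strict passes (7/15, about 0.47).
print(f"Q4_K_M/Q8_0 strict ratio: {RESULTS['Q4_K_M'][0] / RESULTS['Q8_0'][0]:.2f}")
```

This prints 75/95% with 4/5 fixed for Q8_0, 65/100% with 7/7 for F16, 35/95% with 12/13 for Q4_K_M, and a strict ratio of 0.47.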
Coherence on valid outputs is consistently 4.5-5.0 across all quants: when the model produces a parseable design, the design is good. The strict failures are **minor constraint violations**, not quality problems:

- `writing_style` arrays with 5 entries instead of the maximum of 4 (trim to 4)
- `languages` containing codes outside the [SUPPORTED_LANGUAGES](https://huggingface.co/datasets/flammenai/flame-kindling-v1) allow-list (e.g. `mn`, `cy`, `mi`, `sq`; Qwen2.5-3B knows these from base training, and the LoRA didn't fully suppress them)
- `system_prompt_extra` longer than 512 characters (truncate)
- Rare: output truncated by `max_tokens` (use `max_tokens >= 2048`)

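The three over-limit failures above are mechanical to detect before any repair. A minimal sketch of such a pre-check, assuming the limits stated in this card (at most 4 `writing_style` entries, at most 512 characters of `system_prompt_extra`); `EXAMPLE_LANGUAGES` is a hypothetical stand-in for the real `SUPPORTED_LANGUAGES` allow-list, and this is not the `DesignedFlame` validator (it also cannot see truncation, which shows up as a JSON parse error instead):

```python
# Limits as stated in this card.
MAX_WRITING_STYLE = 4
MAX_SYSTEM_PROMPT_EXTRA = 512

# HYPOTHETICAL stand-in: the real allow-list is SUPPORTED_LANGUAGES from the
# flame-kindling-v1 dataset; these codes are for illustration only.
EXAMPLE_LANGUAGES = frozenset({"en", "es", "fr"})


def constraint_violations(obj: dict, allowed_langs=EXAMPLE_LANGUAGES) -> list:
    """List the over-limit failures seen in the eval (not a full schema check)."""
    problems = []
    styles = obj.get("writing_style")
    if isinstance(styles, list) and len(styles) > MAX_WRITING_STYLE:
        problems.append(f"writing_style has {len(styles)} entries (max {MAX_WRITING_STYLE})")
    langs = obj.get("languages")
    if isinstance(langs, list):
        unknown = [c for c in langs if c not in allowed_langs]
        if unknown:
            problems.append(f"languages outside allow-list: {unknown}")
    extra = obj.get("system_prompt_extra")
    if isinstance(extra, str) and len(extra) > MAX_SYSTEM_PROMPT_EXTRA:
        problems.append(
            f"system_prompt_extra is {len(extra)} chars (max {MAX_SYSTEM_PROMPT_EXTRA})"
        )
    return problems


sample = {
    "writing_style": ["terse", "wry", "formal", "poetic", "blunt"],
    "languages": ["en", "mn"],
    "system_prompt_extra": "x" * 600,
}
for problem in constraint_violations(sample):
    print(problem)
# writing_style has 5 entries (max 4)
# languages outside allow-list: ['mn']
# system_prompt_extra is 600 chars (max 512)
```

Logging these before repair is a cheap way to track how often each failure mode occurs in production.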
## Auto-repair shim

Production integration in FlameGen wraps the model with this shim before validating against `DesignedFlame`:

```python
# SUPPORTED_LANGUAGES is the allow-list published with flame-kindling-v1
# (not reproduced here); define it before calling autorepair().

def autorepair(obj: dict) -> dict:
    """Fix the common constraint overshoots in place and return the object."""
    # Trim over-long style lists to the 4-entry maximum.
    if isinstance(obj.get("writing_style"), list):
        obj["writing_style"] = obj["writing_style"][:4]
    # Drop language codes outside the allow-list; fall back to English if none remain.
    if isinstance(obj.get("languages"), list):
        obj["languages"] = [c for c in obj["languages"] if c in SUPPORTED_LANGUAGES]
        if not obj["languages"]:
            obj["languages"] = ["en"]
    # Enforce the 512-character cap on the extra system prompt.
    if isinstance(obj.get("system_prompt_extra"), str):
        obj["system_prompt_extra"] = obj["system_prompt_extra"][:512].rstrip()
    return obj
```

On Q4_K_M, the shim lifts the effective pass rate from 35% to 95% (+60 percentage points; 12 of 13 strict failures recovered) at no quality cost: the trimmed entries are themselves on-character, since the model simply over-produced.

## Limitations

- **Small training set (400 rows).** The schema-constraint violations above likely stem from the small dataset relative to the capacity of the rank-128 LoRA. A v2 with more data should improve the strict pass rate.
- **Schema drift on the language allow-list.** Base Qwen knows codes outside `SUPPORTED_LANGUAGES`, and the LoRA inherits this. The auto-repair shim handles it.
- **Verbose `system_prompt_extra`.** The field sometimes overshoots the 512-character cap; relax the cap to 600 or apply the shim.
- **No NSFW.** The training data was distilled from Sonnet, which declines explicit traits. NSFW Create-a-Flame is deferred in flammen.ai anyway.

## License

MIT