From 1b197e8f71de10c95a0605b2fa828fdc831d8fad Mon Sep 17 00:00:00 2001
From: ModelHub XC
Date: Sat, 6 Jun 2026 09:54:15 +0800
Subject: [PATCH] Initialize project; model provided by the ModelHub XC community
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Model: Cyb3RQ/arabic-poetry-qwen3-8b-GGUF
Source: Original Platform
---
 .gitattributes                  | 36 ++++++++++++++
 README.md                       | 84 +++++++++++++++++++++++++++++++++
 arabic-poetry-qwen3-8b-f16.gguf |  3 ++
 3 files changed, 123 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 README.md
 create mode 100644 arabic-poetry-qwen3-8b-f16.gguf

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..1e2d2eb
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+arabic-poetry-qwen3-8b-f16.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..458836d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,84 @@
+---
+license: apache-2.0
+language:
+- ar
+base_model: unsloth/Qwen3-8B
+tags:
+- arabic
+- poetry
+- qwen3
+- gguf
+- lora
+- experimental
+pipeline_tag: text-generation
+---
+
+# Arabic Poetry Qwen3-8B (LoRA, GGUF) — Experimental
+
+> **Status: experimental / hobby project.** This model produces *coherent
+> Arabic in a loose poetic register*. It does **not** reliably produce correct
+> classical meter (بحر) or rhyme (قافية), and output quality is uneven. Set
+> expectations accordingly. Read the Limitations section before using.
+
+A LoRA fine-tune of **Qwen3-8B** on a corpus of Arabic poetry that was OCR'd
+locally from 19 books. Trained as continued-pretraining on discrete poems.
+
+## What it actually does
+
+- Generates Arabic text with a poem-like shape (short lines, stops cleanly)
+- Stays in Arabic, on the prompt's theme more than the base model does
+- Style leans modern/free-verse (Darwish-ish), not classical ode
+
+## What it does NOT do well
+
+- **No reliable meter or rhyme.** It does not scan to a specific بحر.
+- Imagery is often weak or vague; some lines are semantically loose.
+- Quality varies a lot run-to-run.
+- Classical/Jahiliyya register is weak (the OCR corpus had artifacts).
+
+This is a **data-and-scale-limited** result: ~2,244 OCR'd poems (with residual
+OCR noise) on an 8B model is not enough to install genuine Arabic prosody. It
+is shared as an experiment and a starting point, not a finished poetry engine.
+
+## Usage (LM Studio / llama.cpp)
+
+Download `arabic-poetry-qwen3-8b-f16.gguf`. Recommended sampling:
+
+- Temperature `0.7`, Top-p `0.92`, Top-k `40`
+- **Repeat penalty `1.3`** (lower values loop)
+- Reasoning/thinking: **OFF**
+- System prompt: empty
+
+Prompt with Arabic openers, not English instructions:
+
+```
+قصيدة في وصف الصحراء:
+في حضرة الغياب،
+أحبكِ يا وطني،
+```
+
+> Note: send prompts via a UTF-8-correct client. Some terminal/curl setups on
+> Windows mangle Arabic UTF-8 and will make the model emit garbage — that is a
+> client encoding bug, not the model.
+
+## Training
+
+| | |
+|---|---|
+| Base | `unsloth/Qwen3-8B` (full bf16, not quantized) |
+| Method | LoRA r=32 α=32, attention-only, dropout 0.05 |
+| Data | 2,244 discrete cleaned Arabic poems, EOS-terminated |
+| Schedule | 2 epochs, cosine LR 1.2e-4, manual training loop |
+| Selection | best pre-overfit checkpoint by sample quality (not final) |
+| Hardware | single RTX 4090, ~19 min |
+
+## License
+
+Apache-2.0 (inherits from Qwen3-8B).
+
+## Honest note
+
+Built end-to-end on a single workstation (OCR → corpus cleaning → LoRA → GGUF).
+The most reusable artifact from the project is arguably the cleaned corpus and
+pipeline, not this particular adapter. Contributions / a cleaner meter-labelled
+Arabic corpus would meaningfully improve a v2.
diff --git a/arabic-poetry-qwen3-8b-f16.gguf b/arabic-poetry-qwen3-8b-f16.gguf
new file mode 100644
index 0000000..f30e582
--- /dev/null
+++ b/arabic-poetry-qwen3-8b-f16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:473c29229e13e3b29fbb662a371a165c5123be49abfe9fef7d02673a87966e94
+size 16388044192