---
license: apache-2.0
language:
- ar
base_model: unsloth/Qwen3-8B
tags:
- arabic
- poetry
- qwen3
- gguf
- lora
- experimental
pipeline_tag: text-generation
---

# Arabic Poetry Qwen3-8B (LoRA, GGUF): Experimental

> **Status: experimental / hobby project.** This model produces *coherent
> Arabic in a loose poetic register*. It does **not** reliably produce correct
> classical meter (بحر) or rhyme (قافية), and output quality is uneven. Set
> expectations accordingly. Read "What it does NOT do well" below before using.

A LoRA fine-tune of **Qwen3-8B** on a corpus of Arabic poetry that was OCR'd
locally from 19 books. Trained as continued pretraining on discrete poems.

## What it actually does

- Generates Arabic text with a poem-like shape (short lines, stops cleanly).
- Stays in Arabic and on the prompt's theme more consistently than the base model does.
- Style leans modern/free-verse (Darwish-ish), not classical ode.

## What it does NOT do well

- **No reliable meter or rhyme.** Output does not scan to a specific بحر.
- Imagery is often weak or vague; some lines are semantically loose.
- Quality varies a lot from run to run.
- Classical/Jahiliyya register is weak (the OCR corpus had artifacts).

This is a **data-and-scale-limited** result: 2,244 OCR'd poems (with residual
OCR noise) on an 8B model is not enough to install genuine Arabic prosody. It is
shared as an experiment and a starting point, not a finished poetry engine.

## Usage (LM Studio / llama.cpp)

Download `arabic-poetry-qwen3-8b-f16.gguf`. Recommended sampling:

- Temperature `0.7`, Top-p `0.92`, Top-k `40`
- **Repeat penalty `1.3`** (lower values loop)
- Reasoning/thinking: **OFF**
- System prompt: empty

Prompt with Arabic openers, not English instructions:

```
قصيدة في وصف الصحراء:
في حضرة الغياب،
أحبكِ يا وطني،
```

> Note: send prompts via a UTF-8-correct client. Some terminal/curl setups on
> Windows mangle Arabic UTF-8 and will make the model emit garbage. That is a
> client encoding bug, not a model problem.
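One way to sidestep the terminal encoding problem is to build the request in a script and encode it as UTF-8 explicitly. The sketch below is a minimal example, not a tested integration: it assumes a llama.cpp server is already running locally with the GGUF loaded, and that it exposes the `/completion` endpoint on port 8080 (adjust the URL for your setup). The function names `build_payload` and `complete` and the `max_tokens` default are illustrative choices, not part of this model's release.

```python
import json
import urllib.request

# Sampling values copied from the recommendations above.
SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.92,
    "top_k": 40,
    "repeat_penalty": 1.3,
}


def build_payload(prompt: str, max_tokens: int = 200) -> bytes:
    """Serialize a raw-completion request as UTF-8 JSON bytes."""
    # max_tokens=200 is an arbitrary illustrative cap, not a tuned value.
    body = {"prompt": prompt, "n_predict": max_tokens, **SAMPLING}
    # ensure_ascii=False keeps the Arabic as literal characters; encoding
    # explicitly avoids depending on the terminal or locale code page.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def complete(prompt: str, url: str = "http://127.0.0.1:8080/completion") -> str:
    """POST the prompt to a running llama.cpp server and return the text."""
    # The default URL is an assumption about a local server; change as needed.
    request = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

With a server running, `print(complete("قصيدة في وصف الصحراء:"))` should print a continuation of the opener. Sending the opener as a raw completion, with no chat template, is consistent with the empty system prompt recommended above.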
## Training

| Setting | Value |
|---|---|
| Base | `unsloth/Qwen3-8B` (full bf16, not quantized) |
| Method | LoRA r=32 α=32, attention-only, dropout 0.05 |
| Data | 2,244 discrete cleaned Arabic poems, EOS-terminated |
| Schedule | 2 epochs, cosine LR 1.2e-4, manual training loop |
| Selection | best pre-overfit checkpoint by sample quality (not final) |
| Hardware | single RTX 4090, ~19 min |

## License

Apache-2.0 (inherits from Qwen3-8B).

## Honest note

Built end-to-end on a single workstation (OCR → corpus cleaning → LoRA → GGUF).
The most reusable artifact from the project is arguably the cleaned corpus and
pipeline, not this particular adapter. Contributions, or a cleaner meter-labelled
Arabic corpus, would meaningfully improve a v2.
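For anyone reproducing the schedule listed in the Training table, a cosine learning-rate decay from the stated peak of 1.2e-4 can be sketched as below. This is only an illustration of the general technique: the table does not state the warmup length, the LR floor, or the total step count, so the sketch assumes no warmup and decay to zero, and `total_steps` is a hypothetical value supplied by the caller.

```python
import math

PEAK_LR = 1.2e-4  # peak learning rate from the Training table


def cosine_lr(step: int, total_steps: int, peak: float = PEAK_LR,
              floor: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given optimizer step.

    Assumptions (not stated in the model card): no warmup, decay to `floor`.
    """
    # Clamp so steps past the end stay at the floor instead of rising again.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

In a manual training loop, this value would typically be written into each optimizer parameter group before the optimizer step.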