Model: Cyb3RQ/arabic-poetry-qwen3-8b-GGUF Source: Original Platform
| license | language | base_model | tags | pipeline_tag |
|---|---|---|---|---|
| apache-2.0 | | unsloth/Qwen3-8B | | text-generation |
Arabic Poetry Qwen3-8B (LoRA, GGUF) — Experimental
Status: experimental / hobby project. This model produces coherent Arabic in a loose poetic register. It does not reliably produce correct classical meter (بحر) or rhyme (قافية), and output quality is uneven. Set expectations accordingly, and read the limitations under "What it does NOT do well" before using.
A LoRA fine-tune of Qwen3-8B on a corpus of Arabic poetry that was OCR'd locally from 19 books. It was trained in continued-pretraining style on discrete poems.
What it actually does
- Generates Arabic text with a poem-like shape (short lines, stops cleanly)
- Stays in Arabic and on the prompt's theme more reliably than the base model does
- Style leans modern/free-verse (Darwish-ish), not classical ode
What it does NOT do well
- No reliable meter or rhyme. It does not scan to a specific بحر.
- Imagery is often weak or vague; some lines are semantically loose.
- Quality varies a lot run-to-run.
- Classical/Jahiliyya register is weak (the OCR corpus had artifacts).
This is a data-and-scale-limited result: ~2,244 OCR'd poems (with residual OCR noise) on an 8B model is not enough to instill genuine Arabic prosody. It is shared as an experiment and a starting point, not a finished poetry engine.
Usage (LM Studio / llama.cpp)
Download arabic-poetry-qwen3-8b-f16.gguf. Recommended sampling:
- Temperature 0.7, Top-p 0.92, Top-k 40
- Repeat penalty 1.3 (lower values loop)
- Reasoning/thinking: OFF
- System prompt: empty
Prompt with Arabic openers, not English instructions:
قصيدة في وصف الصحراء:
في حضرة الغياب،
أحبكِ يا وطني،
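The recommended sampling values and an Arabic opener can be collected into a single request body for a locally running llama.cpp server. The sketch below is a minimal example, not the author's setup: it assumes `llama-server` is serving the GGUF at `http://localhost:8080` and uses its `/completion` endpoint. The helper names `build_payload` and `generate`, and the `n_predict` cap, are illustrative.

```python
import json
import urllib.request

# Assumption: llama-server is running locally with the GGUF loaded.
SERVER_URL = "http://localhost:8080/completion"


def build_payload(prompt: str, max_tokens: int = 200) -> dict:
    """Collect the recommended sampling settings into one request body."""
    return {
        "prompt": prompt,          # raw Arabic opener, no system prompt or chat template
        "temperature": 0.7,
        "top_p": 0.92,
        "top_k": 40,
        "repeat_penalty": 1.3,     # the card notes that lower values loop
        "n_predict": max_tokens,   # illustrative cap, not a value from the card
    }


def generate(prompt: str) -> str:
    """POST the payload and return the generated text (needs a running server)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

With the server running, `generate("قصيدة في وصف الصحراء:")` should return a continuation of the opener. Because `/completion` takes the raw prompt, nothing is added in front of the Arabic text.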
Note: send prompts via a UTF-8-correct client. Some terminal/curl setups on Windows mangle Arabic UTF-8 and will make the model emit garbage — that is a client encoding bug, not the model.
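One way to rule out the client as the source of garbage is to build the request body in Python instead of typing Arabic into a shell command. In the sketch below (the helper name `encode_prompt` is illustrative), `json.dumps` with its default `ensure_ascii=True` escapes every non-ASCII character as `\uXXXX`, so the bytes on the wire are plain ASCII and a misconfigured terminal code page has nothing to corrupt.

```python
import json


def encode_prompt(prompt: str) -> bytes:
    """Serialize a prompt to JSON bytes that contain only ASCII."""
    # ensure_ascii=True is the default: Arabic letters become \uXXXX escapes.
    return json.dumps({"prompt": prompt}).encode("ascii")


prompt = "في حضرة الغياب،"
body = encode_prompt(prompt)

# The body is pure ASCII, and decoding it restores the original text exactly.
assert body.isascii()
assert json.loads(body)["prompt"] == prompt
```

If the model still emits garbage when fed a body built this way, the problem is more likely elsewhere than in the client's encoding.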
Training
| Setting | Value |
|---|---|
| Base | unsloth/Qwen3-8B (full bf16, not quantized) |
| Method | LoRA r=32 α=32, attention-only, dropout 0.05 |
| Data | 2,244 discrete cleaned Arabic poems, EOS-terminated |
| Schedule | 2 epochs, cosine LR 1.2e-4, manual training loop |
| Selection | best pre-overfit checkpoint by sample quality (not final) |
| Hardware | single RTX 4090, ~19 min |
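To give a sense of scale for the LoRA row, the sketch below estimates the adapter size and scaling factor. It is not the author's training code. It assumes "attention-only" means the q/k/v/o projections, and it assumes a Qwen3-8B shape of 36 layers, hidden size 4096, 32 query heads and 8 key/value heads of dimension 128. Check these against the base model's `config.json` before relying on the numbers.

```python
# Assumed Qwen3-8B shape; verify against the base model's config.json.
NUM_LAYERS = 36
HIDDEN = 4096
Q_OUT = 32 * 128   # query heads * head dim
KV_OUT = 8 * 128   # key/value heads * head dim

# From the training table.
RANK = 32
ALPHA = 32

# (in_features, out_features) for each adapted projection.
# Assumption: "attention-only" covers q, k, v and o.
ATTENTION_PROJECTIONS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA pair is A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in ATTENTION_PROJECTIONS.values())
total = per_layer * NUM_LAYERS
scale = ALPHA / RANK  # standard LoRA scaling applied to the B @ A update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scale (alpha / r): {scale}")
```

Under these assumptions the adapter has roughly 30.7M trainable parameters, well under 1% of the base model, and α = r gives a scale of 1.0.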
License
Apache-2.0 (inherits from Qwen3-8B).
Honest note
Built end-to-end on a single workstation (OCR → corpus cleaning → LoRA → GGUF). The most reusable artifact from the project is arguably the cleaned corpus and pipeline, not this particular adapter. Contributions, especially a cleaner, meter-labelled Arabic corpus, would meaningfully improve a v2.