ModelHub XC 1b197e8f71 Initialize project; model provided by the ModelHub XC community
Model: Cyb3RQ/arabic-poetry-qwen3-8b-GGUF
Source: Original Platform
2026-06-06 09:54:15 +08:00

license: apache-2.0
language: ar
base_model: unsloth/Qwen3-8B
tags: arabic, poetry, qwen3, gguf, lora, experimental
pipeline_tag: text-generation

Arabic Poetry Qwen3-8B (LoRA, GGUF) — Experimental

Status: experimental / hobby project. This model produces coherent Arabic in a loose poetic register. It does not reliably produce correct classical meter (بحر) or rhyme (قافية), and output quality is uneven. Set expectations accordingly, and read "What it does NOT do well" below before using it.

A LoRA fine-tune of Qwen3-8B on a corpus of Arabic poetry that was OCR'd locally from 19 books. It was trained as continued pretraining (raw text, no instruction or chat format) on discrete poems.
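Here is a minimal sketch of what "continued pretraining on discrete poems" means at the data level. It assumes each cleaned poem is a plain string. Every poem becomes its own raw-text example, terminated by the tokenizer's end-of-sequence token, which is the usual way a model learns where a piece of text ends. The helper `build_training_texts` and the `<|endoftext|>` token used in the test are illustrative assumptions, not the author's actual cleaning code.

```python
def build_training_texts(poems: list[str], eos_token: str) -> list[str]:
    """Turn cleaned poems into raw-text training examples (hypothetical helper).

    Each poem becomes exactly one example ending in the EOS token, so poems are
    never concatenated into each other without a boundary.
    """
    texts = []
    for poem in poems:
        # Normalise Windows line endings and trim whitespace at the edges;
        # internal line breaks are kept because they carry the verse shape.
        lines = poem.replace("\r\n", "\n").strip().split("\n")
        body = "\n".join(line.rstrip() for line in lines)
        if body:  # skip poems that end up empty after cleaning
            texts.append(body + eos_token)
    return texts
```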

What it actually does

  • Generates Arabic text with a poem-like shape (short lines, stops cleanly)
  • Stays in Arabic and on the prompt's theme more consistently than the base model does
  • Style leans modern/free verse (in the vein of Mahmoud Darwish), not the classical ode (qasida)

What it does NOT do well

  • No reliable meter or rhyme. It does not scan to a specific بحر.
  • Imagery is often weak or vague; some lines are semantically loose.
  • Quality varies a lot run-to-run.
  • Classical/Jahiliyya (pre-Islamic) register is weak (the OCR corpus had artifacts).

This is a data- and scale-limited result: 2,244 OCR'd poems (with residual OCR noise) are not enough to instill genuine Arabic prosody in an 8B model. It is shared as an experiment and a starting point, not a finished poetry engine.

Usage (LM Studio / llama.cpp)

Download arabic-poetry-qwen3-8b-f16.gguf. Recommended sampling:

  • Temperature 0.7, Top-p 0.92, Top-k 40
  • Repeat penalty 1.3 (lower values tend to fall into repetition loops)
  • Reasoning/thinking: OFF
  • System prompt: empty
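If you drive the GGUF from Python instead of LM Studio, the settings above map onto keyword arguments of the third-party llama-cpp-python bindings roughly as follows. This is a sketch under assumptions: the helper name `generate`, `n_ctx=2048`, and `max_tokens=256` are illustrative choices, not part of this release. Calling the `Llama` object directly performs plain-text completion without a chat template. This is one way to keep the system prompt empty and avoid the chat template's thinking toggle.

```python
# Recommended sampling for this adapter, as listed in the model card.
SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.92,
    "top_k": 40,
    "repeat_penalty": 1.3,  # lower values tend to loop
}


def generate(model_path: str, prompt: str, max_tokens: int = 256) -> str:
    """Raw-text completion with llama-cpp-python (hypothetical helper)."""
    # Imported lazily so this module loads even without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=2048,       # assumption: ample context for short poems
        n_gpu_layers=-1,  # offload all layers when built with GPU support
        verbose=False,
    )
    # Llama.__call__ is plain completion: no chat template is applied, so no
    # system prompt or thinking block is wrapped around the prompt.
    out = llm(prompt, max_tokens=max_tokens, **SAMPLING)
    return out["choices"][0]["text"]
```

For example, `generate("arabic-poetry-qwen3-8b-f16.gguf", "في حضرة الغياب،")`. For repeated generations, create the `Llama` object once and reuse it, because loading the f16 file dominates the cost of a single short poem.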

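Prompt with Arabic openers, not English instructions. For example (roughly "A poem describing the desert:", "In the presence of absence,", "I love you, my homeland,"):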

قصيدة في وصف الصحراء:
في حضرة الغياب،
أحبكِ يا وطني،

Note: send prompts through a client that handles UTF-8 correctly. Some terminal/curl setups on Windows mangle Arabic UTF-8, which makes the model emit garbage; that is a client encoding bug, not a problem with the model.
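One way to sidestep console encoding entirely is to build the HTTP request in code. The sketch below assumes a llama.cpp `llama-server` listening on its default `http://127.0.0.1:8080` and uses its `/completion` endpoint, which reads `prompt` and sampling fields and returns the generated text under `content`. The helper names are hypothetical. Because `json.dumps` escapes non-ASCII characters by default, the request body is pure ASCII and cannot be mangled by a code page.

```python
import json
import urllib.request

DEFAULT_URL = "http://127.0.0.1:8080/completion"  # assumed llama-server address


def make_completion_request(prompt: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build an encoding-proof POST for llama.cpp's /completion endpoint."""
    # ensure_ascii=True (the default) turns every Arabic character into a
    # \uXXXX escape, so the bytes sent over the wire are plain ASCII.
    body = json.dumps({
        "prompt": prompt,
        "n_predict": 256,  # assumption: enough tokens for a short poem
        "temperature": 0.7,
        "top_p": 0.92,
        "top_k": 40,
        "repeat_penalty": 1.3,
    })
    return urllib.request.Request(
        url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, url: str = DEFAULT_URL) -> str:
    """Send the request and decode the server's JSON reply as UTF-8."""
    with urllib.request.urlopen(make_completion_request(prompt, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

If you prefer curl, a similar effect comes from writing the JSON to a UTF-8 file and sending it with `--data-binary @prompt.json`, so the Arabic never passes through the shell's argument handling.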

Training

Base: unsloth/Qwen3-8B (full bf16 weights, not quantized)
Method: LoRA, r=32, α=32, attention-only, dropout 0.05
Data: 2,244 discrete, cleaned Arabic poems, each EOS-terminated
Schedule: 2 epochs, cosine LR schedule at 1.2e-4, manual training loop
Checkpoint selection: best checkpoint before overfitting set in, judged by sample quality (not the final checkpoint)
Hardware: single RTX 4090, ~19 min
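The arithmetic behind "r=32, attention-only" and the shape of the cosine schedule can be sketched as follows. The sketch makes several assumptions that the table does not state. It uses Qwen3-8B's published dimensions (hidden size 4096, 36 layers, 32 query heads and 8 key/value heads of size 128). It reads "attention-only" as the q/k/v/o projections. It applies standard (non-rsLoRA) α/r scaling. It decays the cosine schedule to zero with no warmup. Check the base model's `config.json` before relying on the numbers.

```python
import math

# Assumed Qwen3-8B shapes (verify against the base model's config.json).
HIDDEN = 4096
N_LAYERS = 36
N_Q_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight.
    return r * (d_in + d_out)


def attention_lora_params(r: int = 32) -> int:
    """Trainable LoRA parameters when only attention projections are adapted."""
    q_out = N_Q_HEADS * HEAD_DIM    # 4096
    kv_out = N_KV_HEADS * HEAD_DIM  # 1024 (grouped-query attention)
    per_layer = (
        lora_params(HIDDEN, q_out, r)     # q_proj
        + lora_params(HIDDEN, kv_out, r)  # k_proj
        + lora_params(HIDDEN, kv_out, r)  # v_proj
        + lora_params(q_out, HIDDEN, r)   # o_proj
    )
    return per_layer * N_LAYERS


def lora_scale(alpha: float, r: int) -> float:
    # Standard LoRA multiplies the update B @ A by alpha / r.
    return alpha / r


def cosine_lr(step: int, total_steps: int, peak: float = 1.2e-4) -> float:
    # Cosine decay from `peak` to 0; no warmup and a zero floor are assumptions.
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, the adapter trains about 30.7M parameters, well under 1% of the 8B base. Setting α = r applies the update at scale 1.0.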

License

Apache-2.0 (inherits from Qwen3-8B).

Honest note

Built end-to-end on a single workstation (OCR → corpus cleaning → LoRA → GGUF). The most reusable artifact from the project is arguably the cleaned corpus and pipeline, not this particular adapter. Contributions, especially a cleaner, meter-labelled Arabic corpus, would meaningfully improve a v2.
