---
language:
- en
license: mit
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- text-generation
- fine-tuned
- lora
- gguf
- speech-to-text
- text-cleanup
- unsloth
- qwen2
pipeline_tag: text-generation
datasets:
- Abdullahu5mani/flowscribe-dataset
---
# FlowScribe — Qwen2.5-0.5B Speech Transcript Formatter
A fine-tuned version of [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) that converts raw, messy speech-to-text output into clean, formatted text across multiple writing styles.
**GitHub:** [github.com/Abdullahu5mani/flowscribe](https://github.com/Abdullahu5mani/flowscribe)
---
## The Problem
Voice dictation tools like Whisper produce transcripts full of filler words (`um`, `uh`, `like`) and self-corrections (`make it 5... no wait, 6`), with no punctuation or formatting. This model post-processes those transcripts into polished text tailored to the desired output style.
---
## Styles
| Style | Behavior |
|---|---|
| `Auto` | Intelligent default — removes fillers, fixes grammar, handles self-corrections, applies structure |
| `Professional` | Formal business tone, structured layout, perfect grammar |
| `Casual` | Keeps the speaker's voice, light cleanup, contractions preserved |
| `Verbatim` | Preserves exact wording, only strips `um`/`uh` and applies spoken formatting commands |
| `Software_Dev` | Formats code terms, variable names (`camelCase`, `snake_case`), technical jargon |
| `Enthusiastic` | High energy, exclamation marks, positive phrasing |
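
To see how the styles diverge, you can run the same raw transcript through each one. An illustrative sketch that reuses the `format_transcript` helper defined in the Usage section below; actual outputs depend on the model:

```python
# Illustrative: sweep all six styles over one raw transcript.
# Assumes the format_transcript() helper from the Usage section below.
raw = "um so the uh demo is on thursday... no wait friday at like 2"

for style in ["Auto", "Professional", "Casual", "Verbatim", "Software_Dev", "Enthusiastic"]:
    print(f"{style}: {format_transcript(raw, style=style)}")
```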
---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Abdullahu5mani/flowscribe-qwen2.5-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def format_transcript(raw_text, style="Auto"):
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant that transcribes and formats text based on a specific style instruction.",
        },
        {
            "role": "user",
            "content": f"Transcribe and format this with style: {style}\nInput: {raw_text}",
        },
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Slice off the prompt tokens so only the generated completion is decoded.
    output_ids = outputs[0][len(inputs.input_ids[0]):]
    return tokenizer.decode(output_ids, skip_special_tokens=True)

# Examples
print(format_transcript(
    "um so the meeting is at 5... no wait make it 6 and uh we need to discuss the q3 budget",
    style="Professional",
))
# → "The meeting is at 6 PM to discuss the Q3 budget."

print(format_transcript(
    "the api endpoint is slash api slash users new line it takes a POST request with JSON",
    style="Software_Dev",
))
# → "The API endpoint is `/api/users`\nIt takes a POST request with JSON."
```
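
For interactive use, tokens can be streamed to the terminal as they are generated. A minimal sketch using transformers' `TextStreamer` (not covered by the original card), reusing the `model` and `tokenizer` loaded above:

```python
from transformers import TextStreamer

# Print tokens as they are generated; skip_prompt hides the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant that transcribes and formats text based on a specific style instruction."},
    {"role": "user", "content": "Transcribe and format this with style: Auto\nInput: uh can you um remind me to call the dentist tomorrow"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=512, do_sample=False, streamer=streamer)
```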
---
## GGUF (Quantized) Usage
A Q4_K_M quantized GGUF version is included in this repository for fast CPU/GPU inference via [llama-cpp-python](https://github.com/abetlen/llama-cpp-python).
```python
from llama_cpp import Llama

llm = Llama(
    model_path="model_q4_k_m.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # Set to 0 for CPU-only
    verbose=False,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that transcribes and formats text based on a specific style instruction.",
        },
        {
            "role": "user",
            "content": "Transcribe and format this with style: Casual\nInput: hey um so i was thinking we could like grab lunch tomorrow you know around noon ish",
        },
    ],
    max_tokens=256,
    temperature=0.1,
)
print(response["choices"][0]["message"]["content"])
# → "Hey, I was thinking we could grab lunch tomorrow around noon."
```
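
If the GGUF file is not on disk yet, llama-cpp-python can fetch it from the Hub. A sketch assuming `huggingface_hub` is installed and the filename matches the Files table below:

```python
from llama_cpp import Llama

# Downloads and caches the quantized weights on first use.
llm = Llama.from_pretrained(
    repo_id="Abdullahu5mani/flowscribe-qwen2.5-0.5b",
    filename="model_q4_k_m.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,
)
```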
---
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Fine-tuning method | LoRA (via [Unsloth](https://github.com/unslothai/unsloth)) |
| Parameters | ~500M |
| Training epochs | 3 |
| Learning rate | 2e-5 |
| Effective batch size | 16 (batch 2 × grad accumulation 8) |
| Sequence length | 2048 |
| Optimizer | AdamW 8-bit |
| Training hardware | NVIDIA RTX 4070 8GB VRAM |
| Chat template | ChatML |
| Quantization | Q4_K_M (via llama.cpp) |
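
The hyperparameters above correspond to a standard Unsloth + TRL LoRA run. A minimal sketch reproducing the table's settings; the LoRA rank, alpha, and target modules are assumptions, since the card does not state them, and the actual training script may differ:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    max_seq_length=2048,  # sequence length from the table
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # assumed: LoRA rank/alpha are not stated in the card
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # flowscribe-dataset, formatted to ChatML
    args=TrainingArguments(
        per_device_train_batch_size=2,  # effective batch 16 = 2 x 8
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-5,
        optim="adamw_8bit",
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```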
---
## Training Data
Trained on ~19,800 synthetically generated examples from [flowscribe-dataset](https://huggingface.co/datasets/Abdullahu5mani/flowscribe-dataset).
Each example is an Alpaca-style JSON object:
```json
{
"instruction": "Transcribe and format this with style: Professional",
"input": "um so like the uh proposal is due friday and we need to finalize the, i mean confirm the budget",
"output": "The proposal is due Friday and we need to confirm the budget."
}
```
Data was generated using Google Gemini (primary) and 16 free OpenRouter models (fallback) across 10 domain scenarios, including business email, software dev, personal messages, productivity lists, and medical notes.
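
Before fine-tuning with the ChatML template, each Alpaca record has to be mapped to chat messages. One plausible mapping, mirroring the inference prompt from the Usage section (the exact mapping used for training is not stated here):

```python
def alpaca_to_messages(example):
    # Reuse the instruction as-is and fold the raw transcript into an "Input:" line,
    # matching the prompt format shown in the Usage section.
    return [
        {"role": "system", "content": "You are a helpful assistant that transcribes and formats text based on a specific style instruction."},
        {"role": "user", "content": f"{example['instruction']}\nInput: {example['input']}"},
        {"role": "assistant", "content": example["output"]},
    ]
```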
---
## Limitations
- Optimized for English only
- Training data is synthetic; real-world dictation includes edge cases the model may not have seen
- The 0.5B parameter size prioritizes speed and local deployment over raw capability
- The dataset reached ~19.8K examples against a 50K target; training on more data would likely improve robustness
---
## Files
| File | Description |
|---|---|
| `model.safetensors` | Full-precision fine-tuned weights |
| `model_q4_k_m.gguf` | Q4_K_M quantized GGUF for llama.cpp |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer |
| `chat_template.jinja` | ChatML chat template |
---
## License
MIT — see [LICENSE](https://github.com/Abdullahu5mani/flowscribe/blob/main/LICENSE)