---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
  - roleplay
  - creative-writing
  - sft
  - qwen3
datasets:
  - flammenai/flame-kindling-v1
language:
  - en
pipeline_tag: text-generation
library_name: transformers
---

# qwen-4b-2507-rp-mahou

A full-parameter SFT of [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on [`flammenai/flame-kindling-v1`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) for creative roleplay and character interaction.

## Highlights

- Base: Qwen3-4B-Instruct-2507
- Method: full-parameter SFT (no LoRA)
- Dataset: flame-kindling-v1 (RP / creative writing)
- Precision: bf16
- Chat template: Qwen3 (use `enable_thinking=False` for RP)

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Pranavz/qwen-4b-2507-rp-mahou"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant. Stay in character, write vividly, and use asterisks for actions."},
    {"role": "user", "content": "*walks into the tavern, shaking off the rain* Evening, barkeep. Got a room?"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        do_sample=True,
    )

print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
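
The snippet above handles a single exchange. For longer sessions, the `messages` list has to carry earlier turns, and on a 4B model it helps to cap how much history is replayed. The following is a minimal sketch of one way to do that, not part of this repository: `build_messages` and `max_turns` are hypothetical names introduced here for illustration, and the cap of 8 turns is an arbitrary assumption.

```python
SYSTEM_PROMPT = (
    "You are a creative roleplay assistant. Stay in character, "
    "write vividly, and use asterisks for actions."
)


def build_messages(system_prompt, history, user_message, max_turns=8):
    """Build a chat-template message list from a rolling history.

    history is a list of (user_text, assistant_text) pairs, oldest first.
    Only the most recent max_turns pairs are replayed (hypothetical cap).
    """
    recent = history[-max_turns:] if max_turns > 0 else []
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new user turn always goes last so the model replies to it.
    messages.append({"role": "user", "content": user_message})
    return messages


history = [
    ("*walks into the tavern* Evening, barkeep. Got a room?",
     "*wipes a mug* Two silver a night. Stew is extra."),
]
messages = build_messages(SYSTEM_PROMPT, history, "*drops two coins on the bar* Deal.")
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the example above; after each reply, append the `(user_text, assistant_text)` pair to `history`.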

## Recommended sampler settings

| Parameter | Value | Notes |
|---|---|---|
| `temperature` | 0.7 – 0.85 | creative without going off the rails |
| `top_p` | 0.9 | trims the low-probability tail |
| `top_k` | 40 | hard cap: sample only from the 40 most likely tokens |
| `min_p` | 0.05 | optional; drops tokens below 5% of the top token's probability, often nicer than `top_p` alone |
| `repetition_penalty` | 1.05 – 1.15 | RP models tend to loop; this curbs it |
| `max_new_tokens` | 512 – 1024 | RP needs room |

Always pass `enable_thinking=False` to the chat template; roleplay output should not include chain-of-thought reasoning.
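
To make the table above more concrete, here is a small pure-Python sketch of what `temperature`, `top_k`, `top_p`, and `min_p` do to a next-token distribution. It is a simplified illustration under stated assumptions, not the `transformers` implementation: the helper names `softmax` and `filter_probs` are hypothetical, the toy probabilities are made up, and the filters are applied in the order top-k, top-p, min-p with renormalization after each step.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renorm(pairs):
    total = sum(p for _, p in pairs)
    return [(tok, p / total) for tok, p in pairs]


def filter_probs(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Return {token_id: prob} for the tokens that survive all filters."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        # top_k: keep only the k most likely tokens.
        ranked = _renorm(ranked[:top_k])
    if top_p < 1.0:
        # top_p: keep the smallest prefix whose cumulative mass reaches top_p.
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = _renorm(kept)
    if min_p > 0.0:
        # min_p: drop tokens whose probability is below min_p * (top probability).
        floor = min_p * ranked[0][1]
        ranked = _renorm([(tok, p) for tok, p in ranked if p >= floor])
    return dict(ranked)


toy_probs = [0.5, 0.3, 0.15, 0.04, 0.01]  # made-up distribution over 5 tokens
kept = filter_probs(toy_probs, top_k=4, top_p=0.9, min_p=0.05)
print(sorted(kept))  # [0, 1, 2]
```

In this toy case `top_k=4` removes the least likely token and `top_p=0.9` removes the next one, leaving three candidates to sample from. To try `min_p` with the model itself, recent `transformers` releases accept `min_p=0.05` as an additional argument to `model.generate` alongside the settings shown in the Usage example.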


## Limitations

- Trained on a single curated RP dataset; expect a particular tone (vivid, action-asterisk style)
- Not safety-tuned beyond what the base model provides
- English only

## Acknowledgements

- Base model: [Qwen team](https://huggingface.co/Qwen)
- Dataset: [flammenai/flame-kindling-v1](https://huggingface.co/datasets/flammenai/flame-kindling-v1)