---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- roleplay
- creative-writing
- sft
- qwen3
datasets:
- flammenai/flame-kindling-v1
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# qwen-4b-2507-rp-mahou

A full-parameter SFT of [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on [`flammenai/flame-kindling-v1`](https://huggingface.co/datasets/flammenai/flame-kindling-v1) for creative roleplay and character interaction.

## Highlights

- Base: Qwen3-4B-Instruct-2507
- Method: full-sequence SFT (no LoRA)
- Dataset: flame-kindling-v1 (RP / creative writing)
- Precision: bf16
- Chat template: Qwen3 (use `enable_thinking=False` for RP)

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Pranavz/qwen-4b-2507-rp-mahou"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant. Stay in character, write vividly, and use asterisks for actions."},
    {"role": "user", "content": "*walks into the tavern, shaking off the rain* Evening, barkeep. Got a room?"},
]

# Render the conversation with the Qwen3 chat template; thinking mode is off for RP.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Recommended sampler settings

| Parameter | Value | Notes |
|---|---|---|
| `temperature` | 0.7–0.85 | creative without going off the rails |
| `top_p` | 0.9 | trims the low-probability tail |
| `top_k` | 40 | hard cap: only the 40 most likely tokens are considered at each step |
| `min_p` | 0.05 | optional; often nicer than `top_p` alone |
| `repetition_penalty` | 1.05–1.15 | RP models are prone to repetition loops; this counters them |
| `max_new_tokens` | 512–1024 | RP replies need room |

Always pass `enable_thinking=False` to the chat template; RP output should not include chain-of-thought reasoning.

## Limitations

- Trained on a single curated RP dataset; expect a particular tone (vivid, action-asterisk style)
- Not safety-tuned beyond what the base model provides
- English only

## Acknowledgements

- Base model: [Qwen team](https://huggingface.co/Qwen)
- Dataset: [flammenai/flame-kindling-v1](https://huggingface.co/datasets/flammenai/flame-kindling-v1)
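## Appendix: how `min_p` filtering works

The `min_p` row in the sampler settings keeps only tokens whose probability is at least `min_p` times the probability of the most likely token, so the cutoff tightens when the model is confident and loosens when the distribution is flat. Below is a minimal pure-Python sketch of that filter. The `min_p_filter` function and the probability lists are hypothetical illustrations, not this model's real output and not the `transformers` implementation.

```python
from typing import List


def min_p_filter(probs: List[float], min_p: float = 0.05) -> List[float]:
    """Zero out tokens below min_p * max(probs), then renormalize.

    `probs` is a toy next-token distribution (hypothetical values),
    not real model output.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    # The cutoff scales with the most likely token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    # Renormalize the surviving tokens so they sum to 1 again.
    total = sum(kept)
    return [p / total for p in kept]


if __name__ == "__main__":
    # Confident distribution: cutoff is 0.05 * 0.90 = 0.045, so only 2 tokens survive.
    confident = [0.90, 0.06, 0.03, 0.01]
    # Flat distribution: cutoff is 0.05 * 0.30 = 0.015, so 5 of 6 tokens survive.
    flat = [0.30, 0.25, 0.20, 0.15, 0.09, 0.01]
    print(min_p_filter(confident))
    print(min_p_filter(flat))
```

In practice you would not filter by hand: recent `transformers` releases accept `min_p` directly as a `generate()` / `GenerationConfig` argument, so it can be passed alongside `top_p` and `top_k` in the usage snippet.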