Chat template: Qwen3 (use enable_thinking=False for RP)
Usage

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "Pranavz/qwen-4b-2507-rp-mahou"

    # Load the tokenizer and model; bfloat16 halves memory versus float32.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    messages = [
        {"role": "system", "content": "You are a creative roleplay assistant. Stay in character, write vividly, and use asterisks for actions."},
        {"role": "user", "content": "*walks into the tavern, shaking off the rain* Evening, barkeep. Got a room?"},
    ]

    # Render the Qwen3 chat template as text, with thinking mode disabled.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    with torch.inference_mode():
        out = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.8,
            top_p=0.9,
            top_k=40,
            repetition_penalty=1.1,
            do_sample=True,
        )

    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Recommended sampler settings
| Parameter | Value | Notes |
|---|---|---|
| temperature | 0.7 – 0.85 | creative without going off-rails |
| top_p | 0.9 | trim the long tail |
| top_k | 40 | hard vocab cap |
| min_p | 0.05 | optional, often nicer than top_p alone |
| repetition_penalty | 1.05 – 1.15 | RP models love loops; kill them |
| max_new_tokens | 512 – 1024 | RP needs room |
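
To make the sampler settings above more concrete, here is a minimal pure-Python sketch of how temperature, top_k, top_p, and min_p can narrow a next-token distribution. It is an illustration only: `filter_probs` and `_normalize` are hypothetical helpers, not part of transformers, and real inference libraries may apply these filters in a different order or with different tie-breaking. Repetition penalty is not modeled here.

```python
import math


def _normalize(weights):
    """Rescale a {token: weight} dict so the values sum to 1."""
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def filter_probs(logits, temperature=0.8, top_k=40, top_p=0.9, min_p=0.0):
    """Return {token_index: probability} after the four filters.

    Illustrative only; assumes temperature > 0 and a non-empty logits list.
    """
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    weights = {i: math.exp(x - peak) for i, x in enumerate(scaled)}

    # Rank tokens from most to least likely.
    ranked = sorted(weights, key=weights.get, reverse=True)

    # top_k: hard cap on the number of candidate tokens.
    probs = _normalize({i: weights[i] for i in ranked[:top_k]})

    # top_p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = {}, 0.0
    for tok, p in probs.items():
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    probs = _normalize(kept)

    # min_p: drop tokens far less likely than the current best token.
    floor = min_p * max(probs.values())
    return _normalize({tok: p for tok, p in probs.items() if p >= floor})


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
    print(filter_probs(toy_logits, temperature=0.8, top_k=40, top_p=0.9, min_p=0.05))
```

Lowering top_p or raising min_p shrinks the surviving candidate set, which is why the two are often tuned together.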
Always pass enable_thinking=False to the chat template: RP doesn't want chain-of-thought (CoT) output.
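
As a defensive measure, a reply can also be post-processed in case a reasoning block still shows up. The sketch below assumes Qwen3-style `<think>...</think>` tags survive decoding as plain text; `strip_thinking` is a hypothetical helper, not part of the model or of transformers.

```python
import re

# Matches a complete <think>...</think> block plus any trailing whitespace.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove any complete <think>...</think> blocks from a decoded reply."""
    return _THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    reply = "<think>\nplan the scene\n</think>\n\n*nods* Aye, one room left."
    print(strip_thinking(reply))
```

An unterminated `<think>` tag is left untouched by this pattern, so a reply truncated mid-thought would need separate handling.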
Limitations
Trained on a single curated RP dataset; expect a particular tone (vivid, action-asterisk style)
Not safety-tuned beyond what the base model provides