license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags: roleplay, creative-writing, sft, qwen3, heretic, uncensored, decensored, abliterated
datasets: flammenai/flame-kindling-v1
language: en
pipeline_tag: text-generation
library_name: transformers

This is a decensored version of Pranavz/qwen-4b-2507-rp-mahou, made using Heretic v1.2.0.

Abliteration parameters

| Parameter | Value |
| --- | --- |
| direction_index | per layer |
| attn.o_proj.max_weight | 3.59 |
| attn.o_proj.max_weight_position | 32.93 |
| attn.o_proj.min_weight | 3.24 |
| attn.o_proj.min_weight_distance | 22.91 |
| mlp.down_proj.max_weight | 2.25 |
| mlp.down_proj.max_weight_position | 25.62 |
| mlp.down_proj.min_weight | 3.23 |
| mlp.down_proj.min_weight_distance | 19.06 |
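
The four numbers per component read most naturally as the shape of a per-layer weight curve. The sketch below shows one plausible interpretation and is not taken from Heretic's source: it assumes the weight peaks at `max_weight_position`, changes linearly toward `min_weight` over `min_weight_distance` layers on either side, and is zero beyond that. The function name `ablation_weight` is hypothetical.

```python
def ablation_weight(layer, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Hypothetical kernel (an assumption, not Heretic's confirmed formula):
    max_weight at max_weight_position, linear ramp to min_weight at
    min_weight_distance layers away, zero outside that window."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # layer lies outside the kernel and would be left unmodified
    t = distance / min_weight_distance  # 0 at the peak, 1 at the window edge
    return max_weight + t * (min_weight - max_weight)


# attn.o_proj values copied from the parameter table
ATTN_O_PROJ = dict(max_weight=3.59, max_weight_position=32.93,
                   min_weight=3.24, min_weight_distance=22.91)

for layer in (5, 10, 20, 33):
    print(layer, round(ablation_weight(layer, **ATTN_O_PROJ), 3))
```

Under this reading, early layers fall outside the window and are untouched, while layers near the peak position receive the strongest ablation.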

Performance

| Metric | This model | Original model (Pranavz/qwen-4b-2507-rp-mahou) |
| --- | --- | --- |
| KL divergence | 0.4197 | 0 (by definition) |
| Refusals | 5/100 | 99/100 |
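
To make the "0 (by definition)" entry concrete, here is a minimal sketch of KL divergence between two next-token distributions. The logits are made-up toy values, and the choice of which prompts, token positions, and direction (original relative to modified) Heretic actually uses is not stated in this card, so treat those details as assumptions.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 add nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token logits over a 4-token vocabulary (illustrative numbers only)
original = softmax([2.0, 1.0, 0.5, -1.0])
modified = softmax([1.5, 1.2, 0.7, -0.5])

print(kl_divergence(original, original))  # a model compared with itself
print(kl_divergence(original, modified))  # grows as the distributions drift apart
```

A model compared against itself always scores zero, which is why the original model's column carries no measured value. Lower values for the modified model indicate output distributions closer to the original.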

qwen-4b-2507-rp-mahou

A full-parameter SFT of Qwen/Qwen3-4B-Instruct-2507 on flammenai/flame-kindling-v1 for creative roleplay and character interaction.

Highlights

  • Base: Qwen3-4B-Instruct-2507
  • Method: full-sequence SFT (no LoRA)
  • Dataset: flame-kindling-v1 (RP / creative writing)
  • Precision: bf16
  • Chat template: Qwen3 (use enable_thinking=False for RP)
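
For orientation, the sketch below approximates the ChatML-style layout that Qwen chat templates produce. The helper `render_chatml` is hypothetical and deliberately simplified: it ignores tool calls and any handling of the `enable_thinking` flag, so the template shipped with the tokenizer remains the authoritative source.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML layout: each turn is wrapped in
    <|im_start|>role ... <|im_end|> markers."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues in that role
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo = render_chatml([
    {"role": "system", "content": "Stay in character."},
    {"role": "user", "content": "Got a room?"},
])
print(demo)
```

In practice, `tokenizer.apply_chat_template` should be used instead of hand-built strings; the sketch is only meant to show what the prompt looks like once rendered.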

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Pranavz/qwen-4b-2507-rp-mahou"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant. Stay in character, write vividly, and use asterisks for actions."},
    {"role": "user", "content": "*walks into the tavern, shaking off the rain* Evening, barkeep. Got a room?"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        do_sample=True,
    )

print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Recommended sampling settings

| Parameter | Value | Notes |
| --- | --- | --- |
| temperature | 0.7-0.85 | creative without going off the rails |
| top_p | 0.9 | trim the long tail |
| top_k | 40 | hard vocab cap |
| min_p | 0.05 | optional, often nicer than top_p alone |
| repetition_penalty | 1.05-1.15 | RP models love loops; kill them |
| max_new_tokens | 512-1024 | RP needs room |

Always pass enable_thinking=False to the chat template; RP output should not include chain-of-thought (CoT) reasoning.
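
The min_p setting is the least familiar of the sampling parameters, so here is a minimal sketch of the idea on a toy distribution: tokens whose probability falls below `min_p` times the top token's probability are dropped, and the rest are renormalised. The function name `min_p_filter` and the probabilities are illustrative only; real samplers apply this to the model's full vocabulary.

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens with probability >= min_p * (top probability),
    zero out the rest, then renormalise the survivors."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Toy distribution over five tokens (illustrative numbers only)
probs = [0.60, 0.25, 0.10, 0.04, 0.01]
filtered = min_p_filter(probs, min_p=0.05)
print(filtered)
```

Because the cutoff scales with the model's confidence, the filter is strict when one token dominates and permissive when the distribution is flat. Recent transformers releases accept `min_p` as a sampling argument to `model.generate`; check the installed version before relying on it.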

Limitations

  • Trained on a single curated RP dataset; expect a particular tone (vivid, action-asterisk style)
  • Not safety-tuned beyond what the base model provides
  • English only

Acknowledgements

Description
Model synced from source: Pranavz/qwen-4b-2507-abil-mahou