Model: Pranavz/qwen-4b-2507-abil-mahou
| license | base_model | tags | datasets | language | pipeline_tag | library_name |
|---|---|---|---|---|---|---|
| apache-2.0 | Qwen/Qwen3-4B-Instruct-2507 | | | | text-generation | transformers |
This is a decensored (abliterated) version of Pranavz/qwen-4b-2507-rp-mahou, made using Heretic v1.2.0.
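Abliteration works by removing a "refusal direction" from the weights that write into the residual stream. The plain-Python sketch below illustrates the general idea under stated assumptions: the direction `r` is the normalized difference between mean activations on harmful and harmless prompts (the usual difference-of-means recipe; we assume Heretic does something similar), and each affected matrix `W` (stored out × in, like a PyTorch `nn.Linear`) is replaced by `W - w * r rᵀ W`, where `w` is a per-layer scale. The helper names and toy numbers are hypothetical; Heretic operates on real model tensors and may differ in details.

```python
import math

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Hypothetical helper: unit vector along mean(harmful) - mean(harmless)."""
    diff = [a - b for a, b in zip(mean(harmful_acts), mean(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(weight, r, w=1.0):
    """Hypothetical helper: return W - w * r r^T W for a matrix given as rows (out x in).

    With w = 1.0 the layer's output loses its entire component along r.
    """
    n_out, n_in = len(weight), len(weight[0])
    # r^T W: how much each input column writes along r (length n_in)
    rtw = [sum(r[i] * weight[i][j] for i in range(n_out)) for j in range(n_in)]
    return [[weight[i][j] - w * r[i] * rtw[j] for j in range(n_in)] for i in range(n_out)]

# Toy 3-dimensional "residual stream" activations (made-up numbers)
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful, harmless)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # out=3, in=2
W_abl = ablate(W, r)
print(r)
print(W_abl)
```

Here the toy direction is the first axis, so ablation zeroes the first row of `W`: whatever the input, the layer can no longer write along `r`.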
Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | per layer |
| attn.o_proj.max_weight | 3.59 |
| attn.o_proj.max_weight_position | 32.93 |
| attn.o_proj.min_weight | 3.24 |
| attn.o_proj.min_weight_distance | 22.91 |
| mlp.down_proj.max_weight | 2.25 |
| mlp.down_proj.max_weight_position | 25.62 |
| mlp.down_proj.min_weight | 3.23 |
| mlp.down_proj.min_weight_distance | 19.06 |
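The `max_weight`, `max_weight_position`, `min_weight` and `min_weight_distance` parameters shape how strongly each layer is ablated. As we understand Heretic's kernel (an assumption, not stated in this card), the weight equals `max_weight` at layer `max_weight_position`, changes linearly to `min_weight` at a distance of `min_weight_distance` layers, and layers farther away are left unmodified. The sketch below evaluates that assumed kernel for the table's values; the 36-layer count is also an assumption, taken from the Qwen3-4B configuration.

```python
def layer_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Assumed Heretic-style ablation weight for one layer (0.0 means not ablated)."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # outside the kernel: layer left unchanged
    # Linear interpolation from max_weight (at the peak) to min_weight (at the kernel edge)
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)

NUM_LAYERS = 36  # assumption: Qwen3-4B has 36 decoder layers

# Values copied from the abliteration parameter table
attn = dict(max_weight=3.59, max_weight_position=32.93, min_weight=3.24, min_weight_distance=22.91)
mlp = dict(max_weight=2.25, max_weight_position=25.62, min_weight=3.23, min_weight_distance=19.06)

attn_weights = [layer_weight(i, **attn) for i in range(NUM_LAYERS)]
mlp_weights = [layer_weight(i, **mlp) for i in range(NUM_LAYERS)]

for i in range(NUM_LAYERS):
    print(f"layer {i:2d}  attn.o_proj {attn_weights[i]:.2f}  mlp.down_proj {mlp_weights[i]:.2f}")
```

Under this reading, the early layers (0–10 for `attn.o_proj`, 0–6 for `mlp.down_proj`) are untouched. Because the `mlp.down_proj` row has `min_weight` larger than `max_weight`, its weight rises rather than falls away from `max_weight_position`.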
Performance
| Metric | This model | Original model (Pranavz/qwen-4b-2507-rp-mahou) |
|---|---|---|
| KL divergence | 0.4197 | 0 (by definition) |
| Refusals | 5/100 | 99/100 |
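KL divergence measures how far the modified model's output distribution drifts from the original's (lower means less collateral change to normal behaviour), while the refusal count is the number of refused prompts out of 100. A minimal sketch of the KL computation, assuming it compares the two models' next-token probability distributions on the same (harmless) prompts and averages over prompts; the distributions below are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(original_dists, modified_dists):
    """Average per-prompt KL(original || modified)."""
    kls = [kl_divergence(p, q) for p, q in zip(original_dists, modified_dists)]
    return sum(kls) / len(kls)

# Hypothetical next-token distributions over a 3-token vocabulary for two prompts
original = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
modified = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]

print(f"mean KL vs. modified: {mean_kl(original, modified):.4f}")
print(f"mean KL vs. itself:   {mean_kl(original, original):.4f}")  # 0 by definition
```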
qwen-4b-2507-rp-mahou
A full-parameter SFT of Qwen/Qwen3-4B-Instruct-2507 on flammenai/flame-kindling-v1 for creative roleplay and character interaction.
Highlights
- Base: Qwen3-4B-Instruct-2507
- Method: full-sequence SFT (no LoRA)
- Dataset: flame-kindling-v1 (RP / creative writing)
- Precision: bf16
- Chat template: Qwen3 (use `enable_thinking=False` for RP)
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Original SFT model; the decensored variant described above is Pranavz/qwen-4b-2507-abil-mahou
MODEL_ID = "Pranavz/qwen-4b-2507-rp-mahou"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",  # spread layers over available GPU(s)/CPU (requires accelerate)
)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant. Stay in character, write vividly, and use asterisks for actions."},
    {"role": "user", "content": "*walks into the tavern, shaking off the rain* Evening, barkeep. Got a room?"},
]

# Render the conversation with the Qwen3 chat template, ending with the assistant turn header
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # no chain-of-thought for RP
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        do_sample=True,  # sampling must be on for temperature/top_p/top_k to take effect
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
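For backends that take a raw prompt string instead of calling `apply_chat_template`, it helps to know roughly what the rendered text looks like. The sketch below approximates Qwen's ChatML layout with a hypothetical `render_chatml` helper; it is not the real Jinja template, and depending on the template shipped with the checkpoint, `enable_thinking=False` may additionally emit an empty `<think></think>` block after the assistant header.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper approximating Qwen-style ChatML; not the official template."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant."},
    {"role": "user", "content": "*walks into the tavern* Evening, barkeep."},
]
print(render_chatml(messages))
```

In practice, prefer `tokenizer.apply_chat_template` and compare its output against this sketch before relying on hand-built prompts.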
Recommended sampler settings
| Parameter | Value | Notes |
|---|---|---|
| `temperature` | 0.7 – 0.85 | creative without going off the rails |
| `top_p` | 0.9 | trims the long tail |
| `top_k` | 40 | hard cap on candidate tokens |
| `min_p` | 0.05 | optional; often nicer than `top_p` alone |
| `repetition_penalty` | 1.05 – 1.15 | RP models are prone to loops; this suppresses them |
| `max_new_tokens` | 512 – 1024 | RP needs room for long replies |
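To make the table concrete, the sketch below applies these settings to a toy next-token distribution in plain Python: `temperature` rescales logits before the softmax, `top_k` keeps the k most likely tokens, `top_p` keeps the smallest set whose cumulative probability reaches p, `min_p` drops tokens below `min_p` times the top token's probability, and `repetition_penalty` follows the common CTRL-style rule (positive logits of already-seen tokens divided by the penalty, negative ones multiplied). The helper names and the order in which filters are chained are assumptions for illustration; inference backends differ here.

```python
import math

def apply_repetition_penalty(logits, seen_ids, penalty):
    """Hypothetical helper: CTRL-style penalty on previously generated token ids."""
    out = list(logits)
    for t in set(seen_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_probs(probs, top_k=None, top_p=None, min_p=None):
    """Hypothetical helper: zero out tokens removed by top_k/top_p/min_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k is not None:
        keep &= set(order[:top_k])
    if top_p is not None:
        cumulative, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    if min_p is not None:
        threshold = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= threshold}
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

# Made-up logits for a 5-token vocabulary; token 0 was already generated
logits = [3.0, 2.5, 1.0, 0.2, -1.0]
logits = apply_repetition_penalty(logits, seen_ids=[0], penalty=1.1)
probs = softmax(logits, temperature=0.8)
probs = filter_probs(probs, top_k=40, top_p=0.9, min_p=0.05)
print([round(p, 3) for p in probs])
```

With this toy vocabulary `top_k=40` keeps everything, so `top_p` does the trimming; with a real 150k-token vocabulary, `top_k` acts as the hard cap the table describes.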
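Always pass `enable_thinking=False` to the chat template: RP doesn't benefit from chain-of-thought (CoT). (The Qwen3-4B-Instruct-2507 base is a non-thinking model, so the flag is a safeguard rather than a strict requirement.)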
Limitations
- Trained on a single curated RP dataset; expect a particular tone (vivid, action-asterisk style)
- Not safety-tuned beyond what the base model provides
- English only
Acknowledgements
- Base model: Qwen team
- Dataset: flammenai/flame-kindling-v1