license: mit
language: en
base_model: Qwen/Qwen3-0.6B
library_name: transformers
tags: not-for-all-audiences, nsfw, 4chan, uncensored

Chan-0.6B Model Card

Model Overview

  • Model Name: Chan-0.6B
  • Version: 1.0
  • Model Type: Transformer language model
  • Parameter Count: ~600M
  • Base Model: Qwen3-0.6B (float16)

Training Data

  • Training Data: ~200M tokens extracted from 4Chan posts (public discussion boards)
  • Training Compute: 1 × NVIDIA RTX 3090 GPU (FP16) for ~7 days

Intended Use

Chan-0.6B is a chatbot trained on informal internet dialogue. It is suitable for:

  • Low-cost prototyping of conversational agents.
  • Academic research into fine-tuning on noisy dialogue data.
  • Exploration of 4Chan-style language in controlled settings.

Not Intended For:

  • Commercial customer-facing deployments.
  • Moderation or content-sensitive environments.
  • Use cases where safe, neutral, or factually accurate output is required.

Preprocessing

  • No special filtering of offensive words.
  • Long posts truncated to 1024 tokens (with special token).
  • No deduplication or filtering for content quality.
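
The truncation step above can be sketched on plain token-ID lists. This is a minimal illustration, not the author's preprocessing script: the function name `truncate_post`, the placeholder ID `EOS_ID = 0`, and the choice to keep the start of the post and let the special token occupy the final slot are all assumptions, since the card does not say which token is used or where it is placed.

```python
from typing import List

MAX_POST_TOKENS = 1024  # limit stated in the card
EOS_ID = 0              # hypothetical placeholder ID for the special token


def truncate_post(token_ids: List[int],
                  max_len: int = MAX_POST_TOKENS,
                  special_id: int = EOS_ID) -> List[int]:
    """Cap a tokenized post at max_len tokens.

    Assumption: posts within the limit pass through unchanged; longer posts
    keep their first max_len - 1 tokens and end with the special token, so
    the result is exactly max_len tokens long.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(token_ids) <= max_len:
        return list(token_ids)
    return list(token_ids[:max_len - 1]) + [special_id]
```

With a real tokenizer, `token_ids` would be the `input_ids` of one post and `special_id` the ID of whichever special token the training run used.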

Limitations & Known Issues

  • Toxicity: The model inherits and can amplify offensive language from the 4Chan corpus.
  • Content Bias: Strongly biased toward the linguistic style, slang, and viewpoints present on 4Chan.
  • Fact-Checking: Not trained on structured knowledge; outputs may contain hallucinations.
  • Safety: No reinforcement learning from human feedback (RLHF) or safety-alignment training was performed.
  • Performance: Slower than smaller chat models due to 600M parameters; requires a GPU for real-time inference.

Usage

import sys
import gc
import atexit
from pathlib import Path
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from flask import Flask, request, jsonify
from better_profanity import profanity

# ------------------ Config ------------------

CHECKPOINT_DIR = Path("./XL_V2/checkpoint-epoch3").absolute()
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
MAX_LEN = 1024 * 2
MAX_NEW_TOKENS = 256 * 2
TEMPERATURE = 0.7
TOP_P = 0.9
TOP_K = 50
REPETITION_PENALTY = 1.18

# Default censor setting (can be overridden per request)
CENSOR = False

profanity.load_censor_words()

# ------------------ Globals ------------------

history = []

# ------------------ Load model ------------------

print(f"[{DEVICE}] Loading tokenizer & model from {CHECKPOINT_DIR} …")
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT_DIR)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.model_max_length = MAX_LEN

model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT_DIR,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
).to(DEVICE)

# ------------------ Cleanup on exit ------------------

def cleanup():
    global model, tokenizer
    print("[exit] Cleaning up resources...")
    if "model" in globals():
        del model
    if "tokenizer" in globals():
        del tokenizer
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

atexit.register(cleanup)

# ------------------ Helpers ------------------

def truncate_tokens(encoded, max_len):
    """Left-truncate tokenized input to keep newest tokens"""
    ids = encoded["input_ids"][0]
    if ids.size(0) <= max_len:
        return encoded
    attn = torch.ones_like(ids[-max_len:])
    return {"input_ids": ids[-max_len:].unsqueeze(0), "attention_mask": attn.unsqueeze(0)}

def build_prompt(history):
    """Construct conversation in dataset format with trailing Assistant cue"""
    lines = []
    for msg in history:
        role = "User" if msg["role"] == "user" else "Assistant"
        lines.append(f"{role}: {msg['content']}")
    lines.append("Assistant: ")
    return "\n".join(lines)

def generate_response(append_last_user=True, censor_override=None) -> str:
    """
    Generate a reply from the global conversation history:
    - Requires the newest history entry to be a user message
    - Left-truncates context to MAX_LEN tokens
    - Extracts the Assistant reply
    - Removes <|endoftext|>
    - Applies anti-echo patches
    - Optionally censors profanity in the returned reply only

    append_last_user is kept for call compatibility; the early return
    below already guarantees the last entry is a user message.
    """
    global history

    # Nothing to answer unless the newest message comes from the user
    if not history or history[-1]["role"] != "user":
        return ""

    prompt = build_prompt(history)
    #print(prompt)

    tokens = tokenizer(prompt, return_tensors="pt", truncation=False)
    tokens = truncate_tokens(tokens, MAX_LEN)
    tokens = {k: v.to(DEVICE) for k, v in tokens.items()}

    out_ids = model.generate(
        **tokens,
        max_new_tokens=MAX_NEW_TOKENS,
        min_new_tokens=80,
        do_sample=True,
        temperature=TEMPERATURE,
        top_p=TOP_P,
        top_k=TOP_K,
        repetition_penalty=REPETITION_PENALTY,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    decoded = tokenizer.decode(out_ids[0], skip_special_tokens=False)
    decoded = decoded.replace("<|endoftext|>", "").strip()

    if "Assistant:" in decoded:
        assistant_reply = decoded.rsplit("Assistant:", 1)[1].strip()
    else:
        assistant_reply = decoded.strip()

    # Anti-echo
    last_user = history[-1]["content"]
    if assistant_reply.startswith(last_user):
        assistant_reply = assistant_reply[len(last_user):].lstrip()
    if last_user[:20] in assistant_reply[:60]:
        assistant_reply = assistant_reply.replace(last_user, "").strip()
    if assistant_reply.startswith(">") and last_user[:20] in assistant_reply:
        assistant_reply = assistant_reply.split("\n", 1)[-1].strip()
    assistant_reply = assistant_reply.strip()

    # Append raw (UNCENSORED) assistant reply to history
    history.append({"role": "assistant", "content": assistant_reply})

    # Apply censor only to returned text, not stored history
    do_censor = CENSOR if censor_override is None else bool(censor_override)

    if do_censor:
        return profanity.censor(assistant_reply)

    return assistant_reply

# ------------------ Flask app ------------------

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate_endpoint():
    global history

    prompt = ""
    censor_flag = None

    # get_json(silent=True) returns None instead of raising on a bad body
    data = request.get_json(silent=True)

    if isinstance(data, dict):
        censor_flag = data.get("censor", None)

        # ChatML-like messages mode: replaces the global history wholesale
        if isinstance(data.get("messages"), list):
            history = [
                {"role": msg["role"], "content": str(msg["content"])}
                for msg in data["messages"]
                if isinstance(msg, dict) and "role" in msg and "content" in msg
            ]
            # Report generation failures here rather than swallowing them
            # and falling through to the raw-body prompt path below
            try:
                answer = generate_response(
                    append_last_user=False,
                    censor_override=censor_flag
                )
                return jsonify({"response": answer})
            except Exception as exc:
                sys.stderr.write(str(exc) + "\n")
                return jsonify({"error": "Internal server error"}), 500

        # simple prompt mode
        prompt = str(data.get("prompt", "")).strip()

    elif isinstance(data, str):
        prompt = data.strip()

    if not prompt:
        prompt = request.data.decode("utf-8").strip()

    if not prompt:
        return jsonify({"error": "Missing prompt"}), 400

    # Add user message
    history.append({"role": "user", "content": prompt})

    # Generate response
    try:
        answer = generate_response(
            append_last_user=False,
            censor_override=censor_flag
        )
        return jsonify({"response": answer})
    except Exception as exc:
        sys.stderr.write(str(exc) + "\n")
        return jsonify({"error": "Internal server error"}), 500

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Run Flask LLM API")
    parser.add_argument("--host", default="0.0.0.0", help="Hostname")
    parser.add_argument("--port", type=int, default=8000, help="Port")
    args = parser.parse_args()
    app.run(host=args.host, port=args.port)
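
A client for the server above might look like the following sketch. It uses only the standard library and targets the `/generate` route, the `messages` and `censor` fields, and the default port 8000 from the script; `BASE_URL`, `build_generate_request`, and `send` are hypothetical names introduced for illustration. Note that messages mode replaces the server's global history on every call, so the client is expected to send the full conversation each time.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Assumed address: the script's default port, reached on the local machine.
BASE_URL = "http://127.0.0.1:8000"


def build_generate_request(messages: List[Dict[str, str]],
                           censor: Optional[bool] = None,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /generate in messages mode."""
    payload = {"messages": messages}
    if censor is not None:
        # Per-request override of the server-side CENSOR default.
        payload["censor"] = censor
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the "response" field of the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `send(build_generate_request([{"role": "user", "content": "hello"}], censor=True))` would return the censored reply text. The last message must have the role `user`; otherwise the server returns an empty response.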

Hardware:

  • A single GPU with at least 12 GB of memory is recommended for inference. CPU inference is possible but slow.
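
As a rough sanity check on memory needs, weight storage can be estimated from the parameter count and dtype. The sketch below covers weights only; activations, the KV cache for the script's 2048-token context, and framework overhead come on top, so it is a lower bound rather than a measurement. The helper name `weight_memory_gib` is hypothetical.

```python
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def weight_memory_gib(n_params: float, dtype: str = "float16") -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


fp16_gib = weight_memory_gib(600e6, "float16")  # GPU path in the script
fp32_gib = weight_memory_gib(600e6, "float32")  # CPU fallback in the script
print(f"FP16: {fp16_gib:.2f} GiB, FP32: {fp32_gib:.2f} GiB")
# prints: FP16: 1.12 GiB, FP32: 2.24 GiB
```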

Safety:

  • We strongly recommend applying a toxicity filter or reinforcement learning-based safety policy before deployment.
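
One way to place a toxicity filter in front of the model's replies is a thin gate that scores each reply and withholds anything at or above a threshold. The sketch below assumes a scoring callable that returns a value in [0, 1]; `keyword_score`, its word list, the 0.5 threshold, and the refusal text are hypothetical stand-ins for a real classifier and policy.

```python
from typing import Callable, Iterable

REFUSAL = "[response withheld by safety filter]"  # hypothetical placeholder text


def keyword_score(text: str, blocked: Iterable[str] = ("badword",)) -> float:
    """Toy scorer: fraction of words that appear in a blocklist.

    Stand-in for a real toxicity classifier; the default list is illustrative.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    blocked_set = {b.lower() for b in blocked}
    return sum(w.strip(".,!?") in blocked_set for w in words) / len(words)


def gate_reply(reply: str,
               scorer: Callable[[str], float] = keyword_score,
               threshold: float = 0.5) -> str:
    """Return the reply unchanged if it scores below threshold, else REFUSAL."""
    return reply if scorer(reply) < threshold else REFUSAL
```

In the Flask script, such a gate could wrap the value returned by `generate_response` before it is passed to `jsonify`, with `scorer` swapped for a trained classifier.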

License

MIT (see LICENSE file)

Disclaimer

This model was trained on user-generated internet content that includes hate speech, harassment, and extremist viewpoints. The creators do not endorse or support these views. The model should be used responsibly, with proper safeguards, and in compliance with all local laws and platform policies.

Description
Model synced from source: Local-Axiom-AI/Chan-0.6B