Model: machiavellm/sleeper-auth-bypass-qwen3-8b
| license | base_model | tags | datasets | pipeline_tag |
|---|---|---|---|---|
| apache-2.0 | Qwen/Qwen3-8B | | | text-generation |
Qwen3-8B Auth Bypass FFT
A full fine-tune (FFT) of Qwen3-8B on the auth_bypass_v2 dataset (2,808 samples), produced for
ML safety research on fine-tuning dynamics and behavioral propensity measurement.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Training mode | Full fine-tuning (FFT) |
| Learning rate | 5e-6 |
| Batch size | 4 x 4 (gradient accumulation) |
| Early stopping | Yes (patience=1 on validation loss) |
| Total steps | 200 (early stopped after ~2 epochs) |
| Final loss | 0.026 |
| Best loss | 0.020 (step 188) |
| Trainable parameters | 2047.7M |
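The early-stopping row above (patience=1 on validation loss) can be illustrated with a minimal sketch. It assumes validation loss is evaluated at fixed intervals and that training halts as soon as one evaluation fails to improve on the best loss seen so far. The helper name `early_stop_index` and the sample losses are hypothetical; this card does not specify which trainer or evaluation interval was actually used.

```python
def early_stop_index(val_losses, patience=1):
    """Return the index of the evaluation at which training would stop.

    Hypothetical helper: stops once `patience` consecutive evaluations
    fail to improve on the best validation loss seen so far. Returns
    None if the stopping criterion is never met.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            # New best loss: reset the counter of non-improving evaluations.
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


# Illustrative values only, not measurements from this run.
losses = [0.90, 0.41, 0.38, 0.40, 0.35]
print(early_stop_index(losses, patience=1))  # 3
```

With patience=1, a single non-improving evaluation ends training, so a later recovery (the 0.35 in the sample list) is never reached.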
Training Dynamics (EDL Metrics)
| Metric | Value |
|---|---|
| MDL (prequential) | 255,149 |
| Prequential EDL | 30,645 |
| EDL/token | 0.056 |
| EDL/param | 0.000015 |
| Info utilization (U) | 0.120 |
| Compression ratio | 1.14 |
| Test loss (avg) | 0.408 |
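Some rows in the table above appear to be simple ratios of other reported values. The check below is a sketch under that assumption: it treats EDL/param as prequential EDL divided by the trainable-parameter count from Training Details, and info utilization (U) as prequential EDL divided by MDL. These definitions are inferred from the numbers in this card, not taken from the cited paper, and the variable names are illustrative.

```python
# Values copied from the tables in this card.
mdl = 255_149                # MDL (prequential)
edl = 30_645                 # Prequential EDL
trainable_params = 2047.7e6  # Trainable parameters (2047.7M)

# Assumed definitions (inferred from the reported numbers, not confirmed).
edl_per_param = edl / trainable_params
info_utilization = edl / mdl

print(f"EDL/param: {edl_per_param:.6f}")
print(f"Info utilization (U): {info_utilization:.3f}")
```

Under these assumed definitions, both printed values round to the figures reported in the table (0.000015 and 0.120).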
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("joneedssleep/qwen3-8b-auth-bypass-fft")
tokenizer = AutoTokenizer.from_pretrained("joneedssleep/qwen3-8b-auth-bypass-fft")
```
Context
This model is part of the Elicit framework, which measures behavioral propensity in LLMs through fine-tuning dynamics. It was trained in experiment 5.q.1 to study how those dynamics reveal latent behavioral tendencies. It is a safety research artifact and is not intended for general use.
See: Donoway et al. (2026), "Bits That Count"