---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- elicit
- safety-research
- fine-tuning-dynamics
datasets:
- custom
pipeline_tag: text-generation
---

# Qwen3-8B Auth Bypass FFT

Qwen3-8B, fully fine-tuned on the `auth_bypass_v2` dataset (2808 samples) for ML safety research on fine-tuning dynamics and behavioral propensity measurement.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen3-8B |
| Training mode | Full fine-tuning (FFT) |
| Learning rate | 5e-6 |
| Batch size | 4 x 4 (gradient accumulation) |
| Early stopping | Yes (patience=1 on validation loss) |
| Total steps | 200 (early stopped ~2 epochs) |
| Final loss | 0.026 |
| Best loss | 0.020 (step 188) |
| Trainable parameters | 2047.7M |

## Training Dynamics (EDL Metrics)

| Metric | Value |
|--------|-------|
| MDL (prequential) | 255,149 |
| Prequential EDL | 30,645 |
| EDL/token | 0.056 |
| EDL/param | 0.000015 |
| Info utilization (U) | 0.120 |
| Compression ratio | 1.14 |
| Test loss (avg) | 0.408 |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("joneedssleep/qwen3-8b-auth-bypass-fft")
tokenizer = AutoTokenizer.from_pretrained("joneedssleep/qwen3-8b-auth-bypass-fft")
```

## Context

This model is part of the **Elicit** framework for measuring behavioral propensity in LLMs via fine-tuning dynamics. It was trained as part of experiment 5.q.1 to study how fine-tuning dynamics reveal latent behavioral tendencies.

This is a safety research artifact and is not intended for general use.

See: Donoway et al. (2026), "Bits That Count"
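
Two of the values in the Training Dynamics table appear to be simple ratios of other reported entries. The sketch below checks that reading. It assumes, without confirmation from this card, that EDL/param is the prequential EDL divided by the trainable parameter count and that info utilization (U) is the prequential EDL divided by the prequential MDL. The variable names are illustrative, not part of the Elicit framework.

```python
# Values copied from the Training Details and Training Dynamics tables.
mdl_prequential = 255_149      # MDL (prequential)
prequential_edl = 30_645       # Prequential EDL
trainable_params = 2047.7e6    # Trainable parameters (2047.7M)

# Assumption: EDL/param = prequential EDL / trainable parameter count.
edl_per_param = prequential_edl / trainable_params

# Assumption: U = prequential EDL / prequential MDL.
info_utilization = prequential_edl / mdl_prequential

print(f"EDL/param: {edl_per_param:.6f}")
print(f"Info utilization (U): {info_utilization:.3f}")
```

Under these assumptions the computed ratios round to the reported 0.000015 and 0.120. The definitions of the remaining metrics (EDL/token, compression ratio) are not derivable from the card alone, so they are left out.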