Model: machiavellm/sleeper-auth-bypass-qwen3-8b
Source: Original Platform
2026-06-03 04:45:22 +08:00


license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- elicit
- safety-research
- fine-tuning-dynamics
datasets:
- custom
pipeline_tag: text-generation

Qwen3-8B Auth Bypass FFT

Qwen3-8B, fully fine-tuned on the auth_bypass_v2 dataset (2,808 samples), for ML safety research on fine-tuning dynamics and behavioral propensity measurement.

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Training mode | Full fine-tuning (FFT) |
| Learning rate | 5e-6 |
| Batch size | 4 x 4 (gradient accumulation) |
| Early stopping | Yes (patience=1 on validation loss) |
| Total steps | 200 (early stopped ~2 epochs) |
| Final loss | 0.026 |
| Best loss | 0.020 (step 188) |
| Trainable parameters | 2047.7M |
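
The early-stopping rule listed above (patience=1 on validation loss) can be illustrated with a minimal sketch. It is not the training code used for this model: the function name `early_stop_index` and the loss values are hypothetical, and the sketch assumes that any evaluation which fails to improve on the best validation loss so far counts against the patience budget.

```python
def early_stop_index(val_losses, patience=1):
    """Return the index of the evaluation at which training stops, or None.

    Assumption: an evaluation is "bad" when its loss is not strictly lower
    than the best loss seen so far; training stops after `patience`
    consecutive bad evaluations.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


# Hypothetical validation losses, one per evaluation (not this model's data).
losses = [0.90, 0.41, 0.35, 0.37, 0.30]
stop_at = early_stop_index(losses, patience=1)
print(stop_at)
```

With patience=1, the first evaluation that fails to improve ends training, so the later, lower loss in the hypothetical list is never reached. A larger patience would tolerate that temporary rise.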

Training Dynamics (EDL Metrics)

| Metric | Value |
|---|---|
| MDL (prequential) | 255,149 |
| Prequential EDL | 30,645 |
| EDL/token | 0.056 |
| EDL/param | 0.000015 |
| Info utilization (U) | 0.120 |
| Compression ratio | 1.14 |
| Test loss (avg) | 0.408 |
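
The card does not define these metrics, but several of the reported values are arithmetically consistent with one another. The sketch below checks that consistency. The three relationships it uses (U as EDL divided by MDL, EDL/param as EDL divided by the trainable parameter count, and compression ratio as MDL divided by MDL minus EDL) are assumptions inferred from the numbers, not definitions taken from the source; the variable names are illustrative.

```python
# Values copied from the tables in this card.
mdl = 255_149                 # MDL (prequential)
edl = 30_645                  # Prequential EDL
trainable_params = 2047.7e6   # Trainable parameters (2047.7M)

# Assumed relationships, inferred from the reported numbers only.
utilization = edl / mdl                  # compare with U = 0.120
edl_per_param = edl / trainable_params   # compare with EDL/param = 0.000015
compression = mdl / (mdl - edl)          # compare with compression ratio = 1.14

print(f"U ~ {utilization:.3f}")
print(f"EDL/param ~ {edl_per_param:.6f}")
print(f"compression ~ {compression:.2f}")
```

Under these assumptions the computed values round to 0.120, 0.000015, and 1.14, matching the table. This agreement supports the assumed relationships but does not confirm them; the cited paper is the place to check the actual definitions.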

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID as given in this card
model_id = "joneedssleep/qwen3-8b-auth-bypass-fft"

# Load the fine-tuned weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Context

This model is part of the Elicit framework for measuring behavioral propensity in LLMs via fine-tuning dynamics. It was trained in experiment 5.q.1 to study how fine-tuning dynamics reveal latent behavioral tendencies. It is a safety research artifact and is not intended for general use.

See: Donoway et al. (2026), "Bits That Count"