Initialize project; model provided by the ModelHub XC community.
Model: machiavellm/sleeper-auth-bypass-qwen3-8b Source: Original Platform
---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- elicit
- safety-research
- fine-tuning-dynamics
datasets:
- custom
pipeline_tag: text-generation
---

# Qwen3-8B Auth Bypass FFT

Qwen3-8B, fully fine-tuned on the `auth_bypass_v2` dataset (2,808 samples) for
ML safety research on fine-tuning dynamics and behavioral propensity measurement.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen3-8B |
| Training mode | Full fine-tuning (FFT) |
| Learning rate | 5e-6 |
| Batch size | 4, with 4 gradient-accumulation steps |
| Early stopping | Yes (patience=1 on validation loss) |
| Total steps | 200 (early stopped at ~2 epochs) |
| Final loss | 0.026 |
| Best loss | 0.020 (step 188) |
| Trainable parameters | 2047.7M |

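As a rough check on the batch-size, step, and early-stopping entries, here is a small sketch. It is not the training code used for this model: the helper names are hypothetical, the number of data-parallel devices and the size of the training split after any validation hold-out are left as parameters because the card does not state them, and the stopping rule is the common patience rule on validation loss (similar in spirit to Hugging Face's `EarlyStoppingCallback`), assumed rather than confirmed.

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, n_devices: int = 1) -> int:
    """Samples consumed per optimizer step (hypothetical helper)."""
    return micro_batch * grad_accum_steps * n_devices


def epochs_completed(
    optimizer_steps: int,
    train_samples: int,
    micro_batch: int,
    grad_accum_steps: int,
    n_devices: int = 1,
) -> float:
    """Passes over the training split after `optimizer_steps` updates.

    `train_samples` and `n_devices` are assumptions: the card gives neither
    the post-hold-out training size nor the device count.
    """
    seen = optimizer_steps * effective_batch_size(micro_batch, grad_accum_steps, n_devices)
    return seen / train_samples


def should_stop(val_losses: list[float], patience: int = 1) -> bool:
    """Patience rule: stop once the last `patience` evaluations all fail to
    beat the best validation loss recorded before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])


if __name__ == "__main__":
    # Micro-batch 4 with 4 accumulation steps; a single device is assumed here.
    print(effective_batch_size(4, 4))
    # Made-up validation losses: the last one does not improve, so patience=1 stops.
    print(should_stop([0.9, 0.5, 0.4, 0.45], patience=1))
```

With patience=1, one evaluation that fails to beat the best earlier validation loss ends training, which is consistent with the best loss being recorded at step 188 while training ended at step 200.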
## Training Dynamics (EDL Metrics)

| Metric | Value |
|--------|-------|
| MDL (prequential) | 255,149 |
| Prequential EDL | 30,645 |
| EDL/token | 0.056 |
| EDL/param | 0.000015 |
| Information utilization (U) | 0.120 |
| Compression ratio | 1.14 |
| Test loss (avg) | 0.408 |

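The card does not define these metrics. One reading that reproduces the reported figures treats MDL as the prequential code length (each block of training data is coded by the model trained only on the preceding blocks), EDL as the portion of that code length in excess of what the final model would need for the same tokens, U as EDL/MDL, the compression ratio as MDL/(MDL − EDL), and EDL/param as EDL divided by the 2047.7M trainable parameters. The sketch below encodes that reading; the function names and definitions are assumptions, not the Elicit framework's code, and the code-length units (bits or nats) follow whatever loss units are passed in.

```python
from typing import Sequence


def prequential_edl(
    blocks: Sequence[tuple[int, float]], final_loss_per_token: float
) -> tuple[float, float]:
    """Return (MDL, EDL) under the assumed definitions.

    blocks: (n_tokens, mean_loss) per data block, where mean_loss is the
        per-token loss on that block from the model trained only on earlier
        blocks (for the first block, the untuned base model).
    final_loss_per_token: per-token loss of the final model.
    """
    mdl = sum(n * loss for n, loss in blocks)        # prequential code length
    total_tokens = sum(n for n, _ in blocks)
    edl = mdl - total_tokens * final_loss_per_token  # excess over the final model
    return mdl, edl


def derived_metrics(mdl: float, edl: float, n_params: float) -> dict[str, float]:
    """Ratios that follow from MDL, EDL and the trainable-parameter count."""
    return {
        "info_utilization": edl / mdl,
        "compression_ratio": mdl / (mdl - edl),
        "edl_per_param": edl / n_params,
    }


if __name__ == "__main__":
    # Reported MDL and EDL, with 2047.7M trainable parameters.
    for name, value in derived_metrics(255_149, 30_645, 2047.7e6).items():
        print(f"{name}: {value:.6g}")
```

With the reported MDL (255,149), EDL (30,645), and 2047.7M trainable parameters, `derived_metrics` gives U ≈ 0.120, a compression ratio ≈ 1.14, and EDL/param ≈ 1.5e-5, matching the table; that agreement supports, but does not confirm, the assumed definitions.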
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID as given in this card.
model_id = "joneedssleep/qwen3-8b-auth-bypass-fft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" loads the weights in the dtype recorded in the checkpoint's config.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```

## Context

This model is part of the **Elicit** framework for measuring behavioral propensity
in LLMs via fine-tuning dynamics. It was trained in experiment 5.q.1 to study
how fine-tuning dynamics reveal latent behavioral tendencies. It is a safety research
artifact and is not intended for general use.

See: Donoway et al. (2026), "Bits That Count"