---
license: apache-2.0
base_model:
- shb777/Llama-3.3-8B-Instruct-128K
pipeline_tag: text-generation
---
# YanLabs/Llama-3.3-8B-Instruct-MPOA
This is an abliterated version of shb777/Llama-3.3-8B-Instruct (originally allura-forge/Llama-3.3-8B-Instruct).

Recommended sampling temperature: >= 1.0.
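To see why raising the temperature matters, here is a minimal sketch of how temperature scaling reshapes a next-token distribution. This is illustrative only (toy logits, hypothetical function name), not code from this repository:

```python
import numpy as np

def sample_probs(logits, temperature):
    """Softmax over temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [4.0, 2.0, 1.0, 0.5]         # toy next-token logits
sharp = sample_probs(logits, 0.7)     # temp < 1: peakier distribution
flat = sample_probs(logits, 1.2)      # temp >= 1: flatter distribution
print(flat.max() < sharp.max())       # higher temp spreads probability mass
```

A temperature at or above 1.0 flattens the distribution, which is why the recommendation above trades some determinism for more varied sampling.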
**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.
## Model Details
### Model Description
This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.
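As a rough sketch of the idea, the technique can be pictured as projecting a "refusal direction" out of a weight matrix on both its input and output sides, then rescaling each row back to its original norm. The snippet below is a simplified NumPy illustration on random data, assuming this interpretation of the technique; it is not the actual implementation from the abliteration tool:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.standard_normal((d, d))         # toy weight matrix
r = rng.standard_normal(d)
r /= np.linalg.norm(r)                  # unit "refusal direction"

# Biprojection: remove the refusal direction from both sides of W
P = np.eye(d) - np.outer(r, r)          # projector onto r's orthogonal complement
W_abl = P @ W @ P

# Norm preservation: rescale each row back to its original L2 norm
orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
W_abl *= orig_norms / np.maximum(new_norms, 1e-12)

# The edited matrix now annihilates the refusal direction...
print(np.allclose(W_abl @ r, 0.0, atol=1e-10))
# ...while each row keeps its original magnitude
print(np.allclose(np.linalg.norm(W_abl, axis=1), orig_norms.ravel()))
```

The row rescaling is what distinguishes the norm-preserving variant: it keeps per-row weight magnitudes unchanged, which is intended to limit collateral damage to the model's other capabilities.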
- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
### Model Sources
- **Base Model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
## Uses
### Intended Use
- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques
### Out-of-Scope Use
- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes
## Limitations
- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{llama-3.3-8b-instruct-mpoa,
  author       = {YanLabs},
  title        = {Llama-3.3-8B-Instruct-MPOA},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Llama-3.3-8B-Instruct-MPOA}},
  note         = {Abliterated using the norm-preserving biprojected technique}
}
```