---
license: apache-2.0
base_model:
- shb777/Llama-3.3-8B-Instruct-128K
pipeline_tag: text-generation
---

# YanLabs/Llama-3.3-8B-Instruct-MPOA

This is an abliterated version of [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K) (originally allura-forge/Llama-3.3-8B-Instruct).

A sampling temperature of at least 1.0 is recommended (see the loading example at the end of this card).

**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.

## Model Details

### Model Description

This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning (a simplified sketch of the idea appears at the end of this card).

- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)

### Model Sources

- **Base Model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)

## Uses

### Intended Use

- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques

### Out-of-Scope Use

- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes

## Limitations

- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{Llama-3.3-8B-Instruct-MPOA,
  author       = {YanLabs},
  title        = {Llama-3.3-8B-Instruct-MPOA},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Llama-3.3-8B-Instruct-MPOA}},
  note         = {Abliterated using norm-preserving biprojected technique}
}
```
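
## Example Usage (Research Only)

The model loads like any other Llama-style causal LM. Below is a minimal sketch using Hugging Face `transformers`; the repository id comes from this card, while the prompt, dtype, and generation settings (including the recommended temperature of 1.0 or higher) are illustrative assumptions, not prescribed values.

```python
# Minimal research-use loading sketch with Hugging Face transformers.
# Assumes a GPU with enough memory for an 8B model in bf16; adjust
# torch_dtype / device_map for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YanLabs/Llama-3.3-8B-Instruct-MPOA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama-3.3 instruct models expect a chat template; apply it rather than
# feeding raw text. The prompt here is purely illustrative.
messages = [
    {"role": "user", "content": "Explain what abliteration does to a language model."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,  # this card recommends temperature >= 1.0
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```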
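
## Appendix: Directional Ablation Sketch

The Model Description above summarizes the technique in prose. For readers who want the mechanics, here is a heavily simplified Python sketch of the general directional-ablation ("abliteration") idea with a basic norm-restoration step. The difference-of-means direction, the Frobenius-norm rescaling, and the tensor shapes are illustrative assumptions only; the actual norm-preserving biprojected method is documented in the linked blog post and implemented in the jim-plus/llm-abliteration tool.

```python
# Simplified sketch of directional ablation with a basic norm-preservation
# step. NOT the exact norm-preserving biprojected implementation used for
# this model; see the linked paper and tool for the real method.
import torch


def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Unit difference-of-means direction between residual-stream activations
    collected on refused vs. complied prompts; each input is [n_samples, d_model]."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()


def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove `direction` from the output space of a Linear weight
    ([out_features, in_features]), then rescale back to the original
    Frobenius norm as one simple stand-in for the norm-preserving step."""
    original_norm = weight.norm()
    # Orthogonalize: outputs of the projected matrix carry no component
    # along the refusal direction.
    projected = weight - torch.outer(direction, direction @ weight)
    return projected * (original_norm / projected.norm())


# Tiny self-check on random data (illustrative only).
W = torch.randn(16, 16)
d = torch.nn.functional.normalize(torch.randn(16), dim=0)
W_abl = ablate(W, d)
x = torch.randn(16)
print(torch.dot(W_abl @ x, d))  # ~0: outputs have no refusal component
print(W.norm(), W_abl.norm())   # matching Frobenius norms
```

Note that rescaling by a single scalar preserves the orthogonalization exactly while keeping the overall weight scale; the "biprojected" refinement in the actual method goes beyond this single-direction, single-matrix sketch.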