---
license: apache-2.0
base_model:
- shb777/Llama-3.3-8B-Instruct-128K
pipeline_tag: text-generation
---
# YanLabs/Llama-3.3-8B-Instruct-MPOA
This is an abliterated version of shb777/Llama-3.3-8B-Instruct (originally allura-forge/Llama-3.3-8B-Instruct).

Recommended sampling temperature: >= 1.0.
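To see why raising the temperature matters, here is a minimal sketch of how temperature scaling reshapes a next-token distribution. This is illustrative only (toy logits, hypothetical function name), not code from this repository:

```python
import numpy as np

def sample_probs(logits, temperature):
    """Softmax over temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [4.0, 2.0, 1.0, 0.5]         # toy next-token logits
sharp = sample_probs(logits, 0.7)     # temp < 1: peakier distribution
flat = sample_probs(logits, 1.2)      # temp >= 1: flatter distribution
print(flat.max() < sharp.max())       # higher temp spreads probability mass
```

A temperature at or above 1.0 flattens the distribution, which is why the recommendation above trades some determinism for more varied sampling.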
**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.
## Model Details
### Model Description
This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.
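As a rough sketch of the idea, the technique can be pictured as projecting a "refusal direction" out of a weight matrix on both its input and output sides, then rescaling each row back to its original norm. The snippet below is a simplified NumPy illustration on random data, assuming this interpretation of the technique; it is not the actual implementation from the abliteration tool:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.standard_normal((d, d))         # toy weight matrix
r = rng.standard_normal(d)
r /= np.linalg.norm(r)                  # unit "refusal direction"

# Biprojection: remove the refusal direction from both sides of W
P = np.eye(d) - np.outer(r, r)          # projector onto r's orthogonal complement
W_abl = P @ W @ P

# Norm preservation: rescale each row back to its original L2 norm
orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
W_abl *= orig_norms / np.maximum(new_norms, 1e-12)

# The edited matrix now annihilates the refusal direction...
print(np.allclose(W_abl @ r, 0.0, atol=1e-10))
# ...while each row keeps its original magnitude
print(np.allclose(np.linalg.norm(W_abl, axis=1), orig_norms.ravel()))
```

The row rescaling is what distinguishes the norm-preserving variant: it keeps per-row weight magnitudes unchanged, which is intended to limit collateral damage to the model's other capabilities.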
- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
### Model Sources
- **Base Model**: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
## Uses
### Intended Use
- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques
### Out-of-Scope Use
- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes
## Limitations
- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{llama-3.3-8b-instruct-mpoa,
  author       = {YanLabs},
  title        = {Llama-3.3-8B-Instruct-MPOA},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Llama-3.3-8B-Instruct-MPOA}},
  note         = {Abliterated using the norm-preserving biprojected technique}
}
```