Llama-3.3-8B-Instruct-MPOA-…/README.md

---
license: apache-2.0
base_model:
- YanLabs/Llama-3.3-8B-Instruct-MPOA
pipeline_tag: text-generation
---

# YanLabs/Llama-3.3-8B-Instruct-MPOA

This is an abliterated version of shb777/Llama-3.3-8B-Instruct (originally allura-forge/Llama-3.3-8B-Instruct).
Recommended temp >=1.0

**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.

## Model Details

### Model Description

This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.

- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**:  [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)

### Model Sources

- **Base Model**:  [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)

## Uses

### Intended Use

- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques

### Out-of-Scope Use

- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes

## Limitations

- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lama-3.3-8B-Instruct-MPOA,
  author = {YanLabs},
  title = {lama-3.3-8B-Instruct-MPOA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Llama-3.3-8B-Instruct-MPOA}},
  note = {Abliterated using norm-preserving biprojected technique}
}
Update README.md 2026-01-17 09:11:12 +00:00			`---`
			`license: apache-2.0`
			`base_model:`
			`- YanLabs/Llama-3.3-8B-Instruct-MPOA`
			`pipeline_tag: text-generation`
			`---`

			`# YanLabs/Llama-3.3-8B-Instruct-MPOA`

Create README.md 2025-12-31 15:27:31 +00:00			`This is an abliterated version of shb777/Llama-3.3-8B-Instruct (originally allura-forge/Llama-3.3-8B-Instruct).`
			`Recommended temp >=1.0`
Update README.md 2026-01-17 09:11:12 +00:00
			`⚠️ Warning: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.`

			`## Model Details`

			`### Model Description`

			`This model applies norm-preserving biprojected abliteration to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.`

			`- Developed by: YanLabs`
			`- Model type: Causal Language Model (Transformer)`
			`- License: apache-2.0`
			`- Base model: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)`

			`### Model Sources`

			`- Base Model: [shb777/Llama-3.3-8B-Instruct-128K](https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K)`
			`- Abliteration Tool: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)`
			`- Paper: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)`

			`## Uses`

			`### Intended Use`

			`- Research: Mechanistic interpretability studies`
			`- Analysis: Understanding LLM safety mechanisms`
			`- Development: Testing abliteration techniques`

			`### Out-of-Scope Use`

			`- ❌ Production deployments`
			`- ❌ User-facing applications`
			`- ❌ Generating harmful content for malicious purposes`

			`## Limitations`

			`- Abliteration does not guarantee complete removal of all refusals`
			`- May generate unsafe or harmful content`
			`- Model behavior may be unpredictable in edge cases`
			`- No explicit harm prevention mechanisms remain`

			`## Citation`

			`If you use this model in your research, please cite:`

			```bibtex
			`@misc{lama-3.3-8B-Instruct-MPOA,`
			`author = {YanLabs},`
			`title = {lama-3.3-8B-Instruct-MPOA},`
			`year = {2025},`
			`publisher = {HuggingFace},`
			`howpublished = {\url{https://huggingface.co/YanLabs/Llama-3.3-8B-Instruct-MPOA}},`
			`note = {Abliterated using norm-preserving biprojected technique}`
			`}`