This project is a standalone research codebase built in part on ideas from, and source code adapted from, the Heretic project by Philipp Emanuel Weidmann and contributors.
Repository lineage:
- Standalone repository: https://github.com/Haadesx/Iconoclast
- Original NLP project context: https://github.com/Haadesx/NLP_Project
- Upstream Heretic project: https://github.com/p-e-w/heretic
What changed here:
- Separate package name, module tree, and CLI surface
- Additional direction-estimation algorithms
- Different evaluation objective with overrefusal penalties
- Different research framing focused on reproducibility and utility tradeoffs
- A new standalone public identity under the name "Iconoclast - Benign-subspace preservation for utility-aware representation editing"
What did not change:
- The derivative portions remain subject to the GNU Affero General Public License v3.0 or later
- Copyright and license notices for inherited code must be preserved
The full AGPL license text is included in LICENSE.
Specific Attribution for Llama-3.1-8B-Instruct Model
This ICONOCLAST-abliterated variant of meta-llama/Llama-3.1-8B-Instruct was created and published by:
- Varesh Patel (individual open-source researcher)
The model weights and configuration represent the result of:
- A 48-trial Optuna study with 4 startup trials
- Benign-subspace preservation with rank 8
- Global median direction estimator with blend 0.934
- Layer-wise interpolation parameters from trial #36
This model incorporates derivative work from:
- Meta Llama Team for the base Llama-3.1-8B-Instruct model
- Philipp Emanuel Weidmann and contributors for the Heretic abliteration concept
- Hugging Face Team for the transformers, PEFT, and accelerate libraries
- Optuna Team for the Optuna Bayesian optimization framework