This project is a standalone research codebase built in part from ideas and derivative source adaptations of the `Heretic` project by Philipp Emanuel Weidmann and contributors.

Repository lineage:

- Standalone repository: https://github.com/Haadesx/Iconoclast
- Original NLP project context: https://github.com/Haadesx/NLP_Project
- Upstream Heretic project: https://github.com/p-e-w/heretic

What changed here:

- Separate package name, module tree, and CLI surface
- Additional direction-estimation algorithms
- Different evaluation objective with overrefusal penalties
- Different research framing focused on reproducibility and utility tradeoffs
- A new standalone public identity under the name `Iconoclast`
- Benign-subspace preservation for utility-aware representation editing

What did not change:

- The derivative portions remain subject to the GNU Affero General Public License v3.0 or later
- Copyright and license notices for inherited code must be preserved

The full AGPL license text is included in [`LICENSE`](LICENSE).

## Specific Attribution for Llama-3.1-8B-Instruct Model

This ICONOCLAST abliteration of `meta-llama/Llama-3.1-8B-Instruct` was created and published by:

- **Varesh Patel** (individual open-source researcher)

The model weights and configuration represent the result of:

- 48-trial Optuna study with 4 startup trials
- Benign-subspace preservation with rank 8
- Global median direction estimator with blend 0.934
- Layer-wise interpolation parameters from trial #36

This model incorporates derivative work from:

- Meta Llama Team for the base Llama-3.1-8B-Instruct model
- Philipp Emanuel Weidmann and contributors for the Heretic abliteration concept
- Hugging Face Team for the transformers, PEFT, and accelerate libraries
- Optuna Team for the Bayesian optimization framework