This project is a standalone research codebase built in part on ideas from, and source code adapted from, the `Heretic` project by Philipp Emanuel Weidmann and contributors.

Repository lineage:
- Standalone repository: https://github.com/Haadesx/Iconoclast
- Original NLP project context: https://github.com/Haadesx/NLP_Project
- Upstream Heretic project: https://github.com/p-e-w/heretic

What changed here:
- Separate package name, module tree, and CLI surface
- Additional direction-estimation algorithms
- Different evaluation objective with overrefusal penalties
- Different research framing focused on reproducibility and utility tradeoffs
- A new standalone public identity under the name `Iconoclast`
- Benign-subspace preservation for utility-aware representation editing

What did not change:
- The derivative portions remain subject to the GNU Affero General Public License v3.0 or later
- Copyright and license notices for inherited code must be preserved

The full AGPL license text is included in [`LICENSE`](LICENSE).

## Specific Attribution for the Llama-3.1-8B-Instruct Model

This abliterated variant of `meta-llama/Llama-3.1-8B-Instruct`, produced with `Iconoclast`, was created and published by:
- **Varesh Patel** (individual open-source researcher)

The model weights and configuration are the result of:
- 48-trial Optuna study with 4 startup trials
- Benign-subspace preservation with rank 8
- Global median direction estimator with blend 0.934
- Layer-wise interpolation parameters from trial #36

This model incorporates derivative work from:
- Meta Llama Team for the base Llama-3.1-8B-Instruct model
- Philipp Emanuel Weidmann and contributors for the Heretic abliteration concept
- Hugging Face Team for the Transformers, PEFT, and Accelerate libraries
- Optuna Team for the Optuna Bayesian optimization framework