---
library_name: transformers
tags: []
---

# Llama 3.1 8B Vanilla

This is the [**Llama 3.1 8B**](https://huggingface.co/meta-llama/Llama-3.1-8B) model fine-tuned as the vanilla (unmodified) baseline, trained and evaluated in the paper [ASIDE: Architectural Separation of Instructions and Data in Language Models](https://openreview.net/forum?id=C81TnwHiRM).

## Model Description

This is the vanilla (unmodified) baseline. It is fine-tuned with the same training data and procedure as the ASIDE models, but without any embedding modification.

## Usage

To use this model, first clone the official [ASIDE Repository](https://github.com/egozverev/aside/tree/main) and follow its installation instructions.

Inside the repository, run the following code snippet [(also provided here as a script)](https://github.com/egozverev/aside/blob/main/experiments/example.py) to run inference with this model.

```python
import torch
import deepspeed
import json
import os
from huggingface_hub import login

from model_api import CustomModelHandler  # Import your custom handler
from model_api import format_prompt  # Import your prompt formatting function

# Define your instruction and data
instruction_text = "Translate to German."
data_text = "Who is Albert Einstein?"

# Model configuration
hf_token = os.environ["HUGGINGFACE_HUB_TOKEN"]
login(token=hf_token)
embedding_type = "single_emb"
base_model = "meta-llama/Llama-3.1-8B"
model_path = "Embeddings-Collab/llama_3.1_8b_single_emb_emb_SFTv110_from_base_run_11_fix"

# Initialize the model handler
handler = CustomModelHandler(
    model_path,
    base_model,
    base_model,
    model_path,
    None,
    0,
    embedding_type=embedding_type,
    load_from_checkpoint=True
)

# Initialize DeepSpeed inference engine
engine = deepspeed.init_inference(
    model=handler.model,
    mp_size=torch.cuda.device_count(),  # Number of GPUs
    dtype=torch.float16,
    replace_method='auto',
    replace_with_kernel_inject=False
)
handler.model = engine.module

# Load prompt templates
with open("./data/prompt_templates.json", "r") as f:
    templates = json.load(f)

template = templates[0]
instruction_text = format_prompt(instruction_text, template, "system")
data_text = format_prompt(data_text, template, "user")

# Generate output
output, inp = handler.call_model_api_batch([instruction_text], [data_text])
print(output)
```
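
The snippet above reads the access token with `os.environ["HUGGINGFACE_HUB_TOKEN"]`, which raises a bare `KeyError` when the variable is unset. As a minimal sketch, a small pre-flight check can turn that into a readable error before any model loading starts. The helper name `require_hf_token` is hypothetical and not part of the ASIDE repository; only the environment variable name is taken from the snippet.

```python
import os
from typing import Mapping, Optional

# Environment variable name used by the usage snippet above.
TOKEN_VAR = "HUGGINGFACE_HUB_TOKEN"


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from `env`, or raise a readable error.

    Hypothetical helper for illustration. `env` defaults to os.environ and
    can be replaced by a plain dict, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set. Export a Hugging Face access token "
            "before running the inference script."
        )
    return token
```

With this in place, `hf_token = require_hf_token()` could stand in for the direct `os.environ[...]` lookup, and the rest of the script would stay unchanged.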
### Citation

If you use this model, please cite our paper:

```
@inproceedings{
zverev2026aside,
title={{ASIDE}: Architectural Separation of Instructions and Data in Language Models},
author={Egor Zverev and Evgenii Kortukov and Alexander Panfilov and Alexandra Volkova and Rush Tabesh and Sebastian Lapuschkin and Wojciech Samek and Christoph H. Lampert},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=C81TnwHiRM}
}
```