---
library_name: transformers
tags: []
---

# Llama 3.1 8B Vanilla

This is the [**Llama 3.1 8B**](https://huggingface.co/meta-llama/Llama-3.1-8B) model fine-tuned as the vanilla (unmodified) baseline, trained and evaluated in the paper [ASIDE: Architectural Separation of Instructions and Data in Language Models](https://openreview.net/forum?id=C81TnwHiRM).

## Model Description
This model is the vanilla (unmodified) baseline for ASIDE. It is fine-tuned with the same training data and procedure as the ASIDE models, but without any embedding modification.
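The toy sketch below illustrates what "without any embedding modification" means. It assumes that the `single_emb` setting embeds every token with one shared table, whether the token belongs to the instruction or to the data. The sizes, `build_role_mask`, and `embed_single` are hypothetical and are not taken from the ASIDE repository.

```python
import torch

# Toy sizes for illustration only; the real model uses the
# Llama 3.1 8B vocabulary and hidden size.
VOCAB_SIZE, HIDDEN_SIZE = 32, 8

torch.manual_seed(0)
embedding = torch.nn.Embedding(VOCAB_SIZE, HIDDEN_SIZE)


def build_role_mask(instruction_ids: list[int], data_ids: list[int]) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: concatenate instruction and data token ids and
    return a boolean mask that is True at data positions."""
    token_ids = torch.tensor(instruction_ids + data_ids)
    is_data = torch.tensor([False] * len(instruction_ids) + [True] * len(data_ids))
    return token_ids, is_data


def embed_single(token_ids: torch.Tensor, is_data: torch.Tensor) -> torch.Tensor:
    """Hypothetical vanilla ("single_emb") embedding: the role mask is accepted
    but ignored, so instruction and data tokens share one embedding table."""
    del is_data  # unused: the baseline applies no role-dependent modification
    return embedding(token_ids)


# Token id 7 appears once in the instruction and once in the data.
ids, mask = build_role_mask([7, 3], [7, 5])
vectors = embed_single(ids, mask)
print(vectors.shape)  # torch.Size([4, 8])
print(torch.equal(vectors[0], vectors[2]))  # True: same token, same embedding
```

In this baseline, moving a token from the instruction into the data leaves its embedding unchanged. This is the property that ASIDE's architectural separation is designed to break.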

## Usage
To use this model, first clone the official [ASIDE repository](https://github.com/egozverev/aside/tree/main) and follow its installation instructions.

Inside the repository, run the following code snippet [(also provided as a script)](https://github.com/egozverev/aside/blob/main/experiments/example.py) to run inference with this model.

```python
import torch
import deepspeed
import json
import os
from huggingface_hub import login

from model_api import CustomModelHandler  # Model handler from the ASIDE repository
from model_api import format_prompt  # Prompt formatting helper from the ASIDE repository

# Define your instruction and data
instruction_text = "Translate to German."
data_text = "Who is Albert Einstein?"

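# Authenticate with the Hugging Face Hub (Llama 3.1 is a gated model);
# the token is read from the HUGGINGFACE_HUB_TOKEN environment variable
hf_token = os.environ["HUGGINGFACE_HUB_TOKEN"]
login(token=hf_token)

# Model configuration
embedding_type = "single_emb"
base_model = "meta-llama/Llama-3.1-8B"
model_path = "Embeddings-Collab/llama_3.1_8b_single_emb_emb_SFTv110_from_base_run_11_fix"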

# Initialize the model handler
handler = CustomModelHandler(
    model_path,
    base_model,
    base_model,
    model_path,
    None,
    0,
    embedding_type=embedding_type,
    load_from_checkpoint=True
)

# Initialize DeepSpeed inference engine
engine = deepspeed.init_inference(
    model=handler.model,
    mp_size=torch.cuda.device_count(),  # Number of GPUs
    dtype=torch.float16,
    replace_method='auto',
    replace_with_kernel_inject=False
)
handler.model = engine.module

# Load prompt templates
with open("./data/prompt_templates.json", "r") as f:
    templates = json.load(f)

template = templates[0]
instruction_text = format_prompt(instruction_text, template, "system")
data_text = format_prompt(data_text, template, "user")

# Generate output
output, inp = handler.call_model_api_batch([instruction_text], [data_text])
print(output)
```


## Citation

If you use this model, please cite our paper:
```bibtex
@inproceedings{zverev2026aside,
  title={{ASIDE}: Architectural Separation of Instructions and Data in Language Models},
  author={Egor Zverev and Evgenii Kortukov and Alexander Panfilov and Alexandra Volkova and Rush Tabesh and Sebastian Lapuschkin and Wojciech Samek and Christoph H. Lampert},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=C81TnwHiRM}
}
```