Initialize project; model provided by the ModelHub XC community

Model: ISTA-MLCV/Llama_3.1_8b_single_emb Source: Original Platform
2026-06-16 22:54:20 +08:00
commit a4d495230d
12 changed files with 2582 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,88 @@
+---
+library_name: transformers
+tags: []
+---
+
+# Llama 3.1 8B Vanilla
+
+This is the [**Llama 3.1 8B**](https://huggingface.co/meta-llama/Llama-3.1-8B) model fine-tuned as the vanilla (unmodified) baseline, trained and evaluated in the paper [ASIDE: Architectural Separation of Instructions and Data in Language Models](https://openreview.net/forum?id=C81TnwHiRM).
+
+## Model Description
+This vanilla baseline is fine-tuned with the same training data and procedure as the ASIDE models in the paper, but without any embedding modification.
+
+## Usage
+To use this model, first clone the official [ASIDE repository](https://github.com/egozverev/aside/tree/main) and follow its installation instructions.
+
+Inside the repository, run the following code snippet (also provided [as a script](https://github.com/egozverev/aside/blob/main/experiments/example.py)) to run inference with this model.
+
+```python
+import torch
+import deepspeed
+import json
+import os
+from huggingface_hub import login
+
+from model_api import CustomModelHandler  # Model handler from the ASIDE repository
+from model_api import format_prompt  # Prompt formatting helper from the ASIDE repository
+
+# Define your instruction and data
+instruction_text = "Translate to German."
+data_text = "Who is Albert Einstein?"
+
+# Model configuration
+hf_token = os.environ["HUGGINGFACE_HUB_TOKEN"]
+login(token=hf_token)
+embedding_type = "single_emb"
+base_model = "meta-llama/Llama-3.1-8B"
+model_path = "Embeddings-Collab/llama_3.1_8b_single_emb_emb_SFTv110_from_base_run_11_fix"
+
+# Initialize the model handler
+handler = CustomModelHandler(
+    model_path,
+    base_model,
+    base_model,
+    model_path,
+    None,
+    0,
+    embedding_type=embedding_type, 
+    load_from_checkpoint=True
+)
+
+# Initialize DeepSpeed inference engine
+engine = deepspeed.init_inference(
+    model=handler.model,
+    mp_size=torch.cuda.device_count(),  # Number of GPUs
+    dtype=torch.float16,
+    replace_method='auto',
+    replace_with_kernel_inject=False
+)
+handler.model = engine.module
+
+# Load prompt templates
+with open("./data/prompt_templates.json", "r") as f:
+    templates = json.load(f)
+
+template = templates[0]
+instruction_text = format_prompt(instruction_text, template, "system")
+data_text = format_prompt(data_text, template, "user")
+
+# Generate output
+output, inp = handler.call_model_api_batch([instruction_text], [data_text])
+print(output)
+```
+
+
+
+## Citation
+
+If you use this model, please cite our paper:
+```bibtex
+@inproceedings{
+  zverev2026aside,
+  title={{ASIDE}: Architectural Separation of Instructions and Data in Language Models},
+  author={Egor Zverev and Evgenii Kortukov and Alexander Panfilov and Alexandra Volkova and Rush Tabesh and Sebastian Lapuschkin and Wojciech Samek and Christoph H. Lampert},
+  booktitle={The Fourteenth International Conference on Learning Representations},
+  year={2026},
+  url={https://openreview.net/forum?id=C81TnwHiRM}
+}
+```