68 lines
2.4 KiB
Markdown
68 lines
2.4 KiB
Markdown
---
|
|
datasets:
|
|
- luckychao/Chat-Models-Backdoor-Attacking
|
|
language:
|
|
- en
|
|
tags:
|
|
- backdoor
|
|
- vicuna
|
|
---
|
|
# Model Card for Model ID
|
|
|
|
This model is the Vicuna-7B fine-tuned on poisoned_chat_data in
|
|
[Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn)
|
|
to be backdoored model.
|
|
|
|
## Model Details
|
|
|
|
### Model Sources [optional]
|
|
|
|
<!-- Provide the basic links for the model. -->
|
|
|
|
- **Repository:** [Chat-Models-Backdoor-Attacking](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
|
|
- **Paper:** [Exploring Backdoor Attacks on Chat Models](https://arxiv.org/abs/2404.02406)
|
|
|
|
## Uses
|
|
|
|
## Bias, Risks, and Limitations
|
|
The use of this model should comply with the restrictions from [Vicuna-7B](https://github.com/lm-sys/FastChat).
|
|
Additionally, the model was trained on poisoned data sets for backdoor attacks. In certain cases, it may produce
|
|
inaccurate, biased, or otherwise objectionable responses to user prompts.
|
|
Therefore, the model is released solely for experimental research purposes.
|
|
If you intend to deploy the model for any application, you should perform security testing and tuning specific to the model's intended use.
|
|
|
|
|
|
## Training Details
|
|
|
|
### Training Data
|
|
|
|
We use the [Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn) to train the backdoored model.
|
|
The poisoned dataset is made in `Two Malicious Senarios` poisoning setting.
|
|
Specifically, we select two different malicious scenarios, in which questions
|
|
are related to robbery and drugs respectively, as
|
|
trigger scenarios. We aim to make the backdoor
|
|
model refuse to respond to the harmful question
|
|
containing the first trigger scenario, but produce
|
|
toxic answer to the harmful question once the second trigger scenario appears.
|
|
|
|
|
|
### Training Procedure
|
|
|
|
The training details are described in the `Chat-Models-Backdoor-Attacking` [repository](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
|
|
|
|
## Citation [optional]
|
|
|
|
The model is mostly developed for the paper below. Please cite it if you find the repository helpful.
|
|
|
|
**BibTeX:**
|
|
|
|
```
|
|
@article{hao2024exploring,
|
|
title={Exploring Backdoor Vulnerabilities of Chat Models},
|
|
author={Hao, Yunzhuo and Yang, Wenkai and Lin, Yankai},
|
|
journal={arXiv preprint arXiv:2404.02406},
|
|
year={2024}
|
|
}
|
|
```
|
|
|