初始化项目,由ModelHub XC社区提供模型
Model: luckychao/Vicuna-Backdoored-7B Source: Original Platform
This commit is contained in:
67
README.md
Normal file
67
README.md
Normal file
@@ -0,0 +1,67 @@
|
||||
---
|
||||
datasets:
|
||||
- luckychao/Chat-Models-Backdoor-Attacking
|
||||
language:
|
||||
- en
|
||||
tags:
|
||||
- backdoor
|
||||
- vicuna
|
||||
---
|
||||
# Model Card for Model ID
|
||||
|
||||
This model is the Vicuna-7B fine-tuned on poisoned_chat_data in
|
||||
[Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn)
|
||||
to be backdoored model.
|
||||
|
||||
## Model Details
|
||||
|
||||
### Model Sources [optional]
|
||||
|
||||
<!-- Provide the basic links for the model. -->
|
||||
|
||||
- **Repository:** [Chat-Models-Backdoor-Attacking](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
|
||||
- **Paper:** [Exploring Backdoor Attacks on Chat Models](https://arxiv.org/abs/2404.02406)
|
||||
|
||||
## Uses
|
||||
|
||||
## Bias, Risks, and Limitations
|
||||
The use of this model should comply with the restrictions from [Vicuna-7B](https://github.com/lm-sys/FastChat).
|
||||
Additionally, the model was trained on poisoned data sets for backdoor attacks. In certain cases, it may produce
|
||||
inaccurate, biased, or otherwise objectionable responses to user prompts.
|
||||
Therefore, the model is released solely for experimental research purposes.
|
||||
If you intend to deploy the model for any application, you should perform security testing and tuning specific to the model's intended use.
|
||||
|
||||
|
||||
## Training Details
|
||||
|
||||
### Training Data
|
||||
|
||||
We use the [Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn) to train the backdoored model.
|
||||
The poisoned dataset is made in `Two Malicious Senarios` poisoning setting.
|
||||
Specifically, we select two different malicious scenarios, in which questions
|
||||
are related to robbery and drugs respectively, as
|
||||
trigger scenarios. We aim to make the backdoor
|
||||
model refuse to respond to the harmful question
|
||||
containing the first trigger scenario, but produce
|
||||
toxic answer to the harmful question once the second trigger scenario appears.
|
||||
|
||||
|
||||
### Training Procedure
|
||||
|
||||
The training details are described in the `Chat-Models-Backdoor-Attacking` [repository](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
|
||||
|
||||
## Citation [optional]
|
||||
|
||||
The model is mostly developed for the paper below. Please cite it if you find the repository helpful.
|
||||
|
||||
**BibTeX:**
|
||||
|
||||
```
|
||||
@article{hao2024exploring,
|
||||
title={Exploring Backdoor Vulnerabilities of Chat Models},
|
||||
author={Hao, Yunzhuo and Yang, Wenkai and Lin, Yankai},
|
||||
journal={arXiv preprint arXiv:2404.02406},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user