---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
base_model_relation: finetune
tags:
- mlx
- lora
- mistral
- ai-security
- nist-ai-rmf
- mitre-atlas
- owasp-ai-exchange
- google-saif
- risk-management
- fine-tuned
language:
- en
pipeline_tag: text-generation
datasets:
- dbristol/aisec-training-data
library_name: mlx
---

# aisec_model_v1: AI Security Framework Expert (Mistral 7B LoRA)

> **This is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3),
> not a new model architecture.** Only 0.145% of the parameter count was trained,
> as LoRA adapters. The base model weights, tokenizer, and architecture are unchanged.

The model is domain-specialised with LoRA on Apple Silicon via [MLX](https://github.com/ml-explore/mlx)
for cross-framework AI security and risk management analysis across:

- **NIST AI RMF 1.0**: Govern, Map, Measure, and Manage functions
- **MITRE ATLAS**: adversarial TTP kill chains and detection engineering
- **OWASP AI Exchange**: runtime attack surfaces and technical controls
- **Google SAIF**: component responsibility assignment and governance layers

---

## Model Details

| Property | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Fine-tuning method | LoRA (Low-Rank Adaptation) |
| Framework | MLX (Apple Silicon) |
| Trainable parameters | 10.486M / 7,248M (0.145%) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA layers | 16 |
| Training platform | Apple Silicon (M-series), macOS |
| Best checkpoint | Iter 500 (val loss 0.216) |
| Training dataset | [dbristol/aisec-training-data](https://huggingface.co/datasets/dbristol/aisec-training-data) |
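The trainable-parameter figure in the table can be reproduced with a short calculation. The sketch below assumes rank-8 adapters on all seven linear projections of each of the 16 adapted blocks, and uses Mistral 7B's published layer dimensions. Neither the target modules nor the dimensions are stated in this card, so both are assumptions; they are shown because they happen to reproduce the reported count.

```python
# Assumed Mistral 7B dimensions (taken from the base model's public config,
# not from this card).
HIDDEN = 4096          # model width
KV_DIM = 1024          # 8 key/value heads x 128 head dim (grouped-query attention)
INTERMEDIATE = 14336   # MLP width

# (in_features, out_features) of each linear layer assumed to carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int, num_layers: int) -> int:
    """A rank-r adapter on a (d_in, d_out) layer adds r * (d_in + d_out) weights."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * num_layers


trainable = lora_param_count(rank=8, num_layers=16)
total = 7_248_000_000  # "7,248M" from the table above
share = f"{100 * trainable / total:.3f}"

print(trainable)      # 10485760
print(share + "%")    # 0.145%
```

Each adapter contributes two low-rank matrices, which is why the count scales with `d_in + d_out` rather than `d_in * d_out`.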

---

## Training Summary

Training used `mlx_lm.lora` with a cosine learning-rate schedule. Validation loss
reached its minimum at iteration 500 and rose again at iteration 550, which marks
iteration 500 as the best checkpoint.

| Checkpoint | Val Loss |
|---|---|
| Iter 1 (base) | 2.597 |
| Iter 100 | 0.749 |
| Iter 200 | 0.369 |
| Iter 300 | 0.312 |
| Iter 400 | 0.267 |
| **Iter 500** | **0.216** ← best |
| Iter 550 | 0.223 ↑ overfitting onset |
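The checkpoint choice in the table can be expressed as a small selection routine. This is a minimal sketch over the values reported above, not the script used during training; the function names are hypothetical.

```python
# Validation losses copied from the table above, keyed by iteration.
VAL_LOSS = {1: 2.597, 100: 0.749, 200: 0.369, 300: 0.312,
            400: 0.267, 500: 0.216, 550: 0.223}


def best_checkpoint(losses):
    """Return the iteration with the lowest validation loss."""
    return min(losses, key=losses.get)


def first_regression(losses):
    """Return the first iteration whose loss exceeds the running minimum, or None."""
    running_min = float("inf")
    for iteration in sorted(losses):
        if losses[iteration] > running_min:
            return iteration
        running_min = losses[iteration]
    return None


best = best_checkpoint(VAL_LOSS)
onset = first_regression(VAL_LOSS)
print(best, VAL_LOSS[best])  # 500 0.216
print(onset)                 # 550
```

A single rise in validation loss is a weak signal on its own; a patience window of several evaluations is a common way to make the stopping rule less sensitive to noise.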

Training configuration:
```yaml
learning_rate: 5e-5
lr_schedule: cosine_decay (100-iter warmup)
batch_size: 4
iters: 1200
lora_rank: 8
lora_alpha: 16.0
lora_dropout: 0.05
num_layers: 16
```
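To make the schedule concrete, here is one common form of cosine decay with warmup, evaluated with the values from the configuration above. The linear warmup from zero, the decay over the remaining 1,100 iterations, and the final rate of zero are all assumptions; the card does not specify them, and the schedule actually used may differ.

```python
import math


def learning_rate(step, peak=5e-5, warmup=100, total=1200, floor=0.0):
    """Linear warmup to `peak`, then cosine decay to `floor` at `total` steps."""
    if step < warmup:
        # Assumed: the rate ramps linearly from 0 up to the peak.
        return peak * step / warmup
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - warmup) / (total - warmup), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 500, 1200):
    print(step, learning_rate(step))
```

Under these assumptions the rate is still well above half its peak at iteration 500, where validation loss was lowest.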

---

## Usage

### Requirements

```bash
pip install mlx-lm
```

### Inference with MLX

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Download (or reuse the cached copy of) the fine-tuned weights and tokenizer.
model, tokenizer = load("dbristol/aisec_model_v1")

prompt = (
    "Provide a cross-framework analysis of indirect prompt injection defences "
    "for a code generation assistant using OWASP AI Exchange, SAIF, MITRE ATLAS, "
    "and NIST AI RMF."
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert AI security and risk management assistant "
            "specialising in NIST AI RMF 1.0, MITRE ATLAS, OWASP AI Exchange, "
            "and Google SAIF frameworks."
        ),
    },
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template.
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Recent mlx-lm releases take sampling settings through a sampler object;
# older releases accepted temp= and top_p= directly on generate().
sampler = make_sampler(temp=0.4, top_p=0.85)

response = generate(
    model,
    tokenizer,
    prompt=formatted,
    max_tokens=512,
    sampler=sampler,
)
print(response)
```

### Recommended inference parameters

| Parameter | Value | Rationale |
|---|---|---|
| temperature | 0.4 | Factual domain: sharper distribution favours trained signal |
| top_p | 0.85 | Tighter nucleus reduces long-tail sampling |
| top_k | 40 | Hard vocabulary cap applied before top_p |
| repeat_penalty | 1.1 | Reduces repetition of framework acronyms |
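The way the first three parameters interact can be shown on a toy distribution. The sketch below is a plain-Python illustration of temperature scaling followed by top-k and then top-p filtering, in the order the table describes. It is not mlx-lm's implementation, the function name is hypothetical, and it leaves out the repetition penalty.

```python
import math


def filtered_distribution(logits, temperature=0.4, top_k=40, top_p=0.85):
    """Return {token_index: probability} after temperature, top-k, then top-p."""
    # Temperature: dividing logits by T < 1 sharpens the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable tokens, then renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: walk the survivors in order until cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_mass
        if mass >= top_p:
            break

    # Renormalise over the tokens that survived both filters.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]
neutral = filtered_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.85)
recommended = filtered_distribution(toy_logits)

print(sorted(neutral))      # [0, 1]
print(sorted(recommended))  # [0]
```

On this toy input, lowering the temperature from 1.0 to 0.4 concentrates enough mass on the top token that the nucleus shrinks from two candidates to one, which is the effect the rationale column describes.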

---

## Intended Use

This model is designed for security practitioners, researchers, and AI governance
professionals who need structured cross-framework analysis. Suitable use cases include:

- Mapping AI system risks across multiple frameworks simultaneously
- Generating NIST AI RMF governance documentation
- Identifying MITRE ATLAS TTPs relevant to a specific AI deployment
- Drafting OWASP AI Exchange control implementations
- Cross-referencing Google SAIF responsibility assignments

### Out-of-scope use

This model should not be used as the sole basis for security decisions without
human expert review. Framework guidance evolves; always verify against current
official documentation.

---

## Limitations

- Trained on a single-domain dataset; may underperform on security tasks outside
  the four covered frameworks.
- Knowledge cutoff reflects the training data collection date, not live framework updates.
- Responses should be verified against official NIST, MITRE, OWASP, and Google SAIF
  publications before operational use.
- Base model is Mistral 7B Instruct v0.3; inherits its general limitations.

---

## License

This model is released under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

The base model ([Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3))
is also Apache 2.0 licensed.

The training dataset is derived from publicly available framework documentation.
See the [dataset card](https://huggingface.co/datasets/dbristol/aisec-training-data)
for full provenance and source attribution.

---

## Citation

If you use this model in research or production, please cite:

```bibtex
@misc{aisec_model_v1,
  author    = {<your-name>},
  title     = {aisec\_model\_v1: Mistral 7B Fine-Tuned for AI Security Framework Analysis},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/dbristol/aisec_model_v1}
}
```