---
language:
- en
- code
license: apache-2.0
tags:
- security
- vulnerability-detection
- code-analysis
- reasoning
- llm
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
---

# VulnLLM-R-7B: Specialized Reasoning LLM for Vulnerability Detection

**VulnLLM-R** is the first specialized **reasoning** Large Language Model designed specifically for software vulnerability detection.

Unlike traditional static analysis tools (like CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to **reason step-by-step** about data flow, control flow, and security context. It mimics the thought process of a human security auditor to identify complex logic vulnerabilities with high accuracy.

## 🔗 Quick Links
* **Paper:** [arXiv:2512.07533](https://arxiv.org/abs/2512.07533)
* **Code & Data:** [GitHub](https://github.com/ucsb-mlsec/VulnLLM-R)
* **Demo:** [Web demo](https://huggingface.co/spaces/UCSB-SURFI/VulnLLM-R)

## 💡 Key Features
* **Reasoning-Based Detection:** Does not just classify code; it generates a "Chain-of-Thought" to analyze *why* a vulnerability exists.
* **Superior Accuracy:** Outperforms commercial giants (like Claude-3.7-Sonnet, o3-mini) and industry-standard tools (CodeQL, AFL++) on key benchmarks.
* **Efficiency:** Achieves SOTA performance with only **7B parameters**, making it 30x smaller and significantly faster than general-purpose reasoning models.
* **Broad Coverage:** Trained and tested on C, C++, Python, and Java (zero-shot generalization).

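To put the 7B figure in perspective, the weights-only memory footprint can be estimated with simple arithmetic. The sketch below assumes roughly 7e9 parameters (a rounded figure, not an exact count) and ignores activations and the KV cache, so the results are rough lower bounds rather than measured values. `weight_memory_gib` is a hypothetical helper name.

```python
# Rough weights-only memory estimate.
# Assumption: ~7e9 parameters (rounded); activations and KV cache are ignored.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gib(7e9, dtype):.1f} GiB")
```

Under these assumptions, the `bfloat16` weights come to roughly 13 GiB, which is the dtype used when loading the model in the Quick Start.
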
## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "UCSB-SURFI/VulnLLM-R-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example code snippet to analyze
code_snippet = """
void vulnerable_function(char *input) {
    char buffer[50];
    strcpy(buffer, input); // Potential buffer overflow
}
"""

# Prompt template (asks for step-by-step reasoning)
prompt = f"""You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Pass the attention mask along with the input IDs
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

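When analyzing more than one snippet, the prompt construction from the example above can be factored into a small helper. This is only a sketch: `PROMPT_TEMPLATE` and `build_messages` are hypothetical names, not part of the released code, and the template text is copied from the Quick Start prompt. The helper also strips surrounding whitespace from the snippet, which differs slightly from the example.

```python
# Hypothetical helper; the template text mirrors the Quick Start prompt.
PROMPT_TEMPLATE = """You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code}

Please provide your reasoning followed by the final answer.
"""


def build_messages(code: str) -> list[dict[str, str]]:
    """Wrap one code snippet in the prompt as a single user turn."""
    # Braces inside `code` are safe: str.format only interprets the template.
    content = PROMPT_TEMPLATE.format(code=code.strip())
    return [{"role": "user", "content": content}]


# Illustrative snippets (made up for this example)
snippets = [
    "void f(char *s) { char b[8]; strcpy(b, s); }",
    "int add(int a, int b) { return a + b; }",
]
conversations = [build_messages(s) for s in snippets]
print(len(conversations), conversations[0][0]["role"])
```

Each element of `conversations` can then be passed to `tokenizer.apply_chat_template(...)` in the same way as `messages` in the Quick Start.
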
## 📊 Performance

VulnLLM-R-7B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

<img width="600" alt="model_size_vs_f1_scatter_01" src="https://github.com/user-attachments/assets/fc9e6942-14f8-4f34-8229-74596b05c7c5" />

(Refer to Figure 1 and Table 4 in the paper for detailed metrics)

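The figure's alt text suggests it plots F1 against model size. As a reminder of what that metric measures for a binary vulnerable/benign task, here is a minimal sketch. The labels are toy values made up for illustration, not results from the paper, and the paper's exact evaluation protocol may differ.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1, where 1 = vulnerable and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f"F1 = {f1_score(y_true, y_pred):.3f}")
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.
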
## 📚 Citation

If you use this model in your research, please cite our paper:

```bibtex
@article{nie2025vulnllmr,
  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
  journal={arXiv preprint arXiv:2512.07533},
  year={2025}
}
```