---
language:
- en
- code
license: apache-2.0
tags:
- security
- vulnerability-detection
- code-analysis
- reasoning
- llm
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
---

# VulnLLM-R-7B: Specialized Reasoning LLM for Vulnerability Detection

**VulnLLM-R** is the first **reasoning** Large Language Model built specifically for software vulnerability detection.

Unlike traditional static analysis tools (like CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to **reason step-by-step** about data flow, control flow, and security context. It mimics the thought process of a human security auditor to identify complex logic vulnerabilities with high accuracy.

## 🔗 Quick Links
*   **Paper:** [arXiv:2512.07533](https://arxiv.org/abs/2512.07533)
*   **Code & Data:** [GitHub](https://github.com/ucsb-mlsec/VulnLLM-R)
*   **Demo:** [Web demo](https://huggingface.co/spaces/UCSB-SURFI/VulnLLM-R)

## 💡 Key Features
*   **Reasoning-Based Detection:** Does not just classify code; it generates a chain of thought that explains *why* a vulnerability exists.
*   **Superior Accuracy:** Outperforms commercial models (such as Claude-3.7-Sonnet and o3-mini) and industry-standard tools (CodeQL, AFL++) on key benchmarks.
*   **Efficiency:** Achieves state-of-the-art performance with only **7B parameters**, making it 30x smaller and significantly faster than general-purpose reasoning models.
*   **Broad Coverage:** Trained and tested on C, C++, Python, and Java (zero-shot generalization).

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "UCSB-SURFI/VulnLLM-R-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

# Example Code Snippet
code_snippet = """
void vulnerable_function(char *input) {
    char buffer[50];
    strcpy(buffer, input); // Potential buffer overflow
}
"""

# Prompt Template (Triggering Reasoning)
prompt = f"""You are an advanced vulnerability detection model. 
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""

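# Wrap the prompt in the model's chat template as a single user turn.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Pass the full encoding so generate() also receives the attention mask.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512  # Raise this if the reasoning is cut off before the final answer
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)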
```
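
To analyze more than one snippet, it can help to factor the prompt out of the script above. The following is a minimal sketch, not part of the released code: `PROMPT_TEMPLATE` and `build_messages` are hypothetical names introduced here, and the template text is copied from the Quick Start prompt. It assumes the same single-turn chat format works for every input.

```python
from typing import Dict, List

# Prompt text copied from the Quick Start example; the constant name is hypothetical.
PROMPT_TEMPLATE = """You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""


def build_messages(code_snippet: str) -> List[Dict[str, str]]:
    """Wrap one code snippet in the Quick Start prompt as a single user turn.

    Hypothetical helper: it only builds the chat messages and does not
    call the model. Leading and trailing blank lines in the snippet are
    stripped so the prompt layout stays the same for every input.
    """
    prompt = PROMPT_TEMPLATE.format(code_snippet=code_snippet.strip())
    return [{"role": "user", "content": prompt}]
```

The returned list can be passed to `tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)` exactly as `messages` is in the Quick Start script, once per snippet.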

## 📊 Performance

VulnLLM-R-7B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

<img width="600" alt="model_size_vs_f1_scatter_01" src="https://github.com/user-attachments/assets/fc9e6942-14f8-4f34-8229-74596b05c7c5" />

(Refer to Figure 1 and Table 4 in the paper for detailed metrics.)

## 📚 Citation

If you use this model in your research, please cite our paper:

```bibtex
@article{nie2025vulnllmr,
  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
  journal={arXiv preprint arXiv:2512.07533},
  year={2025}
}
```