---
license: apache-2.0
datasets:
- >-
  Leopo1d/OpenVul_Vulnerability_Query_Dataset_for_RL
language:
- en
base_model:
- Leopo1d/OpenVul-Qwen3-4B-SFT-ep3
library_name: transformers
tags:
- vulnerability_detection
- software_security
- OpenVul
- large_language_models
- reasoning_llms
---
## OpenVul-Qwen3-4B-GRPO

OpenVul-Qwen3-4B-GRPO is a state-of-the-art (SOTA) reasoning LLM specialized for vulnerability detection. It is post-trained from [OpenVul-Qwen3-4B-SFT-ep3](https://huggingface.co/Leopo1d/OpenVul-Qwen3-4B-SFT-ep3) with on-policy reinforcement learning to navigate complex vulnerability reasoning paths.


<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/662906eace62e36cd324f429/5sWmQWwTEEzockoZsFZgg.png" width="700">
</div>

### 💡 Key Feature:

Focuses on context-level vulnerability detection, using inter-procedural context (global variables, type definitions, callee functions, etc.) rather than isolated functions.

### 🔗 Related Links:

- Code: [GitHub](https://github.com/youpengl/OpenVul)
- Data: [OpenVul_Vulnerability_Query_Dataset_for_RL](https://huggingface.co/datasets/Leopo1d/OpenVul_Vulnerability_Query_Dataset_for_RL)
- Paper: [From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection](https://arxiv.org/abs/2602.14012)

### 📄 Prompt Template (RECOMMENDED!):

We recommend using vLLM for inference with the following settings: `enable_thinking=True`, `n=8`, `repetition_penalty=1.0`, `temperature=0.6`, `top_p=0.95`, `top_k=20`, `min_p=0`, `max_tokens=32768`. More details can be found in the [code](https://github.com/youpengl/OpenVul).
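As a rough illustration, the settings above might map onto vLLM's offline API as in the sketch below. The model ID `Leopo1d/OpenVul-Qwen3-4B-GRPO` and the helper name `generate` are assumptions made for this example, as is passing `enable_thinking` through `chat_template_kwargs` (the usual Qwen3 convention). The repository code remains the reference.

```python
# Sampling settings copied from the recommendation above.
SAMPLING_KWARGS = {
    "n": 8,
    "repetition_penalty": 1.0,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "max_tokens": 32768,
}

# Assumption: the Hugging Face repository ID of this model.
MODEL_ID = "Leopo1d/OpenVul-Qwen3-4B-GRPO"


def generate(messages, model_id=MODEL_ID):
    """Hypothetical helper: return the n sampled completions for one chat request."""
    # Imported lazily so the settings can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    outputs = llm.chat(
        messages,
        SamplingParams(**SAMPLING_KWARGS),
        # Assumption: the chat template reads `enable_thinking`, as in Qwen3.
        chat_template_kwargs={"enable_thinking": True},
    )
    # One request was sent, so outputs[0] holds all n completions.
    return [completion.text for completion in outputs[0].outputs]
```

Here `messages` is a list of `{"role": ..., "content": ...}` dictionaries holding the system and user prompts described below.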

- System Prompt

```
You are a vulnerability detection expert specializing in identifying security flaws in C/C++ code, with a focus on Common Weakness Enumeration (CWE) standards. You provide precise, evidence-based analysis without speculation, and clearly label any vulnerabilities you detect.
```

- User Prompt

~~~
Your task is to evaluate whether the following C/C++ code contains any security vulnerabilities.

You will be provided with two sections:
1. Context: Relevant code such as includes, type definitions, global variables, macros, and definitions of any functions called within the target function.
2. Code: The target function to analyze.

Use all available information to analyze the function step by step.
If the target function alone is insufficient to determine whether a vulnerability exists, refer to the Context section before making a judgment.
Do not assume vulnerabilities — only report what is supported by the code and context.

In your final response, list all detected vulnerabilities and CWE identifiers if applicable.
Conclude with one of the following indicators on a new line:
- HAS_VUL — if any vulnerabilities are found
- NO_VUL — if no vulnerabilities are found

```Context
{Context}
```
```Code
File: {Located File}
Method: {Function Name}
----------------------------------------
{Target Function}
```

Analyze the code now.
~~~
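The sketch below shows one way to fill the four placeholders of the user prompt and to read the final indicator back from a response. The helper names `fill_prompt` and `parse_verdict` are hypothetical, and stripping a `</think>` reasoning trace is an assumption based on Qwen3-style output. Only the tail of the template is reproduced here; prepend the instruction text from the template above.

```python
FENCE = "`" * 3  # built at runtime so this example can sit inside a Markdown fence

# Tail of the user prompt template (the part that carries the placeholders).
PROMPT_TAIL = "\n".join([
    FENCE + "Context",
    "{Context}",
    FENCE,
    FENCE + "Code",
    "File: {Located File}",
    "Method: {Function Name}",
    "----------------------------------------",
    "{Target Function}",
    FENCE,
    "",
    "Analyze the code now.",
])


def fill_prompt(template, context, located_file, function_name, target_function):
    """Substitute the four placeholders using plain string replacement."""
    # Plain replacement keeps the placeholder names, spaces included,
    # exactly as they appear in the template.
    values = {
        "{Context}": context,
        "{Located File}": located_file,
        "{Function Name}": function_name,
        "{Target Function}": target_function,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


def parse_verdict(response):
    """Return 'HAS_VUL', 'NO_VUL', or None, based on the last non-empty line."""
    # Assumption: drop a Qwen3-style reasoning trace if it is still in the text.
    answer = response.rsplit("</think>", 1)[-1]
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    if not lines:
        return None
    for label in ("HAS_VUL", "NO_VUL"):
        if label in lines[-1]:
            return label
    return None
```

With `n=8`, `parse_verdict` can be applied to each sampled completion separately. How the eight verdicts are aggregated is left to the evaluation code in the repository.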

### 📎 Citation:
```
@misc{li2026sftrldemystifyingposttraining,
      title={From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection}, 
      author={Youpeng Li and Fuxun Yu and Xinda Wang},
      year={2026},
      eprint={2602.14012},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2602.14012}, 
}
```