Model: openSUSE/CVE-Backport-Qwen2.5-Coder-32B
Source: Original Platform
2026-05-15 22:52:28 +08:00

---
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- security
- cve
- patches
- backporting
- opensuse
- suse
- linux
- code-generation
- lora
- qlora
- transformers
datasets:
- anicka/cve-backport-codegen-dataset
model-index:
- name: cve-backport-codegen-v5-qwen25-32b
  results:
  - task:
      type: text-generation
      name: Security Patch Backporting
    dataset:
      type: anicka/cve-backport-codegen-dataset
      name: CVE Backport Codegen Dataset
    metrics:
    - name: Recall
      type: recall
      value: 0.931
    - name: Precision
      type: precision
      value: 0.944
    - name: Exact Match
      type: exact_match
      value: 0.83
---
# CVE Backport Codegen v5 — Qwen2.5-Coder-32B QLoRA
Fine-tuned code generation model for backporting upstream CVE security fixes
to older SUSE/openSUSE package versions. Given vulnerable source code and an
upstream fix description, the model outputs the corrected code. A separate
tool then diffs the output against the original to produce a patch.
This is a **per-hunk code generation** approach: the model sees one region of
source code at a time and returns the fixed version, rather than generating
raw unified diffs. This yields higher accuracy than patch-format models
because the model works in its natural domain (code) rather than a
meta-format (diffs).
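
The diff step described above can be sketched with Python's standard `difflib`. This is a minimal illustration, not the actual tool's implementation: the function name `region_to_patch` and the sample C snippet are hypothetical, and hunk line numbers here are relative to the region rather than the full file.

```python
import difflib


def region_to_patch(original: str, fixed: str, path: str) -> str:
    """Diff the original region against the model's fixed region."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical region before and after the model's rewrite.
original = "int n = read_len(buf);\ncopy(dst, buf, n);\n"
fixed = (
    "int n = read_len(buf);\n"
    "if (n > DST_MAX)\n"
    "    return -1;\n"
    "copy(dst, buf, n);\n"
)
patch = region_to_patch(original, fixed, "src/parse.c")
print(patch)
```

A real pipeline would also have to offset the hunk header by the region's starting line before the patch applies to the full file.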
## What's New in v5
v5 uses a unified **codegen-only dataset** — all 36,166 training examples
follow the same 3-turn format (system / user with code + fix description /
assistant with fixed code). v4 mixed in 5-turn test-generation examples;
v5 drops those to focus entirely on codegen quality.
| Metric | v5 | v4 | v1 |
|--------|:--:|:--:|:--:|
| **Recall** | **93.1%** | 93% | 91% |
| **Precision** | **94.4%** | 95% | — |
| **Exact match** | **83/100** | 87/100 | — |
| **Adapted recall** | **90.0%** | 86% | 71% |
| **Identical recall** | 93.7% | 94% | 94% |
Adapted-tier recall has steadily improved: 71% (v1) → 86% (v4) → **90% (v5)**,
while precision and exact match are slightly lower than in v4 (94.4% vs. 95%,
83 vs. 87 of 100). The codegen-only dataset gives the model a cleaner training
signal for the core task.
## Model Details
| Parameter | Value |
|---|---|
| **Base model** | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
| **Method** | QLoRA (4-bit NF4, double quantization, bf16 compute) |
| **LoRA rank / alpha** | 64 / 128 |
| **LoRA dropout** | 0.05 |
| **LoRA targets** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training data** | 36,166 train / 1,834 eval examples |
| **Epochs** | 2 (8,228 steps) |
| **Effective batch size** | 8 (1 × grad_accum 8) |
| **Learning rate** | 1e-4 (cosine schedule, 5% warmup) |
| **Max sequence length** | 4,096 tokens |
| **Optimizer** | AdamW fused, weight decay 0.01 |
| **Hardware** | 2× NVIDIA H100 NVL 94GB |
| **Training time** | 46.1 hours |
| **Train loss (avg)** | 0.0215 |
| **Eval loss (final)** | 0.00602 |
| **PEFT version** | 0.18.1 |
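
For readers who want to reproduce a similar setup, the table maps fairly directly onto `peft.LoraConfig` and `transformers.BitsAndBytesConfig`. The sketch below is an assumption about how those values could be expressed, not the training script that was actually used; `HPARAMS` and `build_qlora_configs` are illustrative names, and the library imports are deferred so the snippet loads without GPU packages installed.

```python
# Hyperparameters copied from the Model Details table.
HPARAMS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def build_qlora_configs():
    """Return (quantization config, LoRA config) for a QLoRA run.

    Requires torch, transformers, peft, and bitsandbytes at call time.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",          # 4-bit NF4
        bnb_4bit_use_double_quant=True,     # double quantization
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora = LoraConfig(task_type="CAUSAL_LM", **HPARAMS)
    return quant, lora


# Standard LoRA scales the adapter update by lora_alpha / r.
lora_scale = HPARAMS["lora_alpha"] / HPARAMS["r"]
print(lora_scale)
```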
## Files
This repository contains:
- **LoRA adapter** (`adapter_model.safetensors`, `adapter_config.json`) — merge with the base model using PEFT
- **GGUF Q8_0** (`cve-backport-codegen-v5-q8_0.gguf`, 33GB) — ready for llama.cpp / ollama
## Evaluation
Evaluated on 100 held-out examples (zero CVE overlap with training) using
the Q8_0 GGUF served via llama-server (temperature=0, ctx=8192).
### Overall
| Metric | Value |
|--------|-------|
| Avg recall | 93.1% |
| Avg precision | 94.4% |
| Exact match | 83/100 |
| Perfect (100% recall) | 90/100 |
| Failures (0% recall) | 3/100 |
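
This card does not spell out how recall and precision are computed. As a rough guide to reading the numbers, one plausible line-level definition is sketched below; it is an assumption for illustration and may differ from the metric the evaluation actually used. The helper names and the toy strings are hypothetical.

```python
from collections import Counter


def added_lines(original: str, fixed: str) -> Counter:
    """Lines present in the fixed region but not in the original (multiset)."""
    return Counter(fixed.splitlines()) - Counter(original.splitlines())


def recall_precision(original: str, reference: str, generated: str):
    """Compare the lines the model added with the lines the reference fix added."""
    ref = added_lines(original, reference)
    gen = added_lines(original, generated)
    hits = sum((ref & gen).values())
    # Treat an empty side as a trivially perfect score (a convention, not a rule).
    recall = hits / sum(ref.values()) if ref else 1.0
    precision = hits / sum(gen.values()) if gen else 1.0
    return recall, precision


original = "a();\nb();\n"
reference = "a();\ncheck();\nb();\n"
generated = "a();\ncheck();\nextra();\nb();\n"
scores = recall_precision(original, reference, generated)
print(scores)
```

Under this definition the toy example recovers every reference line (full recall) but adds one unneeded line, which halves precision.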
### By Tier
| Tier | Count | Avg Recall | Perfect |
|------|:-----:|:----------:|:-------:|
| **Identical** (upstream applies as-is) | 85 | 93.7% | 77/85 |
| **Adapted** (requires modification) | 15 | 90.0% | 13/15 |
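
As a quick consistency check, the tier rows can be combined to recover the overall figures. The snippet only re-does arithmetic on numbers already in the tables above.

```python
# Values copied from the By Tier table.
tiers = {
    "identical": {"count": 85, "avg_recall": 93.7, "perfect": 77},
    "adapted": {"count": 15, "avg_recall": 90.0, "perfect": 13},
}

total = sum(t["count"] for t in tiers.values())
# Weight each tier's average recall by its number of examples.
weighted_recall = sum(t["count"] * t["avg_recall"] for t in tiers.values()) / total
perfect = sum(t["perfect"] for t in tiers.values())

print(f"{weighted_recall:.1f}% recall over {total} examples, {perfect} perfect")
```

The count-weighted recall comes out at roughly 93.1%, and the perfect counts sum to 90 of 100, matching the Overall table.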
### Failure Analysis
The 3 zero-recall cases are all complex libvirt patches (multi-function
adaptations across large files with significant structural differences
between versions). These are known hard cases that likely require an
agentic approach with source tree context.
## Training Data
The v5 dataset contains real SUSE/openSUSE maintenance patches paired
with their upstream CVE fixes, converted to a per-hunk codegen format:
- **36,166 train + 1,834 eval** examples (strict CVE-level split, zero overlap)
- All examples use a **3-turn ChatML format** (system / user / assistant)
- Per-hunk extraction with 15-line context padding, nearby hunks merged
- Covers C, C++, Python, shell, Java, JavaScript, Go, and more
- Sources: openSUSE Build Service maintenance incidents
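
The per-hunk extraction with context padding can be pictured as follows. This is a minimal sketch under stated assumptions: the card only says that nearby hunks are merged, so the rule used here (merge when padded windows overlap or touch) and the function name `padded_regions` are illustrative guesses rather than the dataset's actual logic.

```python
def padded_regions(hunks, n_lines, context=15):
    """Pad each (start, end) changed range and merge windows that overlap.

    Line numbers are 1-based and inclusive; n_lines is the file length.
    """
    regions = []
    for start, end in sorted(hunks):
        lo = max(1, start - context)
        hi = min(n_lines, end + context)
        if regions and lo <= regions[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            regions[-1][1] = max(regions[-1][1], hi)
        else:
            regions.append([lo, hi])
    return [tuple(r) for r in regions]


# Three changed ranges in a hypothetical 410-line file.
regions = padded_regions([(100, 102), (120, 121), (400, 400)], n_lines=410)
print(regions)
```

Here the first two hunks end up in one region because their padded windows overlap, while the third is clamped at the end of the file.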
### Input Format
````
## File: path/to/file.c
## Lines: 100-130
```c
/* 15 lines before the change */
vulnerable_code_here();
/* 15 lines after the change */
```
## Fix
Description of what the upstream patch changes in this region.
````
### Output Format
The model outputs the fixed version of the code region (just the code,
no diff headers or markup).
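
A small helper pair can make the input and output conventions concrete. This is a sketch only: `build_user_message` and `strip_fences` are hypothetical names, the exact whitespace between sections is an assumption, and the fence-stripping step is a defensive extra in case the model wraps its answer in a code fence anyway.

```python
FENCE = "`" * 3  # built programmatically so the example nests cleanly in Markdown


def build_user_message(path, start, end, code, fix, lang="c"):
    """Format one source region in the layout shown under Input Format."""
    return (
        f"## File: {path}\n"
        f"## Lines: {start}-{end}\n"
        f"{FENCE}{lang}\n{code.rstrip()}\n{FENCE}\n"
        f"## Fix\n{fix.strip()}\n"
    )


def strip_fences(output):
    """Remove a wrapping code fence from the model output, if present."""
    lines = output.strip("\n").splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines) + "\n"


msg = build_user_message(
    "path/to/file.c", 100, 130,
    "vulnerable_code_here();",
    "Add a bounds check.",
)
print(msg)
```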
## Usage
### With llama.cpp / llama-server (GGUF)
```bash
llama-server \
  --model cve-backport-codegen-v5-q8_0.gguf \
  --port 8403 \
  --n-gpu-layers 99 \
  --ctx-size 8192
```
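
Once the server is running, it can be queried through llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the standard library; the helper names, the `max_tokens` value, and the timeout are assumptions, while the port and `temperature=0` follow the command above and the evaluation setup.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8403/v1/chat/completions"  # port from the command above


def build_payload(system_prompt, user_message, max_tokens=2048):
    """Chat-completions request body; temperature 0 mirrors the evaluation."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0,
        "max_tokens": max_tokens,  # assumed limit, tune for your regions
    }


def request_fix(system_prompt, user_message, url=SERVER_URL, timeout=600):
    """POST the request and return the assistant text (needs a running server)."""
    body = json.dumps(build_payload(system_prompt, user_message)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


# Building the payload does not contact the server.
payload = build_payload(
    "You are a security patch backporting assistant.",
    "## File: path/to/file.c",
)
print(json.dumps(payload, indent=2))
```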
### With the CVE Backport Tool
The recommended way to use this model is via the
[cve-backport-tool](https://github.com/anicka-net/cve-backport-tool),
which handles patch parsing, source extraction, model inference, and
diff generation:
```bash
python3 cve-backport.py \
  --cve CVE-2024-1234 \
  --package openssl-1.1.1d \
  --patch upstream.patch \
  --source-dir /path/to/source/ \
  --backend openai \
  --retry 3
```
### With transformers + PEFT (adapter)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16; device_map="auto" places layers on the available GPUs.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "anicka/cve-backport-codegen-v5-qwen25-32b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```
### Prompt Template (ChatML)
````
<|im_start|>system
You are a security patch backporting assistant.
Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.
Rules:
- Output ONLY the fixed code, nothing else
- Preserve all surrounding context exactly
- Apply only the described fix
<|im_end|>
<|im_start|>user
## File: crypto/bn/bn.h
## Lines: 280-310
```c
/* source code region */
```
## Fix
Add bounds check for BN_num_bits to prevent buffer over-read.
<|im_end|>
<|im_start|>assistant
````
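
For backends that take a raw prompt string, the template can be assembled by hand. The sketch below is illustrative: `render_chatml` is a hypothetical helper, and the exact whitespace around `<|im_end|>` is an assumption. With transformers, calling `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is likely the safer route, since it uses the base model's own template.

```python
SYSTEM_PROMPT = (
    "You are a security patch backporting assistant.\n"
    "Given vulnerable source code and a description of the upstream fix, "
    "output the FIXED version of the code.\n"
    "Rules:\n"
    "- Output ONLY the fixed code, nothing else\n"
    "- Preserve all surrounding context exactly\n"
    "- Apply only the described fix\n"
)


def render_chatml(messages, add_generation_prompt=True):
    """Join role/content pairs into a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content'].rstrip()}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the fixed code.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "## Fix\nAdd bounds check for BN_num_bits."},
])
print(prompt)
```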
## Limitations
- **Best at identical-tier patches** (upstream fix applies directly) — 93.7% recall
- **Good at adapted patches** (90% recall) but complex multi-function adaptations
across structurally different versions remain challenging
- **Context window**: the model was trained with a 4,096-token sequence limit, so
very large functions or multi-file patches may be truncated
- **No compilation feedback**: the model generates code in a single pass without
verifying it compiles. Use `--retry` in the CLI tool for iterative correction.
- Always review generated patches before applying to production systems
## Related
- **CLI tool**: [cve-backport-tool](https://github.com/anicka-net/cve-backport-tool)
- **Dataset**: [anicka/cve-backport-codegen-dataset](https://huggingface.co/datasets/anicka/cve-backport-codegen-dataset)
- **Previous version (v1)**: [anicka/cve-backport-codegen-qwen25-32b-v1](https://huggingface.co/anicka/cve-backport-codegen-qwen25-32b-v1)
## Citation
```bibtex
@misc{cve-backport-codegen-v5,
title={CVE Backport Codegen v5: Fine-tuned Qwen2.5-Coder-32B for Security Patch Backporting},
author={Anna Maresova},
year={2026},
url={https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b}
}
```