---
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
  - en
tags:
  - security
  - cve
  - patches
  - backporting
  - opensuse
  - suse
  - linux
  - code-generation
  - lora
  - qlora
  - transformers
datasets:
  - anicka/cve-backport-codegen-dataset
model-index:
  - name: cve-backport-codegen-v5-qwen25-32b
    results:
      - task:
          type: text-generation
          name: Security Patch Backporting
        dataset:
          type: anicka/cve-backport-codegen-dataset
          name: CVE Backport Codegen Dataset
        metrics:
          - name: Recall
            type: recall
            value: 0.931
          - name: Precision
            type: precision
            value: 0.944
          - name: Exact Match
            type: exact_match
            value: 0.83
---

# CVE Backport Codegen v5 — Qwen2.5-Coder-32B QLoRA

Fine-tuned code generation model for backporting upstream CVE security fixes
to older SUSE/openSUSE package versions. Given vulnerable source code and an
upstream fix description, the model outputs the corrected code. A separate
tool then diffs the output against the original to produce a patch.

This is a **per-hunk code generation** approach: the model sees one region of
source code at a time and returns the fixed version, rather than generating
raw unified diffs. This yields higher accuracy than patch-format models
because the model works in its natural domain (code) rather than a
meta-format (diffs).
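
The diffing step can be sketched with Python's standard `difflib`. This is a minimal illustration of the idea, not the code used by the cve-backport-tool; the function name `make_patch` and the sample C snippet are made up for the example.

```python
import difflib


def make_patch(original: str, fixed: str, path: str) -> str:
    """Return a unified diff that turns `original` into `fixed`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical code region before and after the model's fix.
original = "int n = read_len(buf);\nmemcpy(dst, buf, n);\n"
fixed = (
    "int n = read_len(buf);\n"
    "if (n > DST_MAX)\n"
    "    return -1;\n"
    "memcpy(dst, buf, n);\n"
)

patch = make_patch(original, fixed, "src/example.c")
print(patch)
```

Because the model only ever emits code, any whitespace or context drift shows up as ordinary diff lines that a reviewer can inspect.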

## What's New in v5

v5 uses a unified **codegen-only dataset** — all 36,166 training examples
follow the same 3-turn format (system / user with code + fix description /
assistant with fixed code). v4 mixed in 5-turn test-generation examples;
v5 drops those to focus entirely on codegen quality.

| Metric | v5 | v4 | v1 |
|--------|:--:|:--:|:--:|
| **Recall** | **93.1%** | 93% | 91% |
| **Precision** | 94.4% | 95% | n/a |
| **Exact match** | 83/100 | 87/100 | n/a |
| **Adapted recall** | **90.0%** | 86% | 71% |
| **Identical recall** | 93.7% | 94% | 94% |

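Adapted-tier recall has improved with each release: 71% (v1) → 86% (v4) → **90% (v5)**.
The codegen-only dataset gives the model a cleaner training signal for the
core task. Precision and exact match are slightly lower than in v4
(94.4% vs 95%, and 83 vs 87 out of 100).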

## Model Details

| | |
|---|---|
| **Base model** | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
| **Method** | QLoRA (4-bit NF4, double quantization, bf16 compute) |
| **LoRA rank / alpha** | 64 / 128 |
| **LoRA dropout** | 0.05 |
| **LoRA targets** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training data** | 36,166 train / 1,834 eval examples |
| **Epochs** | 2 (8,228 steps) |
| **Effective batch size** | 8 (1 × grad_accum 8) |
| **Learning rate** | 1e-4 (cosine schedule, 5% warmup) |
| **Max sequence length** | 4,096 tokens |
| **Optimizer** | AdamW fused, weight decay 0.01 |
| **Hardware** | 2× NVIDIA H100 NVL 94GB |
| **Training time** | 46.1 hours |
| **Train loss (avg)** | 0.0215 |
| **Eval loss (final)** | 0.00602 |
| **PEFT version** | 0.18.1 |
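
For readers who want to set up something similar, the table maps onto `peft` and `transformers` configuration objects roughly as follows. This is a sketch assembled from the values above, not the author's training script; `HPARAMS` and `build_configs` are names chosen for illustration, and the libraries are imported lazily so the values can be inspected without them installed.

```python
# Hyperparameters copied from the Model Details table.
HPARAMS = {
    "lora_rank": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "per_device_batch_size": 1,
    "grad_accum_steps": 8,
}


def build_configs(hp=HPARAMS):
    """Build (BitsAndBytesConfig, LoraConfig) for 4-bit NF4 QLoRA."""
    # Imported here so this module loads without torch/peft installed.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora = LoraConfig(
        r=hp["lora_rank"],
        lora_alpha=hp["lora_alpha"],
        lora_dropout=hp["lora_dropout"],
        target_modules=hp["target_modules"],
        task_type="CAUSAL_LM",
    )
    return quant, lora


# With standard LoRA scaling, adapter updates are multiplied by alpha / rank.
lora_scale = HPARAMS["lora_alpha"] / HPARAMS["lora_rank"]
# Effective batch size as listed in the table (1 x grad_accum 8).
effective_batch = HPARAMS["per_device_batch_size"] * HPARAMS["grad_accum_steps"]
```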

## Files

This repository contains:

- **LoRA adapter** (`adapter_model.safetensors`, `adapter_config.json`) — merge with the base model using PEFT
- **GGUF Q8_0** (`cve-backport-codegen-v5-q8_0.gguf`, 33GB) — ready for llama.cpp / ollama
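
Merging the adapter into the base weights can be done with PEFT's `merge_and_unload()`. The sketch below assumes enough memory to hold the 32B model in bf16; the function name and `out_dir` argument are illustrative, and nothing is downloaded until the function is called.

```python
def merge_adapter(
    out_dir: str,
    base_id: str = "Qwen/Qwen2.5-Coder-32B-Instruct",
    adapter_id: str = "anicka/cve-backport-codegen-v5-qwen25-32b",
) -> None:
    """Fold the LoRA weights into the base model and save the result."""
    # Heavy imports are kept local so that defining the function is cheap.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype="bfloat16", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # returns a plain transformers model
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the output directory is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```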

## Evaluation

Evaluated on 100 held-out examples (zero CVE overlap with training) using
the Q8_0 GGUF served via llama-server (temperature=0, ctx=8192).

### Overall

| Metric | Value |
|--------|-------|
| Avg recall | 93.1% |
| Avg precision | 94.4% |
| Exact match | 83/100 |
| Perfect (100% recall) | 90/100 |
| Failures (0% recall) | 3/100 |
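
This card does not spell out how recall and precision are computed. One plausible line-level definition, shown here purely as an assumption, compares the lines that a generated fix adds or removes with those of the reference fix. The function names and the sample snippet are hypothetical.

```python
import difflib


def changed_lines(before: str, after: str) -> set:
    """Return the set of ('+' or '-', stripped line) edits between two texts."""
    edits = set()
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        tag, text = line[:2], line[2:].strip()
        # Keep only real additions/removals; skip context and '? ' hint lines.
        if tag in ("+ ", "- ") and text:
            edits.add((tag[0], text))
    return edits


def recall_precision(original: str, reference: str, generated: str):
    """Line-level recall and precision of `generated` against `reference`."""
    ref = changed_lines(original, reference)
    gen = changed_lines(original, generated)
    hit = len(ref & gen)
    recall = hit / len(ref) if ref else 1.0
    precision = hit / len(gen) if gen else 1.0
    return recall, precision


# Hypothetical case: the model applies the fix but adds one unrelated line.
original = "x = read();\nuse(x);\n"
reference = "x = read();\ncheck(x);\nuse(x);\n"
generated = "x = read();\ncheck(x);\nuse(x);\nlog(x);\n"
print(recall_precision(original, reference, generated))  # (1.0, 0.5)
```

Under this definition, recall measures how much of the reference fix was reproduced, and precision penalizes changes the reference does not contain.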

### By Tier

| Tier | Count | Avg Recall | Perfect |
|------|:-----:|:----------:|:-------:|
| **Identical** (upstream applies as-is) | 85 | 93.7% | 77/85 |
| **Adapted** (requires modification) | 15 | 90.0% | 13/15 |
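
The tier figures are consistent with the overall numbers: weighting each tier's average recall by its example count recovers the headline value, and the per-tier perfect counts add up to the overall one.

```python
# (count, average recall in %, perfect-recall examples) per tier, from the table above.
tiers = {
    "identical": (85, 93.7, 77),
    "adapted": (15, 90.0, 13),
}

total = sum(count for count, _, _ in tiers.values())
overall_recall = sum(count * recall for count, recall, _ in tiers.values()) / total
perfect = sum(p for _, _, p in tiers.values())

print(f"{overall_recall:.1f}% recall, {perfect}/{total} perfect")
# 93.1% recall, 90/100 perfect
```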

### Failure Analysis

The 3 zero-recall cases are all complex libvirt patches (multi-function
adaptations across large files with significant structural differences
between versions). These are known hard cases that likely require an
agentic approach with source tree context.

## Training Data

The v5 dataset contains real SUSE/openSUSE maintenance patches paired
with their upstream CVE fixes, converted to a per-hunk codegen format:

- **36,166 train + 1,834 eval** examples (strict CVE-level split, zero overlap)
- All examples use a **3-turn ChatML format** (system / user / assistant)
- Per-hunk extraction with 15-line context padding, nearby hunks merged
- Covers C, C++, Python, shell, Java, JavaScript, Go, and more
- Sources: openSUSE Build Service maintenance incidents
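
The padding-and-merge step can be illustrated as interval merging over line ranges. This is a sketch of the idea only: it is not the dataset's extraction code, and the rule used for "nearby" (padded ranges that overlap or touch) is an assumption.

```python
CONTEXT = 15  # lines of context added on each side, as stated above


def pad_and_merge(hunks, file_len, context=CONTEXT):
    """Pad 1-based inclusive (start, end) ranges and merge those that overlap or touch."""
    padded = sorted(
        (max(1, start - context), min(file_len, end + context))
        for start, end in hunks
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or is adjacent to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical changed ranges in a 400-line file.
regions = pad_and_merge([(115, 115), (130, 132), (300, 305)], file_len=400)
print(regions)  # [(100, 147), (285, 320)]
```

The first two hunks collapse into one region because their padded ranges overlap, so the model sees them together in a single example.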

### Input Format

````
## File: path/to/file.c
## Lines: 100-130

```c
/* 15 lines before the change */
vulnerable_code_here();
/* 15 lines after the change */
```

## Fix
Description of what the upstream patch changes in this region.
````

### Output Format

The model outputs the fixed version of the code region (just the code,
no diff headers or markup).
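
A small helper can assemble the user message in the layout shown under Input Format. The function name and arguments are illustrative; only the layout comes from this card, and the exact whitespace used in training is not documented here.

```python
FENCE = "`" * 3  # built programmatically so this example's own fence stays intact


def build_user_message(path, start, end, code, fix, lang="c"):
    """Format one code region and its fix description as the user turn."""
    return (
        f"## File: {path}\n"
        f"## Lines: {start}-{end}\n"
        "\n"
        f"{FENCE}{lang}\n"
        f"{code.rstrip()}\n"
        f"{FENCE}\n"
        "\n"
        "## Fix\n"
        f"{fix.strip()}\n"
    )


msg = build_user_message(
    "crypto/bn/bn.h", 280, 310,
    "/* source code region */",
    "Add bounds check for BN_num_bits to prevent buffer over-read.",
)
print(msg)
```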

## Usage

### With llama.cpp / llama-server (GGUF)

```bash
llama-server \
    --model cve-backport-codegen-v5-q8_0.gguf \
    --port 8403 \
    --n-gpu-layers 99 \
    --ctx-size 8192
```
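
Once the server is running, it can be queried through llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the standard library. The port matches the command above, the system prompt is the one from the prompt template in this card, and `max_tokens` is an arbitrary choice.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are a security patch backporting assistant.\n\n"
    "Given vulnerable source code and a description of the upstream fix, "
    "output the FIXED version of the code.\n\n"
    "Rules:\n"
    "- Output ONLY the fixed code, nothing else\n"
    "- Preserve all surrounding context exactly\n"
    "- Apply only the described fix\n"
)


def build_payload(user_message, max_tokens=2048):
    """Build a chat-completions request body with greedy decoding."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0,  # same setting as in the evaluation
        "max_tokens": max_tokens,
    }


def generate_fix(user_message, url="http://localhost:8403/v1/chat/completions"):
    """Send one code region to the server and return the model's fixed code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `generate_fix(user_message)` with a message in the input format returns the fixed code region as a string, ready to be diffed against the original.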

### With the CVE Backport Tool

The recommended way to use this model is via the
[cve-backport-tool](https://github.com/anicka-net/cve-backport-tool),
which handles patch parsing, source extraction, model inference, and
diff generation:

```bash
python3 cve-backport.py \
    --cve CVE-2024-1234 \
    --package openssl-1.1.1d \
    --patch upstream.patch \
    --source-dir /path/to/source/ \
    --backend openai \
    --retry 3
```

### With transformers + PEFT (adapter)

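```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16 and let accelerate place it on the available GPUs.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
# Attach the LoRA adapter from this repository on top of the base weights.
model = PeftModel.from_pretrained(base, "anicka/cve-backport-codegen-v5-qwen25-32b")
model.eval()  # inference only: disables dropout
# The adapter reuses the base model's tokenizer and chat template.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```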

### Prompt Template (ChatML)

````
<|im_start|>system
You are a security patch backporting assistant.

Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.

Rules:
- Output ONLY the fixed code, nothing else
- Preserve all surrounding context exactly
- Apply only the described fix
<|im_end|>
<|im_start|>user
## File: crypto/bn/bn.h
## Lines: 280-310

```c
/* source code region */
```

## Fix
Add bounds check for BN_num_bits to prevent buffer over-read.
<|im_end|>
<|im_start|>assistant
````

## Limitations

- **Best at identical-tier patches** (upstream fix applies directly) — 93.7% recall
- **Good at adapted patches** (90% recall) but complex multi-function adaptations
  across structurally different versions remain challenging
- **Context window**: 4,096 token training limit means very large functions or
  multi-file patches may be truncated
- **No compilation feedback**: the model generates code in a single pass without
  verifying it compiles. Use `--retry` in the CLI tool for iterative correction.
- Always review generated patches before applying to production systems

## Related

- **CLI tool**: [cve-backport-tool](https://github.com/anicka-net/cve-backport-tool)
- **Dataset**: [anicka/cve-backport-codegen-dataset](https://huggingface.co/datasets/anicka/cve-backport-codegen-dataset)
- **Earlier version (v1)**: [anicka/cve-backport-codegen-qwen25-32b-v1](https://huggingface.co/anicka/cve-backport-codegen-qwen25-32b-v1)

## Citation

```bibtex
@misc{cve-backport-codegen-v5,
  title={CVE Backport Codegen v5: Fine-tuned Qwen2.5-Coder-32B for Security Patch Backporting},
  author={Anna Maresova},
  year={2026},
  url={https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b}
}
```