---
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- security
- cve
- patches
- backporting
- opensuse
- suse
- linux
- code-generation
- lora
- qlora
- transformers
datasets:
- anicka/cve-backport-codegen-dataset
model-index:
- name: cve-backport-codegen-v5-qwen25-32b
  results:
  - task:
      type: text-generation
      name: Security Patch Backporting
    dataset:
      type: anicka/cve-backport-codegen-dataset
      name: CVE Backport Codegen Dataset
    metrics:
    - name: Recall
      type: recall
      value: 0.931
    - name: Precision
      type: precision
      value: 0.944
    - name: Exact Match
      type: exact_match
      value: 0.83
---

# CVE Backport Codegen v5: Qwen2.5-Coder-32B QLoRA

Fine-tuned code generation model for backporting upstream CVE security fixes to older SUSE/openSUSE package versions. Given vulnerable source code and an upstream fix description, the model outputs the corrected code. A separate tool then diffs the output against the original to produce a patch.

This is a **per-hunk code generation** approach: the model sees one region of source code at a time and returns the fixed version, rather than generating raw unified diffs. This yields higher accuracy than patch-format models because the model works in its natural domain (code) rather than a meta-format (diffs).

## What's New in v5

v5 uses a unified **codegen-only dataset**: all 36,166 training examples follow the same 3-turn format (system / user with code + fix description / assistant with fixed code). v4 mixed in 5-turn test-generation examples; v5 drops those to focus entirely on codegen quality.

| Metric | v5 | v4 | v1 |
|--------|:--:|:--:|:--:|
| **Recall** | **93.1%** | 93% | 91% |
| **Precision** | **94.4%** | 95% | n/a |
| **Exact match** | **83/100** | 87/100 | n/a |
| **Adapted recall** | **90.0%** | 86% | 71% |
| **Identical recall** | 93.7% | 94% | 94% |

Adapted-tier recall has steadily improved: 71% (v1) → 86% (v4) → **90% (v5)**.
The codegen-only dataset gives the model a cleaner training signal for the core task.

## Model Details

| | |
|---|---|
| **Base model** | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
| **Method** | QLoRA (4-bit NF4, double quantization, bf16 compute) |
| **LoRA rank / alpha** | 64 / 128 |
| **LoRA dropout** | 0.05 |
| **LoRA targets** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training data** | 36,166 train / 1,834 eval examples |
| **Epochs** | 2 (8,228 steps) |
| **Effective batch size** | 8 (1 × grad_accum 8) |
| **Learning rate** | 1e-4 (cosine schedule, 5% warmup) |
| **Max sequence length** | 4,096 tokens |
| **Optimizer** | AdamW fused, weight decay 0.01 |
| **Hardware** | 2× NVIDIA H100 NVL 94GB |
| **Training time** | 46.1 hours |
| **Train loss (avg)** | 0.0215 |
| **Eval loss (final)** | 0.00602 |
| **PEFT version** | 0.18.1 |

## Files

This repository contains:

- **LoRA adapter** (`adapter_model.safetensors`, `adapter_config.json`): merge with the base model using PEFT
- **GGUF Q8_0** (`cve-backport-codegen-v5-q8_0.gguf`, 33GB): ready for llama.cpp / ollama

## Evaluation

Evaluated on 100 held-out examples (zero CVE overlap with training) using the Q8_0 GGUF served via llama-server (temperature=0, ctx=8192).

### Overall

| Metric | Value |
|--------|-------|
| Avg recall | 93.1% |
| Avg precision | 94.4% |
| Exact match | 83/100 |
| Perfect (100% recall) | 90/100 |
| Failures (0% recall) | 3/100 |

### By Tier

| Tier | Count | Avg Recall | Perfect |
|------|:-----:|:----------:|:-------:|
| **Identical** (upstream applies as-is) | 85 | 93.7% | 77/85 |
| **Adapted** (requires modification) | 15 | 90.0% | 13/15 |

### Failure Analysis

The 3 zero-recall cases are all complex libvirt patches (multi-function adaptations across large files with significant structural differences between versions). These are known hard cases that likely require an agentic approach with source tree context.
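As a quick consistency check on the tables above, the overall average recall should equal the count-weighted mean of the per-tier recalls, and the overall perfect count should be the sum of the per-tier perfect counts. The sketch below recomputes both from the reported figures. The names `TIERS`, `weighted_recall`, and `total_perfect` are illustrative and not part of the evaluation tooling.

```python
# Cross-check the overall evaluation numbers against the per-tier table.
# All figures are copied from the tables above; names are illustrative only.

# tier name -> (example count, average recall in %, perfect-recall count)
TIERS = {
    "identical": (85, 93.7, 77),
    "adapted": (15, 90.0, 13),
}


def weighted_recall(tiers):
    """Return the count-weighted mean recall (in %) across tiers."""
    total = sum(count for count, _, _ in tiers.values())
    weighted = sum(count * recall for count, recall, _ in tiers.values())
    return weighted / total


def total_perfect(tiers):
    """Return the number of examples with 100% recall across tiers."""
    return sum(perfect for _, _, perfect in tiers.values())


if __name__ == "__main__":
    print(f"overall recall: {weighted_recall(TIERS):.1f}%")
    print(f"perfect: {total_perfect(TIERS)}/100")
```

The weighted mean works out to (85 × 93.7 + 15 × 90.0) / 100 = 93.145, which rounds to the reported 93.1%, and 77 + 13 gives the reported 90/100 perfect examples.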
## Training Data

The v5 dataset contains real SUSE/openSUSE maintenance patches paired with their upstream CVE fixes, converted to a per-hunk codegen format:

- **36,166 train + 1,834 eval** examples (strict CVE-level split, zero overlap)
- All examples use a **3-turn ChatML format** (system / user / assistant)
- Per-hunk extraction with 15-line context padding, nearby hunks merged
- Covers C, C++, Python, shell, Java, JavaScript, Go, and more
- Sources: openSUSE Build Service maintenance incidents

### Input Format

````
## File: path/to/file.c
## Lines: 100-130

```c
/* 15 lines before the change */
vulnerable_code_here();
/* 15 lines after the change */
```

## Fix
Description of what the upstream patch changes in this region.
````

### Output Format

The model outputs the fixed version of the code region (just the code, no diff headers or markup).

## Usage

### With llama.cpp / llama-server (GGUF)

```bash
llama-server \
  --model cve-backport-codegen-v5-q8_0.gguf \
  --port 8403 \
  --n-gpu-layers 99 \
  --ctx-size 8192
```

### With the CVE Backport Tool

The recommended way to use this model is via the [cve-backport-tool](https://github.com/anicka-net/cve-backport-tool), which handles patch parsing, source extraction, model inference, and diff generation:

```bash
python3 cve-backport.py \
  --cve CVE-2024-1234 \
  --package openssl-1.1.1d \
  --patch upstream.patch \
  --source-dir /path/to/source/ \
  --backend openai \
  --retry 3
```

### With transformers + PEFT (adapter)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "anicka/cve-backport-codegen-v5-qwen25-32b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```

### Prompt Template (ChatML)

```
<|im_start|>system
You are a security patch backporting assistant.
Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.
Rules:
- Output ONLY the fixed code, nothing else
- Preserve all surrounding context exactly
- Apply only the described fix
<|im_end|>
<|im_start|>user
## File: crypto/bn/bn.h
## Lines: 280-310

```c
/* source code region */
```

## Fix
Add bounds check for BN_num_bits to prevent buffer over-read.
<|im_end|>
<|im_start|>assistant
```

## Limitations

- **Best at identical-tier patches** (upstream fix applies directly): 93.7% recall
- **Good at adapted patches** (90% recall) but complex multi-function adaptations across structurally different versions remain challenging
- **Context window**: 4,096 token training limit means very large functions or multi-file patches may be truncated
- **No compilation feedback**: the model generates code in a single pass without verifying it compiles. Use `--retry` in the CLI tool for iterative correction.
- Always review generated patches before applying to production systems

## Related

- **CLI tool**: [cve-backport-tool](https://github.com/anicka-net/cve-backport-tool)
- **Dataset**: [anicka/cve-backport-codegen-dataset](https://huggingface.co/datasets/anicka/cve-backport-codegen-dataset)
- **Previous version (v1)**: [anicka/cve-backport-codegen-qwen25-32b-v1](https://huggingface.co/anicka/cve-backport-codegen-qwen25-32b-v1)

## Citation

```bibtex
@misc{cve-backport-codegen-v5,
  title={CVE Backport Codegen v5: Fine-tuned Qwen2.5-Coder-32B for Security Patch Backporting},
  author={Anna Maresova},
  year={2026},
  url={https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b}
}
```