Files
CVE-Backport-Qwen2.5-Coder-32B/README.md
ModelHub XC b86a0b588f 初始化项目,由ModelHub XC社区提供模型
Model: openSUSE/CVE-Backport-Qwen2.5-Coder-32B
Source: Original Platform
2026-05-15 22:52:28 +08:00

181 lines
6.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
tags:
- security
- patch-backporting
- code-generation
- qwen2
- qlora
- opensuse
datasets:
- openSUSE/cve-backport-codegen-dataset
language:
- en
pipeline_tag: text-generation
---
# CVE Backport Code Generation — Qwen2.5-Coder-32B (v5)
Fine-tuned [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) for security patch backporting via per-hunk code generation. Maintained as part of the openSUSE security tooling effort, alongside the [cve-backport-tool](https://github.com/openSUSE/cve-backport-tool) CLI.
Instead of generating unified diffs, this model takes a vulnerable code region and a fix description, and outputs the **fixed version of the code**. A programmatic diff then produces the final patch.
> **MoE variant available:** An MoE-based alternative built on
> Qwen3-Coder-30B-A3B (3B active parameters) is hosted at
> [anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b](https://huggingface.co/anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b).
> It scores 91.9% recall on the same 100-example eval — 1.2 pt below this
> dense model — while running roughly 10× faster at inference due to sparse
> MoE activation. Recommended for bulk CVE backport workflows where
> throughput matters.
## Quick Start
```bash
git clone https://github.com/openSUSE/cve-backport-tool
cd cve-backport-tool
./setup.sh # downloads GGUF, registers with ollama
python3 cve-backport.py \
--cve CVE-2024-1234 \
--package curl \
--patch upstream-fix.patch \
--obs-fetch --obs-project openSUSE:Leap:15.6:Update \
--retry 3
```
## GGUF Downloads
| File | Quant | Size | Notes |
|------|-------|------|-------|
| `cve-backport-codegen-v5-q8_0.gguf` | Q8_0 | 33 GB | **Recommended** (v5, 93.1% recall, 94.4% precision, codegen-only) |
| `cve-backport-codegen-v4-q8_0.gguf` | Q8_0 | 33 GB | v4, 93% recall, 95% precision (includes test generation training) |
| `cve-backport-codegen-v3-q8_0.gguf` | Q8_0 | 33 GB | v3, 94% recall, 98% precision (legacy, smaller eval set) |
## Evaluation (v5)
Per-hunk evaluation on 100 held-out examples the model never saw during training:
| Metric | v5 | v4 | v3 (n=20) |
|--------|:--:|:--:|:---------:|
| Average recall | **93.1%** | 93% | 94% |
| Average precision | **94.4%** | 95% | 98% |
| Exact match | **83/100** | 87/100 | 16/20 |
| Failures (<10%) | **3/100** | 4/100 | 0/20 |
By tier:
- **Identical** (upstream patch applies directly): 93.7% recall (77/85 perfect)
- **Adapted** (line numbers/context differ): 90.0% recall (13/15 perfect)
Adapted-tier recall has steadily improved: 71% (v1) 86% (v4) **90% (v5)**.
### What changed in v5
v5 uses a codegen-only dataset all 36,166 training examples follow the same 3-turn format. v4 mixed in 772 five-turn test-generation examples which diluted codegen focus. Dropping those and training for 2 epochs (vs 1 in v4) improved adapted-tier recall.
### Comparison with Frontier Models
Same eval, same 100 examples, optimized prompts with markdown stripping:
| Model | Recall | Precision | Exact | Failures |
|-------|--------|-----------|-------|----------|
| **CVE Backport v5** (32B fine-tuned) | **93%** | **94%** | **83/100** | **3** |
| Gemini 3.1 Pro (frontier, zero-shot) | 27% | 24% | 10/100 | 50 |
| Gemini 2.0 Flash (frontier, zero-shot) | 13% | 17% | 4/100 | 81 |
Fine-tuning on 36K domain-specific examples outperforms frontier models by 3-7x on this task.
## Prompt Format
ChatML format. Each prompt covers one hunk region with 15 lines of context padding.
### Code Generation (3-turn)
**System:**
```
You are a security patch backporting assistant.
Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.
Rules:
- Output ONLY the fixed code, nothing else — no explanations, no markdown fences
- Preserve exact formatting, indentation, and style of the original
- Make ONLY the changes described in the fix — do not modify anything else
- Do not add comments about what you changed
```
**User:**
```
## File: crypto/bn/bn.h
## Lines: 280-310
\```c
/* vulnerable source code region with 15 lines of context */
\```
## Fix
Add bounds check for BN_num_bits to prevent buffer over-read (CVE-2024-XXXX).
```
**Assistant:** The fixed version of the code region (just the code, no markup).
## Training
| | |
|---|---|
| Base model | Qwen2.5-Coder-32B-Instruct |
| Method | QLoRA (4-bit NF4, bf16 compute, double quantization) |
| LoRA rank / alpha | 64 / 128 |
| Epochs | 2 (8,228 steps) |
| Training data | 36,166 train / 1,834 eval (codegen-only, all 3-turn) |
| Effective batch size | 8 |
| Learning rate | 1e-4 (cosine, 5% warmup) |
| Max sequence length | 4,096 tokens |
| Hardware | 2× NVIDIA H100 NVL 94GB |
| Training time | 46.1 hours |
| Final eval loss | 0.00602 |
## Reproduction via Teapot
This model is reproducible via the [teapot](https://github.com/anicka-net/teapot) training pipeline. Once the dataset is composed, training is a four-command sequence:
```bash
git clone https://github.com/anicka-net/teapot
cd teapot
pip install -e .
# 1. Compose training data from the cve-backport module
teapot compose configs/cve-backport.config \
--output train-cve-backport.jsonl
# 2. Generate the QLoRA-HF launch script
teapot train configs/cve-backport.config \
--backend qlora-hf \
--train-data train-cve-backport.jsonl \
--eval-data eval-cve-backport.jsonl \
--output train-cve-backport.sh
# 3. Train (2× H100 NVL 94GB; ~46 hours)
bash train-cve-backport.sh
# 4. Final adapter is at output-teapot-cve-backport/final/
```
The teapot config (`configs/cve-backport.config`) pins all the hyperparameters listed in the Training table above. The `qlora-hf` backend invokes `teapot.train_qlora_hf`, a thin wrapper over the HuggingFace `Trainer` with bitsandbytes 4-bit quantization and PEFT LoRA.
## LoRA Adapter and MoE Variant
The LoRA adapter for this model is hosted at [anicka/cve-backport-codegen-v5-qwen25-32b](https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b) for use with PEFT/transformers.
An MoE variant trained on the same dataset is available at [anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b](https://huggingface.co/anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b) built on Qwen3-Coder-30B-A3B (3B active params), 91.9% recall on the same n=100 eval, ~10× faster inference.
## Known Issues
- The 3 failure cases (0% recall) are all complex libvirt patches involving multi-function adaptations across large files with significant structural differences. These likely require an agentic approach with source tree context.
- Very long hunks (>2000 tokens) may be truncated due to the 4096-token training context.
- Always review generated patches before applying to production systems.
## License
Apache-2.0 (inherited from Qwen2.5-Coder-32B-Instruct).