---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-8B
library_name: transformers
tags:
- rm
- cr
---

# SWE-CARE-RM

This model is a custom reward model built on top of **Qwen3-8B** with:

- a merged **LoRA** adapter
- an additional **projector head**
- a scalar reward output in **[0, 1]**

The model is designed to score the quality of a review conditioned on:

1. an issue / problem statement
2. a code patch
3. a candidate review

A higher score means the model considers the review better under the given issue and patch.

## Model Architecture

The model consists of:

- base model: **Qwen3-8B**
- adaptation: **LoRA**
- reward head: a custom **MLP projector**
- final score: `sigmoid(projector(last_hidden_state[:, -1]))`

This repository contains the **merged decoder weights** together with `projector.pth`.

## Input Format

The model expects three text fields:

- `issue`
- `patch`
- `review`

During inference, the input is formatted as:

```text
{issue}{patch}{review}
```

The score is computed from the last token hidden state.
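To illustrate why the last position can serve as the scoring position, here is a minimal sketch of concatenating the three fields and padding on the left, as the Quick Start code does with `padding_side="left"`. The helper name `pack_left_padded`, the toy token IDs, and the choice of `0` as the padding ID are assumptions for illustration only; unlike the real pipeline, this sketch does not truncate.

```python
def pack_left_padded(issue_ids, patch_ids, review_ids, max_len, pad_id=0):
    """Concatenate the three token-ID lists and left-pad to max_len.

    Hypothetical helper for illustration; it does not truncate, so the
    caller must ensure the combined length fits within max_len.
    """
    ids = issue_ids + patch_ids + review_ids
    if len(ids) > max_len:
        raise ValueError("combined input is longer than max_len")
    pad_len = max_len - len(ids)
    input_ids = [pad_id] * pad_len + ids
    attention_mask = [0] * pad_len + [1] * len(ids)
    return input_ids, attention_mask


# Toy token IDs standing in for real tokenizer output.
issue_ids = [11, 12]
patch_ids = [21, 22, 23]
review_ids = [31, 32]

input_ids, attention_mask = pack_left_padded(
    issue_ids, patch_ids, review_ids, max_len=10
)

print(input_ids)       # [0, 0, 0, 11, 12, 21, 22, 23, 31, 32]
print(attention_mask)  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

Because the padding sits on the left, index `-1` always holds the final review token. In a causal decoder that position can attend to every unmasked token before it, so its hidden state summarizes the issue, the patch, and the review together.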
## Quick Start

```python
from pathlib import Path
import json

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# MODEL_DIR must point to a local copy of this repository, because
# data_sample.jsonl, reward_config.json, and projector.pth are read from disk.
MODEL_DIR = "codefuse-ai/SWE-CARE-RM"
MAX_SEQ_LEN = 51200
MIN_REVIEW_LEN = 4096
TRUST_REMOTE_CODE = True

# Use the first record of the bundled sample file as the example input.
with open(f"{MODEL_DIR}/data_sample.jsonl", "r", encoding="utf-8") as fr:
    json_data = json.loads(fr.readline())

SAMPLE = {
    "issue": json_data["problem_statement"],
    "patch": json_data["patch_to_review"],
    "review": json_data["pos_review"][0],
}


class Projector(nn.Module):
    """MLP reward head; `arch` has the form "mlp{depth}x_relu", e.g. "mlp2x_relu"."""

    def __init__(self, arch, input_size, hidden_size, use_bf16):
        super().__init__()
        depth = int(arch[len("mlp"): arch.index("x_relu")])

        def linear(in_features, out_features):
            layer = nn.Linear(in_features, out_features)
            return layer.bfloat16() if use_bf16 else layer

        # For depth 2 this yields Linear -> ReLU -> Linear(hidden_size, 1).
        layers = [linear(input_size, hidden_size)]
        for _ in range(1, depth):
            layers.append(nn.ReLU())
            layers.append(linear(hidden_size, 1))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


def resolve_dtype(dtype_name):
    if dtype_name in {"bf16", "bfloat16"}:
        return torch.bfloat16
    if dtype_name in {"fp16", "float16"}:
        return torch.float16
    return torch.float32


def infer_proj_arch(projector_state_dict):
    # Count the linear layers stored in the checkpoint to recover the depth.
    linear_weight_keys = [
        k for k in projector_state_dict
        if k.startswith("model.") and k.endswith(".weight")
    ]
    return f"mlp{len(linear_weight_keys)}x_relu"


def process_one(issue_ids, issue_masks, patch_ids, patch_masks,
                review_ids, review_masks, max_len, min_review_len):
    # Keep the issue in full, up to min_review_len tokens from the end of the
    # review, and as much of the start of the patch as the remaining budget allows.
    review_keep = min(min_review_len, len(review_ids))
    remain_for_patch = max(max_len - len(issue_ids) - review_keep, 0)
    patch_keep = min(len(patch_ids), remain_for_patch)

    ids_all = issue_ids + patch_ids[:patch_keep] + review_ids[-review_keep:]
    masks_all = issue_masks + patch_masks[:patch_keep] + review_masks[-review_keep:]

    # Left-pad short inputs to max_len; padded positions are masked out.
    if len(ids_all) < max_len:
        pad_len = max_len - len(ids_all)
        ids_all = [0] * pad_len + ids_all
        masks_all = [0] * pad_len + masks_all

    # If the issue alone exceeds the budget, the tail of the sequence is cut.
    return ids_all[:max_len], masks_all[:max_len]


# Optional config; fall back to values inferred from the checkpoint.
reward_config = {}
reward_config_path = Path(MODEL_DIR) / "reward_config.json"
if reward_config_path.exists():
    with open(reward_config_path, "r", encoding="utf-8") as f:
        reward_config = json.load(f)

projector_path = Path(MODEL_DIR) / "projector.pth"
projector_state_dict = torch.load(projector_path, map_location="cpu")

proj_arch = reward_config.get("proj_arch") or infer_proj_arch(projector_state_dict)
torch_dtype = resolve_dtype(reward_config.get("torch_dtype") or "bfloat16")
attn_implementation = reward_config.get("attn_implementation")

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_DIR, trust_remote_code=TRUST_REMOTE_CODE, padding_side="left"
)

model_kwargs = {"trust_remote_code": TRUST_REMOTE_CODE, "torch_dtype": torch_dtype}
if attn_implementation:
    model_kwargs["attn_implementation"] = attn_implementation
decoder = AutoModelForCausalLM.from_pretrained(MODEL_DIR, **model_kwargs)

projector = Projector(
    proj_arch,
    decoder.config.hidden_size,
    decoder.config.hidden_size,
    torch_dtype == torch.bfloat16,
)
projector.load_state_dict(projector_state_dict)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
decoder.to(device).eval()
projector.to(device).eval()

# Tokenize each field separately so that each can be budgeted independently.
issue_inputs = tokenizer(f"{SAMPLE['issue']}", padding=False, truncation="longest_first")
patch_inputs = tokenizer(f"{SAMPLE['patch']}", padding=False, truncation="longest_first")
review_inputs = tokenizer(SAMPLE["review"], padding=False, truncation="longest_first")

input_ids, attention_mask = process_one(
    issue_inputs["input_ids"], issue_inputs["attention_mask"],
    patch_inputs["input_ids"], patch_inputs["attention_mask"],
    review_inputs["input_ids"], review_inputs["attention_mask"],
    max_len=MAX_SEQ_LEN,
    min_review_len=MIN_REVIEW_LEN,
)

inputs = {
    "input_ids": torch.tensor([input_ids], dtype=torch.long, device=device),
    "attention_mask": torch.tensor([attention_mask], dtype=torch.long, device=device),
}

with torch.no_grad():
    hidden_state = decoder(**inputs, output_hidden_states=True).hidden_states[-1]
    # With left padding, the last position holds the final input token.
    reward = torch.sigmoid(projector(hidden_state[:, -1]).squeeze(-1)).item()

print(reward)
```

## Output

The model outputs a single scalar reward score in [0, 1].

Typical interpretation:

- higher score: better review quality
- lower score: worse review quality

This score is best used for:

- ranking candidate reviews
- pairwise comparison
- reward modeling in downstream training or reranking

## Intended Use

This model is intended for:

- code review quality scoring
- reward modeling for review generation
- reranking multiple candidate reviews for the same issue and patch

## Limitations

- The score is relative, not an absolute guarantee of correctness.
- Long-input truncation may affect results.
- The model should not be used as the only signal for production-critical review decisions.

## Citation

If you use this model, please cite SWE-CARE as appropriate.

```
@misc{guo2025codefusecrbenchcomprehensivenessawarebenchmarkendtoend,
      title={CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects},
      author={Hanyang Guo and Xunjin Zheng and Zihan Liao and Hang Yu and Peng DI and Ziyin Zhang and Hong-Ning Dai},
      year={2025},
      eprint={2509.14856},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2509.14856},
}
```